Apr 09 2015
 

Ever since I found Backblaze, and I don’t know why I did not find it earlier, I have been quite happy with money well spent. I currently have 4 USB drives on top of 2.5TB of internal space. That adds up to a lot of space, and I only use 4TB of it. This is my home setup. I have tons of VMs and development stuff lying around. I recently lost a fairly new 2TB Western Digital drive for no good reason. It just died… like, physically, taking 1.5TB of VMs and snapshots with it. It was a terrible day, as I spent 2-3 hours opening the drive up and confirming it was completely dead. I should have seen the signs…

I decided online backup was the way to go. While slow, at least it would keep my data recoverable without adding more drives. I looked at Amazon S3 as my first choice. However, with terabytes of data the cost would become VERY VERY prohibitive. I don’t mind paying, but online backup has to cost the same as or less than buying drives to make economic sense. I could just as easily have built a RAID 1 by buying lots of cheap drives and putting them into a NAS for backup.

A few quick Google searches revealed Backblaze. A lot of sysadmins swear by it. I took a look at what these guys do and at this post, and knew they try very hard to make data backup affordable and reliable.

Their plan amazed me: US$50 for 1 year of unlimited backup per system, and it includes attached drives. Second, there is the ability to have drives with your data shipped back to you. While I have a 1Gbps connection, it would still be slow to pull a few terabytes from across the world, and it would not be a healthy use of time.

I downloaded the trial. I had some issues initially getting it running because the interface isn’t very intuitive, but once you know what’s where it’s pretty much on its own as it starts its first backup. So off I went happily to sign up and back up everything.

Unfortunately the upload speed was painfully slow… somewhere in the region of 200-500KB/s at best. It would take a whole 3 years to transfer 5TB of data, if not more. I emailed support to ask if there was an option for multiple parallel uploads, since their application uses only ONE upload thread. They replied that they would “soon” be releasing multithreaded upload/download. I waited, and 3 weeks later over 90% of the volume of my files was still left to upload. However, Backblaze was smart enough to keep the larger files for last. The number of at-risk files was a few thousand. I usually keep my PC on at all times, but I don’t mind restarting now and then due to forced Windows 8 updates. This time I have had it on for almost 21 days non-stop since installing Backblaze. Something is better than nothing, I say.

 

I was hypothesizing (read: daydreaming) about what could change in my life over the next 3 years as the first backup crawled towards completion… what would I be doing on the day I discovered it had actually completed? Perhaps I would celebrate and then find out a few more files had been added to the never-ending list… or maybe Joe from support was telling the truth.

I hate their site for finding any new information, so I googled for the answer once again, hoping someone had made a workaround… I could not believe the search result… It had the word “Multithreaded”. In my entire programming life, MT has never made me as happy as it did now.

So, the GOOD news! The latest version of Backblaze enables multithreaded uploads. So despite the physical distance between Backblaze and my local desktop, and the conspiring bast**d of an RTT, I could now upload in parallel and use the max of my 500Mbps upload speed. Well, anything better than the 2Mbps I was getting normally anyway 🙂
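Why would one upload thread crawl on a fat pipe while several threads fly? A rough sketch of the usual explanation: a single TCP connection can only have one window of unacknowledged data in flight per round trip, so its ceiling is roughly window / RTT, and parallel streams multiply that ceiling until the link itself is full. The 64 KiB window and the 200 ms RTT below are assumptions picked for illustration, not values I measured, and I have no idea how Backblaze actually tunes its connections. Only the 500Mbps link figure comes from the post.

```python
# Back-of-the-envelope TCP ceiling: throughput <= window / RTT per stream.
# WINDOW_BYTES and RTT_SECONDS are ASSUMED illustration values, not measurements.

def stream_limit_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for a single stream, in megabits per second."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def parallel_limit_mbps(window_bytes: int, rtt_seconds: float,
                        streams: int, link_mbps: float) -> float:
    """N independent streams scale the bound until the link is saturated."""
    return min(streams * stream_limit_mbps(window_bytes, rtt_seconds), link_mbps)


WINDOW_BYTES = 64 * 1024   # assumed window without window scaling
RTT_SECONDS = 0.200        # assumed long-haul round trip
LINK_MBPS = 500.0          # upload speed quoted in the post

if __name__ == "__main__":
    for streams in (1, 4, 16):
        limit = parallel_limit_mbps(WINDOW_BYTES, RTT_SECONDS, streams, LINK_MBPS)
        print(f"{streams:>2} stream(s): at most {limit:.1f} Mbps")
```

With these assumed numbers a single stream tops out in the low single-digit Mbps range, which is at least in the same ballpark as the roughly 2Mbps mentioned above. Halving the RTT doubles the per-stream ceiling, which is why both more threads and a shorter route help.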

Excited, I could not wait and installed the latest version. Sure enough, the option was there as per their pitch. With one thread my transfer was shown as 2.35 Mbps, which I could easily confirm with other tools and my UBNT router.

With the threads set to 4, backups were already flying. The file names whizzed by in the Backblaze control panel. Still small files though, as the program had found new tiny useless fragments of my entire Raspberry Pi mirror to back up again. Who cares anyway. The only way I could tell if the speed was actually being maximized was when large files were transferred. I was already seeing a 10x improvement.

I am currently able to get a consistent 25-30 Mbit/s. The next step is to contact my ISP, MyRepublic, and get them to do a better job of this. A lower RTT could make a world of difference.
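To put those rates in perspective, here is a quick calculation of how long the initial 5TB upload would take at the single-thread and multithreaded speeds quoted in this post. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and a link that stays saturated around the clock, which a real backup full of tiny files will not manage, so treat the results as best-case figures.

```python
# Best-case upload time at a constant rate.
# ASSUMPTIONS: decimal units, no protocol overhead, link saturated 24/7.

SECONDS_PER_DAY = 86_400


def upload_days(total_bytes: float, rate_mbps: float) -> float:
    """Days needed to push total_bytes at a steady rate_mbps (megabits/s)."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


TOTAL_BYTES = 5 * 10**12  # the 5TB mentioned in the post

if __name__ == "__main__":
    rates = [
        ("one thread", 2.35),           # rate shown with a single thread
        ("multithreaded, low", 25.0),   # lower end of the consistent rate
        ("multithreaded, high", 30.0),  # upper end of the consistent rate
    ]
    for label, mbps in rates:
        days = upload_days(TOTAL_BYTES, mbps)
        print(f"{label:>20}: {mbps:5.2f} Mbps -> about {days:.0f} days")
```

Since the time scales inversely with the rate, a 10x faster upload means a 10x shorter wait, whatever the real-world overheads turn out to be.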

Apr 06 2015
 

I am bored. I have decided to code for WordPress, but instead of using the regular code base I am going to write a fork which is AWS specific, or rather specific to a cloud API like the Ceph Object Storage API if possible. You could theoretically reuse the AWS S3 code on other platforms like DreamObjects etc. There are a few WordPress plugins around that copy your images and files to S3 and do basic CDN, but they are inadequate and buggy. For example, on www.GamingIO.com I am using a plugin for pushing to S3, but the files still sit locally, which defeats the purpose. Not to mention that, due to a bug, the image resize functionality in WordPress seems to create several duplicate copies of each image. I had to delete all the images to make space on the site and recovered 50% of it. I got a few broken links for all the trouble. But it was a free plugin… what did I expect.

When I was digging through WordPress I noticed how heavily it uses the database. Every revision of a post is stored as a version, and it all goes to the DB. This is great for small sites not looking at more than 200 posts in their lifetime. Not so much for news publishing, where we had 3 people doing 6-9 posts a day with several revisions. That crapped the database. With application accelerators like Akamai, Cloudflare or even CloudFront it isn’t so hard to reduce disk IO on the images, but the database is the one thing that kills it all. This is where S3 can be useful. It’s what I want to start with.

My setup.

So far… I have looked at the awesome guide for wannabe WordPress developers here. Since I am familiar with WordPress and have played with it a bit, I went ahead with setting up my dev environment.
I chose to go with Turnkey WordPress, even though I hate that they have all these API keys etc. to link their VM to their “Cloud”. Not to mention that, for a development environment, it lacks any repository management tools to merge and update code. Seems like it wasn’t necessary, but well…

Back to square one, I decided to check out the code on to my dev machine, which has a ready LAMP setup. Did a simple,

svn co http://core.svn.wordpress.org/trunk/

An edit of the config file, a populated test DB, and it’s all good. I could load WordPress just fine.

Since the dev machine is connected via Samba to my regular Windows do-it-all desktop, I can access the files and load them in my editor. Normally I use Notepad++ since it doesn’t crap my directory structure, but this time I am going with Eclipse. Yes, I am solid broke after attending Black Hat Asia 2015 (more on that in another post). Even if I had cash to spare I would prioritize my PyCharm license.

Forward.

It might be difficult to change the WordPress core and keep up with nightly builds. I might just end up with the plugin, but hopefully it won’t be a disaster.