Dec 27, 2016
 

Twitter Heron is an alternative to Apache Storm with a compatible API, since it was built at Twitter as a drop-in replacement for Storm. The major difference I found is that Heron gets rid of threads and goes with process-based workers. This is surprising, and I am not sure what advantage it would have over the threads-plus-processes model.

Heron has also done away with DRPC, which is the one thing I need in order to provide direct access to a Storm cluster. I haven't seen an alternative mentioned.

Almost everything else is the same as Storm, except the sales page for Heron is better, of course 🙂 The reason to explore Heron is ease of production use and DevOps. That is kind of difficult with Storm, but Azure HDInsight might help, as it now supports Apache Storm 1.x in HDInsight 3.5.

I hope to learn more from users who have been operating Storm and Heron in production about the pros and cons.

 

Dec 16, 2016
 

I have been getting familiar with Java a lot more than I had planned to in my career. It so happens I need to develop a standard set of APIs for public consumption, and I happened upon Swagger.io, an API framework/guideline similar to WSDL. It allows you to auto-generate code from a contract-first document: you write down what your JSON looks like and Swagger generates the code. I noticed it supports tons of JAX-RS frameworks. JAX-RS is the Java API specification for REST services (https://jax-rs-spec.java.net/), and the two work well together. There isn't an official bridge between Swagger and JAX-RS, but I was sold, and so were other folks on my team. We can generate both server and client code from the API specification. Of course we still need to write the business logic, but none of the grunt work.

So JAX-RS is what I am going with. It's perfect for someone coming from the Python Flask/Django world. I suggest the good tutorial videos on JAX-RS with Jersey by Koushik of Java Brains; you can find the playlist here: https://www.youtube.com/playlist?list=PLqq-6Pq4lTTZh5U8RbdXq0WaYvZBz2rbn . There is also an advanced JAX-RS series which I haven't checked out yet, but it's good to know it exists if I need more. Even though I have production APIs serving millions of users every day, there is so much I don't know, and looking for answers always surprises me.

I am currently using Azure API services for deploying the JAX-RS services, and it's quite easy as they support git push deployments.

The reason for getting in depth into JAX-RS is so I can go contract-last and annotate for Swagger. But it really depends on your requirements and project.

Swagger is not covered in the videos. I will post about it once I have worked out a sample.

Edit: This is the best sample project I have found https://github.com/swagger-api/swagger-samples/tree/master/java/java-jersey-jaxrs

It has everything you need: a JAX-RS sample project with Swagger annotations. I recommend this route, JAX-RS -> Swagger, if you are not going the contract-first method.

Jun 08, 2016
 

I finally signed up for the Google Cloud free trial. I wanted to set up the GamingIO.com game servers in a way I could manage better. I found AWS EC2 was not up to the task: to get even usable performance for a single game instance I would need to shell out at least $25 a day. I dropped it there and then, save for events I used to run for charitable causes.

Back to GCloud. Here I ran the same test on a small instance and was able to launch and play a decent CS:GO match with 10 players on it, including bots. I would go as far as to say that even a micro instance on GCloud can get you that performance. Just the way it should be. I used Compute Engine with a regular disk (not SSD) for setting up the Steam dedicated servers.

Compared to hosted game servers that charge $5 a month for 10 slots/players/bots, I think folks could do better with this method. You could go up to 20 players.

To be fair to AWS, it has tons of tenants sharing that vCPU. GCloud may be underselling to make the performance look good for now. Only time will tell how far we can go.

I did sign up for Azure Cloud but never got around to testing it. Maybe another time.

Apr 28, 2016
 

After years of not having to deal with AWS except for my pet projects, I decided to visit the AWS Summit in Singapore. It's right here, I thought, so it seemed a great time to join and learn some Lambda as well as Lumberyard/GameLift. Suffice to say, I came away disappointed in the future of Amazon Web Services in general.

For backstory on where I stand now, this post captures it completely: http://openmymind.net/Why-I-Dislike-ec2/ . tl;dr: AWS is VERY expensive, it gets a lot more expensive every year, and you don't even see it.

I do consider services like Lambda a great way to move forward. Unfortunately, AWS is getting on a high horse right now. Great ideas, but…

AWS is over, in my opinion. The hype, the love, the innovation… it's old. It's just that in Asia things move as slowly as a wooden raft. I see the marketing hype picking up, and a crowd over 2,000 strong showed up to undermanned exhibitor booths showing the same eight-year-old tech. Underwhelming is an understatement.

So where does that leave AWS right now?

Well, AWS has taught everyone in the computer and software architecture world a lot of things, and it's still a great starting point, like signing up with that $5 web hosting provider to put a PHP page up. It's getting less complex and massively multi-tenanted. It's a great business challenge, solved. There will always be space for PB-scale data and number-crunching challenges to solve, and that will always be AWS territory.

What CTOs should look into is how to take this shared hosting with its awesome (although locked-in) APIs and abstract it away: implement the same thing on private clouds and dedicated hardware built on open standards. OpenStack and CloudStack had the right idea years ago. I want a dedicated managed or unmanaged solution with bare-metal performance and the same "designed-for-failure" architecture as AWS.

Looking back at my work on Razer Synapse, and for years before that, I have always avoided vendor lock-in as far as possible by abstracting layers of the code and making them stateless and independent of the underlying platform. For example, S3 access was written behind a wrapper class that could do FileStore or ObjectStore via Ceph, third parties or S3 itself: provider and IaaS independent. There was no Elastic Beanstalk, just a combination of Python Fabric and shell scripting.
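
To make that concrete, here is a minimal sketch of the kind of wrapper I mean. This is purely illustrative and in PHP (the original code was not), with made-up class, bucket and credential names: application code only ever talks to an ObjectStore interface, so swapping S3 for Ceph or a local filesystem is just another implementation.

require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Illustrative only: the application never calls a provider SDK directly.
interface ObjectStore {
    public function put($key, $localPath);
    public function get($key, $localPath);
}

class S3Store implements ObjectStore {
    private $client;
    private $bucket;

    public function __construct(S3Client $client, $bucket) {
        $this->client = $client;
        $this->bucket = $bucket;
    }

    public function put($key, $localPath) {
        $this->client->putObject(array(
            'Bucket'     => $this->bucket,
            'Key'        => $key,
            'SourceFile' => $localPath,
        ));
    }

    public function get($key, $localPath) {
        $this->client->getObject(array(
            'Bucket' => $this->bucket,
            'Key'    => $key,
            'SaveAs' => $localPath,
        ));
    }
}

// Usage: the endpoint decides the provider, the rest of the code does not care.
$store = new S3Store(S3Client::factory(array(
    'key'    => 'MY_KEY',    // placeholder
    'secret' => 'MY_SECRET', // placeholder
    // 'base_url' => 'https://objects.example.com', // e.g. a Ceph RGW endpoint instead of AWS
)), 'my-bucket');
$store->put('backups/db.sql.gz', '/tmp/db.sql.gz');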

I had used DynamoDB in its early days in 2011-12, before it was GA. It was great: limited, but great. I found I was just as happy, if not happier, with CouchDB and Redis on a multihomed setup with failover.

I still love AWS, and this is by no means a rant, nor is it simply because I work for a competitor. Personally I believe we are complementary. If today's approach is anything to go by, we have our work cut out for us, and AWS is going to be left behind despite being the pioneer.

I am going to start writing again: mostly rants against fat tech, but also alternatives to AWS/IaaS/PaaS and the like.

 

P.S. I couldn't join the developer breakout for the Lambda stuff I wanted to give a whirl. I decided not to because the event staff were being extremely Kafkaesque and difficult with their rules and their "maybes": you need to queue and wait, maybe you can join, maybe you can't, and if you go out you maybe can't come back in… But the videos get uploaded later anyway. The hell! Why didn't you guys just say so before I decided to take a day off and come down? So I walked out extremely pissed.

 

 

 

Sep 25, 2015
 

Some time ago I wrote about my backup woes. I have used Backblaze and CrashPlan simultaneously for backups of my PC, and CrashPlan for my Mac/Linux machines. While I could have easily rolled my own backup with Amazon S3, Google Drive, OneDrive et al., a cost analysis showed Backblaze was the MOST cost-effective.

Today I got news from Backblaze (and they don't pay me; it's the other way around) that they are offering storage as a service similar to S3. As in, it's not a drive product in itself; it's raw object storage like S3.

Comparison:

Here is a comparison on the Backblaze site: https://www.backblaze.com/b2/cloud-storage-providers.html

This is incredible and it is huge. What does it mean? Is Backblaze a real competitor to S3?

For now, Backblaze is known to have terrible connectivity. It's alright for slow backups that take a year for 4TB, but S3 shines due to its regional support, and S3 is also proven. There is no doubt there will be downward pressure on pricing as more "proper" competition comes into play. Right now the B2 service from Backblaze isn't directly comparable to drive services, since those still need to be built on top of B2. As an S3 alternative, though, it's a great product. If you are looking to serve files via a CDN then Backblaze might just do the trick.

 

How to migrate?

There is no migration tool like the one we had for DreamHost DreamObjects (which suffers a lot of downtime, just like their hosting) to make the job easier. However, you might be pleased to know that you can probably translate your S3 code to B2. It seemed to me that they might be using Ceph, in which case you could use your code as-is, but the API appears to use different methods.
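
To give an idea of how different it is, here is an untested sketch, pieced together from the published B2 docs, of what a minimal upload looks like against their native API. The account ID, application key, bucket ID and file name are all placeholders.

// Untested sketch of the native B2 upload flow, based on the published docs.
// All identifiers below are placeholders.
$accountId = 'ACCOUNT_ID';
$appKey    = 'APPLICATION_KEY';
$bucketId  = 'BUCKET_ID';
$file      = './test.txt';

// 1. Authorize the account to get an API URL and an auth token.
$ch = curl_init('https://api.backblazeb2.com/b2api/v1/b2_authorize_account');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Authorization: Basic ' . base64_encode($accountId . ':' . $appKey),
));
$auth = json_decode(curl_exec($ch), true);
curl_close($ch);

// 2. Ask for an upload URL for the bucket.
$ch = curl_init($auth['apiUrl'] . '/b2api/v1/b2_get_upload_url');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode(array('bucketId' => $bucketId)));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: ' . $auth['authorizationToken']));
$upload = json_decode(curl_exec($ch), true);
curl_close($ch);

// 3. POST the file bytes to the upload URL with the required headers.
$body = file_get_contents($file);
$ch = curl_init($upload['uploadUrl']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Authorization: ' . $upload['authorizationToken'],
    'X-Bz-File-Name: ' . urlencode(basename($file)),
    'Content-Type: b2/x-auto',
    'X-Bz-Content-Sha1: ' . sha1($body),
));
$result = json_decode(curl_exec($ch), true);
curl_close($ch);

Contrast that with the single putObject call the S3 SDK gives you; a thin wrapper would be needed to hide the difference.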

I am yet to test it, but it's worth putting up a test project if you have time. I have just been burned too many times by these "me too S3" services, so I don't want to waste time on it yet. It's wait-and-watch for now.

Possible pitfalls:

On beta.3ezy.net I used ACLs extensively to control access to files for individual users. The fine-grained, recursive permission code was not easy to write, but knowing S3 it worked quite well. That will not be possible on B2 yet.

If I do get working test code with B2, I will post back with my findings.

Apr 09, 2015
 

Ever since I found Backblaze (and I don't know why I did not find it earlier) I have been quite happy with money well spent. I currently have 4 USB drives apart from 2.5TB of internal space. That totals a lot of space, and I only use about 4TB of it. This is my home setup, with tons of VMs and development stuff lying around. I recently lost a fairly new 2TB Western Digital drive for no good reason. It just died… physically, taking 1.5TB of VMs and snapshots with it. A terrible day, as I spent 2-3 hours opening the drive up and confirming it was completely dead. I should have seen the signs…

I decided online backup was the way to go. While slow, at least it would keep my data recoverable without adding more drives. Amazon S3 was my first choice, but with terabytes of data the cost becomes VERY prohibitive. I don't mind paying, but for backup to be economic it has to cost about the same as buying drives, or less; I could just as easily build a RAID 1 out of lots of cheap drives in a NAS for backup.

A few quick Google searches revealed Backblaze. A lot of sysadmins swear by it. I took a look at what these guys do, and at this post, and knew they try very hard to make data backup affordable and reliable.

Their plan amazed me: US$50 for one year of unlimited backup per system, including attached drives. There is also the ability to ship drives of your data back to you; while I have a 1Gbps connection, it would still be slow to pull a few terabytes from across the world, and not a healthy use of time.

I downloaded the trial. I had some issues getting it running initially because the interface isn't very intuitive, but once you know what's where, it pretty much runs on its own as it starts its first backup. So off I went, happily, to sign up and back up everything.

Unfortunately the upload speed was painfully slow… somewhere in the region of 200-500 KB/s at best. It would take a whole 3 years, if not more, to transfer 5TB of data. I emailed support to ask if there was an option for multiple parallel uploads, since their application uses only ONE upload thread. They replied that they would "soon" be releasing multithreaded upload/download. I waited, and 3 weeks later I was still left with over 90% of the volume of my files, although Backblaze was smart enough to keep the larger files for last. The number of at-risk files was a few thousand. I usually keep my PC on at all times, though I don't mind restarting now and then due to forced Windows 8 updates. This time it has been on for almost 21 days non-stop since installing Backblaze. Something is better than nothing, I say.

 

I was hypothesizing (read: daydreaming) about what could change in my life over the next 3 years as the first backup finally came to a close… what would I be doing on the day I discovered it had actually completed? Perhaps I would celebrate, and then find out a few more files had been added to the never-ending list… or maybe Joe from support was telling the truth.

I hate their site for finding any new information, so I googled for the answer once again, hoping someone had made a workaround… I could not believe the search result… it had the word "multithreaded". In my entire programming life, MT has never made me as happy as it did now.

So, the GOOD news! The latest version of Backblaze enables multithreaded uploads. Despite the physical distance between Backblaze and my local desktop, and the conspiring bast**d of an RTT, I could now upload in parallel and make the most of my 500Mbps upload speed. Well, anything better than the 2Mbps I was getting normally 🙂

Excited, I could not wait and installed the latest version. Sure enough, the option was there as per their pitch. With one thread my transfer was shown as 2.35 Mbps, which I could easily confirm with other tools and my UBNT router.

With the thread count set to 4, backups were flying. File names whizzed by in the Backblaze control panel. Still small files, though, as the program had found tiny, useless fragments of my entire Raspberry Pi mirror to back up again. Who cares anyway. The only way I could tell whether the speed was really being maximized was when large files were transferred, and I was already seeing a 10x improvement.

I am currently able to get a consistent 25-30 Mbit/s. The next step is to contact my ISP, MyRepublic, and get them to do a better job of this. A lower RTT could make a world of difference.

Apr 06, 2015
 

I am bored, so I have decided to code for WordPress. Instead of using the regular code base, I am going to write a fork that is AWS-specific, or rather cloud-specific, targeting APIs like the Ceph object storage API if possible. You could theoretically reuse the AWS S3 code base on other platforms like DreamObjects. There are a few WordPress plugins around that copy your images and files to S3 and do basic CDN, but they are inadequate and buggy. For example, on www.GamingIO.com I am using a plugin for pushing to S3, but files are still stored locally, which defeats the purpose. Not to mention that, due to a bug, the image resize functionality in WordPress seems to create several duplicate copies of each image. I had to delete all the images to make space on the site and recovered 50% of it, picking up a few broken links for all the trouble. But it was a free plugin… what did I expect?

When I was digging through WordPress I noticed how heavily it uses the database. Every post revision goes to the DB. That is fine for small sites not looking at more than 200 posts in their lifetime, not so much for news publishing, where we had 3 people doing 6-9 posts a day with several revisions each. That crapped the database. With application accelerators like Akamai, Cloudflare or even CloudFront it isn't hard to reduce disk I/O for images, but the database is the one thing that kills it all. This is where S3 can be useful, and it's what I want to start with.
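
The rough shape I have in mind for the media side is a hook that pushes freshly generated attachments straight to S3. A minimal sketch, assuming the AWS SDK for PHP is pulled in via Composer; the bucket name and credentials are placeholders:

// Sketch: copy a freshly uploaded attachment to S3 right after WordPress
// generates its metadata. Bucket and credentials are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Aws\S3\S3Client;

add_filter('wp_generate_attachment_metadata', function ($metadata, $attachment_id) {
    $s3 = S3Client::factory(array(
        'key'    => 'MY_KEY',    // placeholder
        'secret' => 'MY_SECRET', // placeholder
    ));

    $uploads = wp_upload_dir();
    $file    = get_attached_file($attachment_id);                       // full local path
    $key     = ltrim(str_replace($uploads['basedir'], '', $file), '/'); // key relative to uploads dir

    $s3->putObject(array(
        'Bucket'     => 'my-media-bucket', // placeholder
        'Key'        => $key,
        'SourceFile' => $file,
        'ACL'        => 'public-read',
    ));

    return $metadata;
}, 10, 2);

The real version would also delete the local copy and rewrite the attachment URLs, which is exactly where the existing plugins fall short.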

My setup.

So far… I have looked at the awesome guide for wannabe WordPress developers here. Since I am familiar with WordPress and have played with it a bit, I went ahead with setting up my dev environment.
I chose to go with TurnKey WordPress, even though I hate that they have all these API keys to link their VM to their "Cloud". Not to mention that, for a development environment, it lacks any repository management tools for merging and updating code. It turned out not to be necessary, but well…

Back to square one, I decided to check out the code onto my dev machine, which already has a LAMP setup. I did a simple

svn co http://core.svn.wordpress.org/trunk/

An edit of the config file, populating a test DB, and it's all good: I could load WordPress just fine.
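
For the record, the config edit is nothing more than pointing wp-config.php (copied from wp-config-sample.php) at the test database. The values here are placeholders for my local LAMP box:

// wp-config.php -- the only changes needed for the local checkout.
// All values are placeholders for a local LAMP setup.
define('DB_NAME',     'wp_trunk');   // empty test database created beforehand
define('DB_USER',     'wp_dev');
define('DB_PASSWORD', 'change-me');
define('DB_HOST',     'localhost');
define('DB_CHARSET',  'utf8');
$table_prefix = 'wp_';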

Since the dev machine is connected via Samba to my regular do-it-all Windows desktop, I can access the files and load them in my editor. Normally I use Notepad++, since it doesn't mess with my directory structure, but this time I am going with Eclipse. Yes, I am solidly broke after attending Black Hat Asia 2015 (more on that in another post), and even if I had cash to spare I would prioritize my PyCharm license.

Forward.

It might be difficult to change the WordPress core and keep up with nightly builds. I might just end up writing a plugin instead, but hopefully it won't be a disaster.

 

 

 

 

Dec 27, 2014
 

What is Offloading?

Offloading is, in gist, transferring resource consumption from one infrastructure to another without affecting the overall functionality of the service, while retaining full control over the transactional components.

Offload is therefore never 100% unless you have a 100% static website, but it can come close to 100% even for dynamic sites.

We know that using a CDN like Akamai (or what Akamai calls Dynamic Site Accelerator) for static files and the like can offload much of the work from your servers. This provides a significant load reduction in the data center, adds scalability to your infrastructure and offers a better experience to the end user. Akamai offers a report to elaborate on the offload, but it can only show you the offload of traffic passing via Akamai, i.e. the network offload. That does translate into some disk I/O and CPU offload in your data center, but it may not be very significant.

Akamai also offers an included service for delivering access logs to their own 100%-SLA storage, which is extremely underrated and generally ignored. As I mentioned in my previous post about how logs can bring down your own servers, this logging service can actually help. Here is an example from my setup with an apache2 web server that is already behind the CDN.

The Test

At 10 req/sec via the CDN (as measured by GA), my test server uses the following resources:

CPU: 105% average, peaking at 146%, on a quad-core Xeon-class CPU

Disk I/O: 14 MB/s overall; apache2 alone is using 10 MB/s (that's megabytes)

Memory: 3.3 GB (mostly apache2 alone)

Apache workers: 21, process count: 43

Linode Longview showing high CPU, disk and network usage

 

I disabled access logging altogether for the site and kept logs only for critical events; for example, 503 errors still get logged. The error logs feed the fail2ban service, so they are needed to dynamically block IPs attempting funny stuff (Akamai offers the same at the edge, but I am not using that). I could disable the access logs because they are all already available on Akamai in Apache common log format, with additional fields like referrer, cookies and full headers if you need them, and logging there has zero impact on the service since it is entirely offloaded to Akamai.
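
For reference, the change on the origin boils down to something like this in the vhost. This is a sketch; server names and paths are placeholders, and it assumes no server-wide CustomLog directive elsewhere.

# Sketch of the vhost after the change -- names and paths are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example

    # No CustomLog here: access logging is off at the origin, since Akamai
    # already captures full access logs in common log format.

    # Keep only error-level events; these feed fail2ban.
    ErrorLog /var/log/apache2/example-error.log
    LogLevel error
</VirtualHost>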

Folks, data transfer is cheap, but CPU and memory are not. When you buy a service like Akamai you cannot rely on it alone to solve all your problems; if you are not being charged extra for features like log delivery, you might as well make the most of them and maximize offload. Here is what I get after disabling the server/load-balancer logs.

Now at 45 req/sec (more than 4 times the original):

CPU: 10% average

Memory: 900 MB average (again, mostly apache2)

Apache workers: 7-9, process count: 21

Disk I/O: 7 KB/s for apache2 (1.5 MB/s average overall)

OK, the disk I/O needs more explanation. Other processes, like the database server, are also on the same host, and they all share the same resource-constrained mechanical disk. When Apache was using 10,000 KB/s it caused contention, forcing other processes to take longer to complete their transactions. With the web server's disk I/O out of the picture, that bottleneck is significantly reduced, which indirectly helps CPU as well.

See for yourself.


(Screenshot: Linode Longview after disabling logs, 2014-12-27.)

Note that by the time I took the screenshot, traffic had moved up to 75 req/sec. Normally this would require aggressive caching or adding nodes; this time I had to do neither.

The solution is right there, but most people never actually use it. I am hoping that changes once more sysadmins catch on. And to think of the time folks spend on database caching, memcached and the rest.

 

Tools and Services used:

Linode
Linode LongView
apache2 with mod_status, on CentOS
Akamai DSA with NetStorage logging

 

Feb 07, 2014
 

I was trying out the new PHP SDK for AWS and relearning the methods. While it works well with AWS, I was a bit confused between the DreamObjects (DHO) documentation and the AWS docs. I wanted to use the SDK, but with DHO.

There are advantages to the new library which you can explore. Some of the tasks we used to program manually are already done for you, using the latest PHP 5.3-era tooling such as Composer and Guzzle.

After a couple of rounds of trial and error with the values, I got things working. Here are the samples, in case they help someone else.

AWS Docs: http://docs.aws.amazon.com/aws-sdk-php/guide/latest/quick-start.html

DHO Docs: http://docs.dreamobjects.net/s3-examples/php2.html

 

require('vendor/autoload.php');

use Aws\S3\S3Client;
use Aws\S3\Exception\S3Exception;
use Aws\S3\Enum\CannedAcl;

// Instantiate the S3 client with the credentials from the DreamHost panel,
// pointing base_url at DreamObjects instead of AWS.
$s3 = S3Client::factory(array(
    'key'      => 'from DH panel',
    'secret'   => 'from DH panel',
    'base_url' => 'https://objects.dreamhost.com',
));

// List all buckets owned by this account.
$blist = $s3->listBuckets();
echo "Buckets belonging to " . $blist['Owner']['ID'] . ":\n";
foreach ($blist['Buckets'] as $b) {
    echo "{$b['Name']}\t{$b['CreationDate']}\n";
}

// Upload a single file as a publicly readable object.
try {
    $result = $s3->putObject(array(
        'Bucket'     => '<mybucket>',
        'Key'        => 'data_from_file.txt',
        'SourceFile' => './test.txt',
        'ACL'        => CannedAcl::PUBLIC_READ,
    ));
} catch (S3Exception $e) {
    echo "There was an error uploading the file.\n";
}

 

Uploading an entire directory, but new objects only: it's not perfect when used with DHO; sometimes I get missing folders when trying multiple folders. Start with the client instance as above; the following code shows how to upload only the differences from the directory to the target. The difference detection does work well. I tested with 2,000 files and 4 folders, only 1 level deep.

 

try {
    $dir       = 'folder1/';
    $bucket    = '<muhbuckit>';
    $keyPrefix = 'myprefix/';

    // Sync the local directory to the bucket, uploading only new/changed objects.
    $s3->uploadDirectory($dir, $bucket, $keyPrefix, array(
        'params'      => array('ACL' => 'public-read'),
        'concurrency' => 20,
        'debug'       => true,
    ));
} catch (S3Exception $e) {
    echo "There was an error uploading the directory.\n";
}

The concurrency option works well on DHO too. If you encounter issues setting up the SDK, do post and I will try to assist. With Composer it's quite easy, but for old-timers like me it might take a while to wrap your head around just how easy it is. I'll post more samples as and when I write and use them.
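
In the meantime, here is the read-back direction as a quick sketch I have not exercised against DHO yet; bucket and key names are placeholders, and it reuses the client instance from the first sample.

try {
    // Fetch an object and save it straight to a local file.
    $result = $s3->getObject(array(
        'Bucket' => '<mybucket>',
        'Key'    => 'data_from_file.txt',
        'SaveAs' => './downloaded.txt',
    ));
    echo "Downloaded " . $result['ContentLength'] . " bytes.\n";
} catch (S3Exception $e) {
    echo "There was an error downloading the file.\n";
}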

Dec 20, 2012
 

I have had this site hosted on Red Hat OpenShift for almost a year. Considering I got the hosting free (and you can too), I was a bit apprehensive about whether I should move the site elsewhere, but I let it be. Surprisingly, not only do I get awesome performance, the uptime has been incredible. I was going to set this blog up on an AWS EC2 micro instance; however, a micro instance being what it is, it would have cost as much as a Linode VPS and the CPU is time-shared, which gets very annoying as you find the site slowing down on and off.

Red Hat OpenShift offers a free micro-instance equivalent, the difference being that you are probably on a much bigger instance, since the PaaS runs atop the AWS cloud, making this setup akin to a VPS. That makes more sense than spending on a small instance or settling for a micro. In fact, I don't recommend micro instances at all for any purpose other than testing, compiling or other such work where on-demand CPU is not necessary.

By comparison, my Linode VPS, which costs me $40 a month, does not do as well in terms of performance when I use it to host WordPress sites. I am still not clear why memory usage and swap are higher on the Linode VPS (perhaps the CPU is oversubscribed), but this OpenShift instance sits at 512 MB of RAM and is doing just great.

If you are a developer who does not want the hassle of setting up servers and services and just wants to get down to coding, I recommend you give Red Hat OpenShift a try. You will not be disappointed, especially if you build sites for your clients.

Considering the way things are changing with PaaS and cloud, and the price being what it is, I wonder why I still put up with my DreamHost account, which is barely usable and hosts thousands of users and sites on a single server. I could not even do basic PHP development and testing on it. The same goes for pretty much any web host or reseller like GoDaddy, Media Temple and so on.

 

Since I also use Google App Engine for development and learning, it is worth adding why you would choose Red Hat over App Engine. Familiarity is possibly number one. Granted, App Engine supports MySQL now, but it remains that on OpenShift you have shell access to your instance, much as you would on your own server. You can also access some basic metrics, and new services are being built on OpenShift all the time; check out the recently launched WebSockets beta here: https://openshift.redhat.com/community/blogs/newest-release-websockets-port-forwarding-more

 

My favorite Python web framework, Flask, is effing supported as well: https://openshift.redhat.com/community/get-started/flask . I cannot describe how much pain is involved in hosting Python apps on just about any distro. I think I am going to set up my Flask sites over at OpenShift. Of course Django is supported too; I have tried neither of them there yet.

 

Now that I am confident about OpenShift, here are the things I would like to learn to get to a production deployment of my Python projects.

  1. How do I add SSL certificates?
  2. How do I enable autoscaling? (I suppose this is just to do with your AWS account, and it seems you need OpenShift Enterprise for it.)
  3. How do I use my existing RDS with OpenShift (and securely)?
  4. I am sure there is a whole bunch of things I haven't thought of yet.
All of the above is in the docs somewhere.
Concern: given that Red Hat's own enterprise support is not highly regarded among devs and ops, I wonder what OpenShift "Enterprise" will actually do for us.
Truth is, I do not have experience with Red Hat enterprise support or with the OpenShift Enterprise service. The question is: should I be the one to get that first-hand experience myself, or bet my job on it? That is going to take some daring. xD