Dec 27 2016
 

Twitter Heron is an alternative to Apache Storm with a compatible API, as it was developed from Apache Storm. The major difference I found is that Heron opts to get rid of threads and go with process-based workers. This is surprising, and I am not sure what advantage it would have over the threads-plus-processes model.

Heron has also done away with DRPC, which is the one thing I need to provide direct access to a Storm cluster. I haven’t seen an alternative being mentioned.

Most everything else is the same as Storm, except that the sales page for Heron is better of course 🙂 The reason to explore Heron is ease of production use and DevOps. That is kind of difficult with Storm, but Azure HDInsight might help as it now supports Apache Storm 1.x in HDInsight 3.5.

I hope to learn more from users of Storm and Heron who have been operating them in production about the pros and cons.

 

Dec 24 2016
 

I have been scouring the internet for opinions on and alternatives to what Java has to offer for database access, both ORM and non-ORM. Unlike with Python, I do not have a clear picture of what is out there for Java. Rails is easy: if you use Rails you use ActiveRecord and forget about anything else. Hibernate has historically been the much-used ORM layer for Java and the Spring framework. In any case, for building small, data-centric apps and microservices an ORM would be overkill. Unlike the Rails/Ruby ecosystem, the Java projects are much more active and well maintained.

I found an interesting thread on Reddit discussing alternatives to ORMs, even though the original ask was about ORMs. The following non-ORM options could come in handy.

JDBC: The pure database and SQL query part of the Java API. Verbose (which Java tends to be anyway) but no frills. This is the direct, standard interface to the database. I think it’s important to learn this way first anyway, plus it’s always going to work if you get it right. Where complex POJO mapping is not required, JDBC is the best option.
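For reference, a minimal sketch of the no-frills JDBC path with try-with-resources; the connection URL, credentials and the users table are made up for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical database and credentials, purely for illustration.
        String url = "jdbc:mysql://localhost:3306/appdb";
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name FROM users WHERE id = ?")) {
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```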

JDBI: An interesting wrapper around JDBC that makes it easier to use and less verbose. Surely worth a look, and a DAO layer is mostly all we need so we don’t have to write the same queries over and over again. See the sketch below.
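A rough sketch of what that DAO layer looks like with JDBI’s SQL Object API (JDBI v2 package names; the UserDao interface and users table are hypothetical):

```java
import org.skife.jdbi.v2.DBI;
import org.skife.jdbi.v2.sqlobject.Bind;
import org.skife.jdbi.v2.sqlobject.SqlQuery;
import org.skife.jdbi.v2.sqlobject.SqlUpdate;

public class JdbiExample {
    // The DAO is just an interface; JDBI generates the implementation.
    public interface UserDao {
        @SqlUpdate("INSERT INTO users (id, name) VALUES (:id, :name)")
        void insert(@Bind("id") int id, @Bind("name") String name);

        @SqlQuery("SELECT name FROM users WHERE id = :id")
        String findNameById(@Bind("id") int id);
    }

    public static void main(String[] args) {
        DBI dbi = new DBI("jdbc:mysql://localhost:3306/appdb", "appuser", "secret");
        UserDao dao = dbi.onDemand(UserDao.class);
        dao.insert(42, "Alice");
        System.out.println(dao.findNameById(42));
    }
}
```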

jOOQ: A Java 8, lambda-friendly interface to JDBC, much like JDBI but with a lot more features, including code generation from the database. There is a lot of focus on functional programming and lambdas. It has neat data binding practices which I would certainly consider for SparkJava. It is dual-licensed: open source and free when used with open source databases, commercial otherwise. So if you use MariaDB or PostgreSQL you’re fine. Overkill for small apps as some suggest, but it’s worth mentioning.
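Even without the code generator, the fluent DSL can be used against plain table and field names, roughly like this (the MariaDB URL, users table and columns are assumptions for the example):

```java
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.Record;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class JooqExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mariadb://localhost:3306/appdb", "appuser", "secret")) {
            DSLContext create = DSL.using(conn, SQLDialect.MARIADB);
            // Query built with the fluent DSL instead of a raw SQL string.
            Result<Record> result = create.select()
                    .from(table("users"))
                    .where(field("id").eq(42))
                    .fetch();
            result.forEach(r -> System.out.println(r.getValue("name")));
        }
    }
}
```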

Sql2o – I only discovered it when I was evaluating SparkJava and it is amazing. It has as low a footprint as JDBC but adds Java class mapping and is extremely terse. This could be a go-to instead of JDBC. It supports try-with-resources for added ease of use. Definitely looking like my go-to after JDBC.
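A minimal Sql2o sketch showing the query-to-POJO mapping and try-with-resources; the User class and users table are hypothetical:

```java
import java.util.List;

import org.sql2o.Connection;
import org.sql2o.Sql2o;

public class Sql2oExample {
    // Plain POJO; Sql2o maps result columns onto it by name.
    public static class User {
        public int id;
        public String name;
    }

    public static void main(String[] args) {
        Sql2o sql2o = new Sql2o("jdbc:mysql://localhost:3306/appdb", "appuser", "secret");
        try (Connection con = sql2o.open()) {
            List<User> users = con.createQuery("SELECT id, name FROM users WHERE id = :id")
                    .addParameter("id", 42)
                    .executeAndFetch(User.class);
            users.forEach(u -> System.out.println(u.id + " " + u.name));
        }
    }
}
```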

ORMLite – It’s a lightweight ORM, commonly used for Android apps but usable server side just the same. Weird licensing, so I am unlikely to use it.

Once I outgrow JDBC, which is likely as I write more POJOs to persist objects, I might consider Sql2o.

Dec 16 2016
 

I have been getting familiar with Java a lot more than I had planned to in my career. It so happens I need to develop a standard set of APIs for public consumption, and I happened upon Swagger.io, which is an API framework/guideline similar to WSDL. It allows you to auto-generate code from a contract-first document, so you could describe what your JSON looks like and Swagger will generate the code. I noticed it supports tons of JAX-RS frameworks. Now, JAX-RS is an API spec for Java (https://jax-rs-spec.java.net/) and the two work well together. There isn’t any official bridge between Swagger and JAX-RS, but I was sold. So were other folks on my team. We can generate both server and client code from the API specification. Of course we still need to write the business logic, but none of the grunt work.

So JAX-RS is what I am going with. It’s perfect for someone coming from the Python Flask/Django world. I suggest these good tutorial videos on JAX-RS with Jersey done by Koushik from Java Brains. You can find the playlist here: https://www.youtube.com/playlist?list=PLqq-6Pq4lTTZh5U8RbdXq0WaYvZBz2rbn There is also an advanced JAX-RS series which I haven’t checked out yet, but it is good to know about if I need more. Even though I have production APIs serving millions of users every day, there is so much I don’t know, and looking for answers always surprises me.
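To give an idea of why it feels familiar coming from Flask/Django, here is a minimal JAX-RS resource; the /users path and the User POJO are made up for the example, and a JSON provider such as Jackson is assumed to be registered with the runtime:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// A minimal JAX-RS resource; the returned object is serialized to JSON
// by whatever JSON provider is registered with the runtime.
@Path("/users")
public class UserResource {

    public static class User {
        public int id;
        public String name;

        public User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public User getUser(@PathParam("id") int id) {
        // Business logic would go here; this just returns a stub.
        return new User(id, "example");
    }
}
```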

I am currently using Azure API services for deploying the JAX-RS services and it’s quite easy, as they support git push.

The reason for going in depth into JAX-RS is so I can work contract-last and annotate for Swagger. But it really depends on your requirements and project.

Swagger is not covered in the videos. I will post about it once I have worked out a sample.

Edit: This is the best sample project I have found https://github.com/swagger-api/swagger-samples/tree/master/java/java-jersey-jaxrs

It has everything you need: a JAX-RS sample project with Swagger annotations. I recommend this way, JAX-RS then Swagger, if you are not going with the contract-first method.
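As a sketch of what that JAX-RS-then-Swagger direction looks like, here is the resource from earlier with Swagger annotations layered on top (the io.swagger.annotations package from swagger-core; the paths and User POJO remain hypothetical):

```java
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import io.swagger.annotations.ApiParam;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// The same JAX-RS resource with Swagger annotations added, so swagger-core
// can generate the API specification from the annotated code.
@Api(value = "users")
@Path("/users")
public class UserResource {

    public static class User {
        public int id;
        public String name;

        public User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    @ApiOperation(value = "Find a user by ID", response = User.class)
    public User getUser(
            @ApiParam(value = "ID of the user to fetch", required = true)
            @PathParam("id") int id) {
        return new User(id, "example");
    }
}
```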

Dec 27 2014
 

What is Offloading?

Offloading is, in a nutshell, transferring resource consumption from one infrastructure to another without affecting the overall functionality of the service, while retaining full control over the transactional components.

Therefore offload is never 100% unless you have a 100% static website. However, it can get close to 100% even for dynamic sites.

We know that using a CDN like Akamai (or what Akamai calls Dynamic Site Accelerator) for static files and the like can offload much of the work from the servers. This can provide a significant load reduction in the data center, adds more scalability to your infrastructure, and offers a better experience to the end user. Akamai provides a report to elaborate on the offload, but it can only show you the offload of traffic passing via Akamai, so it is really the network offload. This does translate to some disk I/O and CPU offload in your data center, but it may not be very significant.

Akamai also offers an included service for storing access logs on their own 100% SLA storage, which is extremely underrated and generally ignored. As I mentioned in my previous post about how logs can bring down your own servers, this logging service can actually help. Here is an example from my setup with the apache2 web server, already behind a CDN.

The Test

At 10 req/second via the CDN (as measured by GA), my test server was using the following resources:

CPU: 105% avg to 146% on a Xeon Quad Core class CPU.

Disk I/O: 14 MB/s overall (mostly apache2); apache2 alone is using 10 MB/s (that’s megabytes)

Memory: 3.3 GB (mostly apache2 alone)

Apache workers: 21, process count: 43

Linode Longview showing high CPU, disk and network usage

 

I disabled logging altogether for the site and kept logs only for critical events; for example, 503 errors would still be logged here. The error logs feed the fail2ban service, hence they are needed to dynamically block IPs attempting funny stuff. Akamai offers the same at the edge as well, but I am not using that. I could disable the logs because all of them are already available on Akamai in Apache common log format, with additional fields like referrer, cookies and full headers (if you need them), and it has zero impact on the service since it is all offloaded to Akamai.

Folks, data transfer is cheap but CPU and memory are not. When you buy a service like Akamai you cannot rely on it alone to solve all your problems. If you are not being charged extra for that CPU usage, you might as well make the most of it and maximize the offload. Here is what I get after disabling the server/load balancer logs.

Now at 45 req/sec (more than 4 times the original):

CPU: 10% (average)

Memory: 900MB average (again mostly Apache2)

Apache workers: 7-9, process count: 21

Disk I/O: 7 KB/s for apache2 (1.5 MB/s average overall)

OK, the disk I/O needs more explanation. Other processes like the database server are also on the same host, and they all share the same resource-constrained mechanical disk. When Apache was using 10,000 KB/s it was causing contention, so other processes needed longer to complete their transactions. With the web server’s disk I/O out of the picture, that bottleneck is significantly reduced. The same indirectly helps the CPU.

See for yourself.


(Screenshot of the Longview graphs after the change, taken 2014-12-27 08:50.)

Note that by the time I took the screenshot, the traffic had moved up to 75 req/sec. Normally this would require aggressive caching or adding nodes; this time I had to do neither.

The solution is there, but it is rarely actually used. I am hoping that will change once more sysadmins catch on to this. And to think of the time folks spend on database caching, memcache and the like.

 

Tools and Services used:

Linode
Linode LongView
Apache2 with mod_status, CentOS
Akamai DSA with NetStorage logging

 

Dec 17 2014
 

Classic overflow problem. The bottom line is that it’s all about I/O: logs hit both memory and disk.

I should learn to take my own advice. I have always minimized disk hits on my servers rather than starting with database caching code.

Recently one of my project servers received higher than normal traffic and it killed the mysqld process. There was no way to keep the DB running, as it kept getting terminated. This is bad… really bad. mysqld itself was fine: queries were cached, and there weren’t too many connections either. The problem was that it was still using the most memory of any process. With memory low and swap out of space, the OS decided to kill the biggest memory consumer. I still need to dig deeper into how *nix decides this. In any case it should not have happened: there was more than enough memory, the traffic wasn’t nearly high enough, and CPU usage was low.

The main culprit for memory hogging was the rsyslogd process; the second was the php-fpm children. Normally I recommend using a ramdisk-type location for logs, or simply logging to a remote collector with no local logging at all. In this instance several logs were split up and some were in verbose mode. So despite being able to support 300 requests per second, the site was barely keeping up with 10. The problem with logging is the disk I/O: it is still a write, and no matter how much we optimize the database, if the I/O is spent elsewhere there is still a problem. It was more pronounced in this case because of software RAID 5, which is infamous for extra I/O overhead and the least return on the investment.

 

The first thing I did to get the MySQL server running and staying up was to stop the rsyslog daemon. Then I cleaned up my rsyslog configuration. I now log everything via UDP to Splunk Storm. On AWS I used to run a collector written in Twisted Python; it was a simple script and it still works well. To be posted to GitHub by the end of this post.

 

PHP-FPM children:

I use dynamic allocation, but the max children setting was 100 in one of the FPM configs. I lowered it to 20. There is really no reason to have 100 child processes, especially since I use FPM virtual host configs to split different sites across different users, with each site maintaining its own environment. Each FPM vhost or virtual user had between 20 and 50 max children.

The overall memory improvement was about 65%, and I was able to serve 100 req/second again without problems and without forking out more money. This could explain why my AWS deployments used fewer resources compared to others I have analysed, which were spending 5-10 times more on instance usage while serving less than a tenth of the traffic.

 

Good luck with your Adventures and look before you upgrade.

 

 

Update: Another change I forgot to mention is that I also disabled the Apache/Nginx access logs, while keeping error logging at “critical” only. This dropped an additional 90% of I/O usage and brought memory usage down to 25%.

If you use CDN services or remote logging, you’d be better off either way. Services like Akamai offer to log at the “edge”, which is a far more useful solution. I am going to write another post on this.

Apr 03 2013
 

For the past few years websites have really grown, and outgrown themselves. If expired domains are any indication, there are more websites born than babies on this planet every day. Or perhaps more websites dying than people. OK, I made those stats up, but hey, 79% of all stats are made up, right?

Well, what I have seen is that it is not websites that are becoming popular over the past few years; the trend has been towards services. I am not just thinking SaaS and the like; those are covered by “whitepapers” already. I am thinking Nexus Mods, a simple service to download game mods. I do run a fairly new games news portal and forum (about 3 months old… remember babies) over at http://gamingio.com My interest in games goes far back… really far, like 1985 far. I work on web services every day at work, but I want to see some websites, so I make them. At the end of the day what really matters is the service, the value add. Websites have done that and it’s getting old; stuff like news, blogs etc. has been done to death. There is still ample room for new websites with new ideas, but the real winner is the web service.

Steam DB is amazing (ref: http://gamingio.com/2013/03/mortal-kombat-9-and-many-other-games-steam-achievements-spotted/); see the schema someone has drawn up. It has so much information to play around with. Steam has a usable web service.

Web services are becoming the norm for many hosted apps as well. For example, when I use Zendesk or AWS I barely access the site; most of it is done via web services. There is no single standard for these things, web services still have a lot of ground to cover, and everyone has their own idea of how to do things.

 

A few months back I started working on my own web service. Nothing big, just something I wanted to use personally, inspired by Nexus Mods and Steam DB. The idea is to put together a database of games and mods and make it publicly accessible, with data submitted, merged and so on. Anyone can query it. I could enable file downloads as well, and in my tests so far it all works OK. I just don’t have any background in Windows or desktop software development to make a GUI anyway. Perhaps others can find it useful.

I don’t have any game data yet but I will be making the API publicly available. I am hoping to get volunteers to help fill in the gaps.

In the next post I will put up the URL and some sample queries. I don’t yet have a way of letting everyone send game data for the DB, but I can work on that.

 

Another thing I realized is that the GamingIO forums do have an API from IPB (the forum software we use), but I haven’t looked into whether it has a downloads API. Either one should work for my purposes, though the forum one would be simpler.

Jul 03 2012
 

I will briefly describe the tools and the idea behind using Amazon AWS for web deployments to its full extent without overdoing it. This is the first of at least three parts I am going to write down for reference and critique. I have not read any books on the subject; having worked as both a server engineer in data centers and a 3-tier coder, I had accumulated a fair share of failures and ideas about what could have been done right. It was plain hindsight behind doing things the way I did.
I used the AWS documentation for most of my work, as I do now with OpenShift.

Parts
1. Architecture (Infrastructure requirements)
2. Tools – Platforms, tools, scripts and programs
3. Building, Testing and Automating

This is the simplest form of web deployment that is scalable, fault tolerant, very redundant and stable.
Below is an architecture diagram of the basic AWS IaaS building blocks we need. Here I use Amazon RDS for MySQL, as it takes care of a lot of the headaches involved in maintaining a MySQL server with backups and failover.

Amazon AWS web deployment, highly scalable and redundant

I currently have at least 3 such deployments that have been in production for between one and three years, maintaining an uptime of 99.9x% despite occasional DoS attacks and several database migrations.

For PHP5 backed applications:

App Servers – These EC2 instances can be plain web servers with Nginx + PHP-FPM and your source code. All static files and all user data that can change (uploaded files, avatars, pictures, comments etc.) should not be on local drives here; they should be in S3 or other locations that can be shared by any EC2 instance.
Note: Amazon Sales will tell you to use huge EBS drives and copy files over, etc. I do not understand their reasoning for this. All the app servers I have are instance-store machine images which contain source code/executables only. While boot time is slightly longer (as claimed by AWS sales), I see no significant difference. I have also found EBS-backed instances to be more prone to failures, reboots and corruption, and they cost more. A badly configured EBS-backed instance can leave behind unused EBS volumes like bird shit.

WARNING: NEVER USE NFS TO SHARE FILES BETWEEN AUTOSCALED SERVERS. IN FACT NEVER USE NFS… period.

RDS: Always use Multi-AZ. Try to keep to one DB only for best performance; now you know why programmers used table prefixes. Do not use anything less than a small instance in production.
WARNING FOR MYSQL RDS USERS: NEVER USE MyISAM TABLES. IN FACT, ONLY USE INNODB unless you have a damn good reason not to.
Activate snapshots with at least 5 days of retention. They are very useful when someone accidentally deletes your database; Multi-AZ will not save you, it will happily delete your failover copy too. Yes, it happens in real life! If you want to build your own homegrown backup server, go ahead, but leave snapshots on. They are the fastest recovery mechanism and are virtually hands-free.

I think the rest is self-explanatory for now. If there are questions I can update the post.

In Part 2, I will cover tools like memcache and syslog, configuring an AMI with startup scripts, and automating things.