Dec 27, 2014
 

What is Offloading?

In short, offloading means transferring resource consumption from one infrastructure to another without affecting the overall functionality of the service, while retaining full control over the transactional components.

Offload is therefore never 100% unless your website is 100% static. It can, however, get close to 100% even for dynamic sites.

We know that putting a CDN like Akamai (or what Akamai calls Dynamic Site Accelerator) in front of a website's static files can offload much of the work from the servers. This can significantly reduce load in the data center, add scalability to your infrastructure, and give the end user a better experience. Akamai offers a report on the offload, but it only covers traffic passing through Akamai, so what it shows is network offload. That does translate into some disk I/O and CPU offload in your data center, but it may not be very significant.

Akamai includes a service for storing the access logs on its own 100% SLA storage, which is extremely underrated and generally ignored. As I mentioned in my previous post about how logs can bring down your own servers, this logging service can actually help. Here is an example from my setup: an apache2 web server that is already behind a CDN.

The Test

At 10 req/second via the CDN, as measured by GA, my test server uses the following resources:

CPU: 105% average, up to 146%, on a quad-core Xeon-class CPU.

Disk I/O: 14 MB/s overall, of which Apache2 alone uses 10 MB/s (that's megabytes, not megabits).

Memory: 3.3 GB (mostly Apache2).

Apache workers: 21, process count: 43
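The post lists mod_status among the tools but does not say how the worker counts were collected. As a minimal sketch, assuming the numbers come from the machine-readable status page (the `?auto` variant of `server-status`), the key/value lines could be parsed like this. The function name and the sample text are hypothetical, and the sample values are made up for illustration rather than taken from the test server.

```python
def parse_server_status(text):
    """Parse the 'Key: value' lines of mod_status machine-readable output."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip anything that is not a key/value line
            stats[key.strip()] = value.strip()
    return stats


# Made-up sample in the shape of a "?auto" response (illustrative values only).
SAMPLE = """\
Total Accesses: 36000
Uptime: 3600
ReqPerSec: 10
BusyWorkers: 21
IdleWorkers: 9
"""

stats = parse_server_status(SAMPLE)
busy = int(stats["BusyWorkers"])
idle = int(stats["IdleWorkers"])
req_per_sec = float(stats["ReqPerSec"])
print(f"{req_per_sec:.0f} req/s, {busy} busy workers, {idle} idle workers")
```

Sampling this at intervals gives a cheap way to compare worker usage before and after a change without touching the access logs.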

Screenshot: Linode Longview showing high CPU, disk, and network usage

 

I disabled access logging for the site altogether and kept logs only for critical events. For example, 503 errors would still be logged here. The error logs feed the fail2ban service, so they are needed to dynamically block IPs attempting funny stuff. Akamai does offer the same at the edge as well, but I am not using that. I disabled the access logs because they are all already available on Akamai in Apache common log format with additional fields such as referrer, cookies, and full headers (if you need them), and collecting them there has zero impact on the service because it is all offloaded to Akamai.
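Since the access logs now live on Akamai, any analysis that used to run against the local Apache log can run against a downloaded copy instead. Here is a rough sketch, assuming the delivered lines start with the standard Common Log Format fields (host, identity, user, timestamp, request, status, size) and that any extra fields come after them and can be ignored. The function name and the sample lines are hypothetical; this is not Akamai's API, just plain text parsing.

```python
import re
from collections import Counter

# Leading fields of the Apache Common Log Format; trailing extras are ignored.
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def errors_by_client(lines, min_status=500):
    """Count responses with status >= min_status per client address."""
    counts = Counter()
    for line in lines:
        match = CLF.match(line)
        if match and int(match.group("status")) >= min_status:
            counts[match.group("host")] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [27/Dec/2014:08:50:29 +0000] "GET /index.php HTTP/1.1" 503 312',
    '203.0.113.7 - - [27/Dec/2014:08:50:30 +0000] "GET /index.php HTTP/1.1" 503 312',
    '198.51.100.4 - - [27/Dec/2014:08:50:31 +0000] "GET /style.css HTTP/1.1" 200 5120',
]

print(errors_by_client(SAMPLE_LINES).most_common())
```

Lines that do not match the pattern are skipped rather than raising, which suits a quick report but would hide format changes in a real pipeline.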

Folks, data transfer is cheap, but CPU and memory are not. When you get a service like Akamai, you cannot rely on it alone to solve all your problems. If you are not being charged extra for the CPU usage, you might as well make the most of it and maximize offload. Here is what I get after disabling the server and load balancer logs.

Now at 45 req/sec (more than 4 times the original):

CPU: 10% (average)

Memory: 900 MB average (again, mostly Apache2)

Apache workers: 7 to 9, process count: 21

Disk I/O: 7 KB/s for Apache2 (1.5 MB/s average overall)
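Because the two measurements were taken at different traffic levels, the raw numbers understate the change. A rough way to compare them is to divide each metric by the request rate. The sketch below does that with the figures quoted above, assuming 3.3 GB is about 3300 MB, 10 MB/s is 10,000 KB/s, and the 105% figure is the CPU average. The dictionary keys and function names are made up for this example.

```python
# Figures quoted in the post; unit conversions are assumptions (decimal units).
before = {"req_per_sec": 10, "cpu_pct": 105, "memory_mb": 3300, "apache_disk_kb_s": 10_000}
after = {"req_per_sec": 45, "cpu_pct": 10, "memory_mb": 900, "apache_disk_kb_s": 7}


def per_request_rate(sample, metric):
    """Resource usage per unit of request rate (metric divided by req/s)."""
    return sample[metric] / sample["req_per_sec"]


def reduction_factor(metric):
    """How many times cheaper the metric is per req/s after the change."""
    return per_request_rate(before, metric) / per_request_rate(after, metric)


for metric in ("cpu_pct", "memory_mb", "apache_disk_kb_s"):
    print(f"{metric}: about {reduction_factor(metric):.1f}x less per req/s served")
```

This is only a back-of-the-envelope normalization: memory in particular does not scale linearly with request rate, so the factors are indicative rather than exact.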

OK, the disk I/O needs more explanation. Other processes, such as the database server, are on the same host, and they all share the same resource-constrained mechanical disk. When Apache was using 10,000 KB/s, it caused contention for the disk, so other processes took longer to complete their transactions. With the web server's disk I/O out of the picture, the bottleneck is significantly reduced. The same indirectly affects CPU, since processes spend less time waiting on the disk.
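To put the contention in numbers, here is a small sketch that works out Apache's share of total disk throughput from the figures quoted in this post, assuming 1 MB/s equals 1000 KB/s. The function name is made up for the example.

```python
def apache_share(apache_kb_s, total_kb_s):
    """Apache's share of total disk throughput, as a percentage."""
    return 100 * apache_kb_s / total_kb_s


# Before: 10 MB/s of 14 MB/s overall. After: 7 KB/s of 1.5 MB/s overall.
before_share = apache_share(10_000, 14_000)
after_share = apache_share(7, 1_500)

print(f"before: {before_share:.1f}% of disk throughput, after: {after_share:.2f}%")
```

Going from roughly seven tenths of the disk's traffic to well under one percent is what frees the disk for the database and everything else on the host.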

See for yourself.


Screenshot: 2014-12-27_08h50_29

Note that by the time I took the screenshot, traffic had risen to 75 req/sec. Normally this would require aggressive caching or adding nodes. This time I had to do neither.

The solution is there, but most people never actually use it. I hope that will change once more sysadmins get to this. And to think of the time folks spend on database caching, memcache, and the like.

 

Tools and Services used:

Linode
Linode Longview
CentOS, Apache2 with mod_status
Akamai DSA with NetStorage logging