Jul 12 2012
 

This week was especially eventful as I lost my workhorse of a server (hexacore, 12GB RAM, 4.5 TB of disk) to a double disk failure. Not just a wrench but the whole monkey was thrown in. I know RAID is no backup, but I also know that when you rent dedicated servers, disks are the first commodity the providers skimp on. As such I assumed a single disk failure on RAID 5 could be handled. What are the chances that TWO and not ONE disk will fail? Apparently, much higher than what I assumed.

RAID 5 can handle a single disk failure. I had 3 disks of 1.5 TB each in the array; 2 failed, and all the data is gone. The only sad part is that some users were affected, as they lost data on the file hosting I provided for a limited number of people on the server. The reason I have no backups for everything (and I mean everything) is that I just don’t have that kind of backup capacity (600GB plus before it went down).
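
For reference, the capacity arithmetic behind that array, as a minimal sketch. `raid5_usable_gb` is a hypothetical helper name, not something from the server; it only encodes the standard RAID 5 rule that one disk's worth of space goes to parity.

```shell
# Hypothetical helper: usable space of a RAID 5 array with equal-sized disks.
# RAID 5 spends one disk's worth of capacity on parity, so usable = (n - 1) * size.
raid5_usable_gb() {
    local disks="$1" size_gb="$2"
    if [ "$disks" -lt 3 ]; then
        echo "RAID 5 needs at least 3 disks" >&2
        return 1
    fi
    echo $(( (disks - 1) * size_gb ))
}

# Three 1.5 TB disks: 4500 GB raw, 3000 GB usable.
raid5_usable_gb 3 1500
```

The array survives exactly one failed disk. With a second failure the parity can no longer reconstruct anything, which is what happened here.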

I had just recently ported a lot of Apache configs to nginx for gallery2, grepler.com etc. and had not committed the code yet. It was just running as a patch on production. The nginx configs were super hard and took a lot of trial and error to get just right. I also had to make ad-hoc code changes.

Well, this is the reason I ask my team to commit everything including server config files to VCS first before updating servers. Not really a big loss since I wasn’t running any super critical web 2.0 friending applications and neither was it raining green bills on my doorstep.

On this particular server I did not do MySQL backups as I do with other servers. Again, too much data to back up, but I think I should have selectively backed up something. Maybe next time. All the sites I had were back up in 24 hours, but from an older snapshot on an older server. I usually keep those around just in case (like this).

The optimist: woohoo, brand new server
Since I have a brand new server, I reinstalled a pristine CentOS 6.2 with RAID 5 and waited for the sync to finish.
Meanwhile in… the datacenter. I sent instructions to use my plastic and upgrade the RAM. Yippee, more RAM, now it is 24GB. I used to think 12GB was enough, then MongoDB happened. I am not going to be installing MongoDB right now, but with all that memory I decided to add back my ramdisks.
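
Watching the array state is easy to script. A rough sketch, assuming Linux software RAID (md) so that the array state shows up in /proc/mdstat; `md_degraded` is a made-up helper name, and a hardware RAID controller would need its vendor tool instead.

```shell
# Hypothetical helper: succeed if any md array is missing a member.
# In /proc/mdstat a healthy 3-disk array shows [UUU]; a failed or missing
# member shows up as "_" inside that bracket, e.g. [UU_].
md_degraded() {
    local mdstat="${1:-/proc/mdstat}"
    grep -Eq '\[[U_]*_[U_]*\]' "$mdstat"
}

# Example use from cron (assumes local mail delivery works):
#   md_degraded && echo "RAID degraded on $(hostname)" | mail -s "RAID alert" root
```

Catching the first failed disk early is the whole point: RAID 5 only loses data when a second disk goes before the first one is replaced and rebuilt.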

I used webmin, and its configs were backed up weekly, so they were safe. I restored them and my network and firewall were all back to spec. I only selected the stuff I needed back: IP blocks, gateways and firewalls are a pain in the proverbial behind, much more than they’re supposed to be. I have a lot of iptables rules since my home IP is static, thanks to MyRepublic. The static IP saved me big time on two things: firewall and ssh were instantly secure. The connection is also stable enough to scp everything (100GB of files) back to the server at 2.5MB/s to 3.2MB/s. Ever since I changed to my new ISP I have developed a bad habit of posting my log files to splunkstorm, even for my local machines. Talk about abusing the internet.
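
The ssh part of rules like these boils down to something like the sketch below. It only prints the commands rather than applying them, and both the helper name `ssh_lockdown_rules` and the address 203.0.113.10 (a documentation-only IP) are placeholders rather than the actual rules on this server.

```shell
# Hypothetical helper: print iptables commands that allow SSH from one
# static address and drop SSH from everywhere else. Review, then run as root.
ssh_lockdown_rules() {
    local home_ip="$1" port="${2:-22}"
    echo "iptables -A INPUT -p tcp -s $home_ip --dport $port -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}

ssh_lockdown_rules 203.0.113.10
```

Order matters here: iptables walks the chain top to bottom, so the ACCEPT for the static IP has to come before the catch-all DROP.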

In my /etc/fstab I added


tmpfs /tmp tmpfs size=10%,noexec,nosuid 0 0
tmpfs /mnt/ramdisk0 tmpfs size=2000m 0 0

A quick reboot, and then I started running commands. I played around with copying/deleting 2GB files in a for loop with /tmp and the ramdisk mount for like a second. Since I could not believe it was that fast, I double checked that the files were actually there; they were.
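
That kind of loop can be sketched roughly as follows. The helper name `churn_files` and its arguments are made up for illustration, and the sizes should be picked to fit inside the tmpfs limits set in fstab.

```shell
# Hypothetical helper: copy a test file from SRC_DIR to DST_DIR and delete it, ROUNDS times.
# Usage: churn_files SRC_DIR DST_DIR SIZE_MB ROUNDS
churn_files() {
    local src_dir="$1" dst_dir="$2" size_mb="$3" rounds="$4"
    local src="$src_dir/churn.src" i copy
    # Create the source file once; zeros are fine for a quick smoke test.
    dd if=/dev/zero of="$src" bs=1048576 count="$size_mb" 2>/dev/null || return 1
    for i in $(seq 1 "$rounds"); do
        copy="$dst_dir/churn.$i"
        cp "$src" "$copy" || return 1
        # Double check the copy is really there, with the right size, before deleting it.
        [ "$(wc -c < "$copy")" -eq $((size_mb * 1048576)) ] || return 1
        rm -f "$copy"
    done
    rm -f "$src"
    echo "ok: $rounds rounds of ${size_mb}MB"
}

# Example: time churn_files /tmp /mnt/ramdisk0 1000 5
```

The size check inside the loop is the scripted version of not believing the speed and looking to see whether the files were really there.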

I mounted /tmp on tmpfs as well, since I really want the mess cleaned up, and there is no need to explain the security reasons behind the noexec,nosuid options.


df -h

tells me everything is mounted fine. Not to mention I get to put stuff in RAM for super fast operations, maybe like caching, spools etc. Who knows, we’ll see.
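
The same check can be scripted instead of eyeballed. A small sketch, assuming a Linux /proc/mounts; `is_tmpfs` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a different mounts file.

```shell
# Hypothetical helper: succeed if MOUNTPOINT is currently mounted as tmpfs.
# Reads /proc/mounts by default (fields: device, mount point, fs type, options, ...).
is_tmpfs() {
    local mountpoint="$1" mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "tmpfs" { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}

# Example: for m in /tmp /mnt/ramdisk0; do is_tmpfs "$m" || echo "$m is NOT tmpfs"; done
```

This is handy after a reboot: a typo in fstab can leave /mnt/ramdisk0 as a plain directory on disk, and nothing complains until the speed is gone.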

In my rsyslog.conf I activated Splunk Storm. I hated the idea of installing Splunk or forwarders; it’s just stupid for a standalone system. Splunk Storm is currently free and therefore an awesome way to collect all my syslogs in one place, which I do for my development laptop as well. Browser tab switching is simply much better than multitailing multiple log files. Yep, lazy, I know. I call it efficiency, therefore more gaming time.
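
Forwarding without an agent comes down to a single rsyslog rule. A minimal sketch that only prints the rule: the host and port below are placeholders (use whatever endpoint the Splunk Storm project hands out), and `forward_rule` is a hypothetical helper. In rsyslog, `@@` means forward over TCP and a single `@` means UDP.

```shell
# Hypothetical helper: print an rsyslog rule that forwards every facility and
# priority (*.*) to a remote collector over TCP (@@). Host and port are placeholders.
forward_rule() {
    local host="$1" port="$2"
    echo "*.* @@${host}:${port}"
}

forward_rule logs.example.com 12345
# To apply: put the printed line in /etc/rsyslog.conf and restart rsyslog (as root).
```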

Next I found it useful to use my build scripts to restore some of the stuff I had installed before, like ffmpeg. You never know when you need video conversion or decide to write a web service for it. Over the years I have done this several times, so I always kept scripts or command history to recreate what I had. Some things can be more annoying than they need to be, and Googling for it all over again is not ideal. For example, this time I had to patch the FAAC audio codec because it failed to compile. There is no Google answer for it. I thought maybe I should just post my scripts publicly and let everyone know how bad I am at BASH scripts. I have created a github repo and will post all my scripts as I test them.

Onwards to compiling nginx to make use of the ramdisks. I am thinking I could make the web apps write their cache files to the ramdisk. This could be fun.

Meanwhile in erstwhile USSR…. server RAIDs you 😀

This blog survived because it is on Openshift.
