Jul 12 2012

Whenever I log in to my CentOS 6 dedicated server, my logs say

su: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Jul 12 13:11:07 CentOS-55-64-minimal su: PAM adding faulty module: /lib64/security/pam_fprintd.so

The thing about PAM is that it is made up of tons of modules. fprintd is nothing but a fingerprint reader module, and this is on a so-called CentOS minimal install.

Just disable it, and you’ll be fine.

authconfig --disablefingerprint --update
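If authconfig isn't available, the same fix by hand is removing the pam_fprintd line from the PAM stack. Here is a sketch on a throwaway copy; on CentOS the real file lives under /etc/pam.d/ (check which file references the module, and back it up before editing):

```shell
# Demo of the manual fix on a scratch copy of a PAM config
# (the real file on CentOS is under /etc/pam.d/; back it up first)
f=$(mktemp)
printf 'auth sufficient pam_fprintd.so\nauth required pam_unix.so nullok\n' > "$f"
sed -i '/pam_fprintd\.so/d' "$f"   # drop the fingerprint module line
cat "$f"                           # only pam_unix.so remains
rm -f "$f"
```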

Jul 12 2012

This week was especially eventful, as I lost my workhorse of a server (hexacore, 12GB RAM, 4.5TB) to a double disk failure. Not just a wrench but the whole monkey was thrown in. I know RAID is no backup, but I also know that when you rent dedicated servers, disks are the first commodity the providers cheap out on. As such, I assumed a single disk failure on RAID 5 could be handled. What are the chances that TWO and not ONE disk will fail? Apparently, exponentially higher than what I assumed.

RAID 5 can handle a single disk failure. I had 3 disks of 1.5TB each in the array; 2 failed, and all data was gone. The only sad part is that some users were affected, as they lost data on the file hosting I provided for a limited set of people on the server. The reason I have no backups for everything is simply that I don't have that kind of backup capacity (600GB plus before it went down).

I had just recently ported a lot of Apache configs to Nginx for gallery2, grepler.com, etc. and had not committed the code yet; it was just running as a patch on production. The Nginx configs were super hard and took a lot of trial and error to get just right. I also had to make ad-hoc code changes.

Well, this is the reason I ask my team to commit everything, including server config files, to VCS before updating servers. Not really a big loss, since I wasn't running any super-critical web 2.0 friending applications, and neither was it raining green bills on my doorstep.

On this particular server I did not do MySQL backups as I do with other servers (again, too much data to back up), but I think I should have selectively backed up something. Maybe next time. All the sites were back up in 24 hours, but from an older snapshot on an older server. I usually keep those around for just in case (like this).

The optimist: woohoo brand new server
Since I have a brand new server, I reinstalled a pristine CentOS 6.2 with RAID 5 and waited for the sync to finish.
Meanwhile, in… the datacenter: I sent instructions to use my plastic and upgrade the RAM. Yippee, more RAM; now it is 24GB. I used to think 12GB was enough, then MongoDB happened. I am not going to install MongoDB right now, but with all that memory I decided to add back my ramdisks.

I used Webmin, and its configs were in the weekly backups, so they were safe. I restored them and my network and firewall were back to spec; I only selected the stuff I needed. IP blocks, gateways and firewalls are a pain in the proverbial behind, much more than they are supposed to be. I have a lot of iptables rules, since my home IP is static, thanks to MyRepublic.

The static IP saved me big time in two ways: the firewall rules could come straight back, and SSH was instantly secure. The connection is super stable, so I could scp everything (100GB of files) back to the server at 2.5MB/s to 3.2MB/s. Ever since I changed to my new ISP I have developed a bad habit of posting my log files to Splunk Storm, even for my local machines. Talk about abusing the internet.

In my /etc/fstab I added

tmpfs /tmp tmpfs size=10%,noexec,nosuid 0 0
tmpfs /mnt/ramdisk0 tmpfs size=2000m 0 0

A quick reboot and both mounts were live. Then I played around with copying/deleting 2GB files in a for loop against /tmp and the ramdisk mount, and it finished in what felt like a second. Since I did not believe it was that fast, I double-checked that the files were actually there; they were.
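The kind of loop I used looks roughly like this, scaled down to 64MB files and pointed at /tmp and /var/tmp so it runs anywhere; substitute your ramdisk mount (e.g. /mnt/ramdisk0) to compare:

```shell
# Rough write-speed check: time a few file writes against two mount points
# (sizes scaled down from my 2GB tests; swap in /mnt/ramdisk0 to test the ramdisk)
for target in /tmp /var/tmp; do
    start=$(date +%s)
    for i in 1 2 3; do
        dd if=/dev/zero of="$target/ramtest.bin" bs=1M count=64 2>/dev/null
    done
    echo "$target: $(( $(date +%s) - start ))s"
    rm -f "$target/ramtest.bin"
done
```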

I mounted /tmp on tmpfs as well, since I really want the mess cleaned up on reboot, and there is no need to explain the security reasons behind the noexec,nosuid options.

df -h

tells me everything is mounted fine. Not to mention I get to put stuff in RAM for super fast operations, maybe caching, spools, etc. Who knows; we'll see.

In my rsyslog.conf I activated Splunk Storm. I hated the idea of installing Splunk or its forwarders; it's just overkill for a standalone system. Splunk Storm is currently free and therefore an awesome way to collect all my syslogs in one place, which I do for my development laptop as well. Browser tab switching is simply much better than multitailing log files. Yep, lazy, I know; I call it efficiency, and therefore more gaming time.
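The rsyslog side is a one-liner. Here is a sketch with a placeholder host and port (Storm gives you project-specific ones when you add a network input):

```
# /etc/rsyslog.conf
# Forward everything over TCP (@@ = TCP, a single @ = UDP).
# Host and port below are placeholders.
*.* @@logs.example.splunkstorm.com:20514
```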

Next I found it useful to use my build scripts to restore some of the stuff I had installed before, like ffmpeg. You never know when you need video conversion or decide to write a web service for it. Over the years I have done this several times, so I always made scripts or saved command history to recreate what I had. Some things can be more annoying than they need to be, and Googling for them all over again is not ideal. For example, this time I had to patch the FAAC audio codec because it failed to compile, and there is no Google answer for it. I thought maybe I should just post my scripts publicly and let everyone know how bad I am at Bash scripting. I have created a GitHub repo and will post all my scripts as I test them.
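The scripts themselves are nothing fancy; the skeleton is a check-then-build step so re-runs are cheap. A sketch (the function and tool names here are illustrative, not from the actual repo):

```shell
# Skeleton of a rebuild script step: skip tools that are already present,
# otherwise report what would be built (names here are examples)
install_if_missing() {
    local bin="$1"; shift
    if command -v "$bin" >/dev/null 2>&1; then
        echo "$bin already installed"
    else
        echo "would build $bin with: $*"
    fi
}
install_if_missing sh true               # 'sh' exists everywhere, so this just reports it
install_if_missing no-such-tool ./build.sh
```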

Onwards to compiling Nginx to make use of the ramdisks. I am thinking I could make the web apps write their application cache files to the ramdisk. This could be fun.
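One way to do that is to point Nginx's cache path at the ramdisk mount. A sketch, assuming Nginx built with FastCGI caching and the /mnt/ramdisk0 mount from the fstab above (the zone name and sizes are placeholders):

```
# nginx http{} block: keep the FastCGI cache on the ramdisk
fastcgi_cache_path /mnt/ramdisk0/nginx-cache levels=1:2
                   keys_zone=APPCACHE:10m max_size=1500m inactive=30m;
```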

Meanwhile in erstwhile USSR…. server RAIDs you 😀

This blog survived because it is on Openshift.

Jul 03 2012

I will briefly describe the tools and ideas behind using Amazon AWS for web deployments to its full extent without overdoing it. This is the first step of at least three I am going to write down for reference and critique. I have not read any books on the subject; having worked both as a server engineer in datacenters and as a 3-tier coder, I have accumulated a fair share of failures and ideas about what could have been done right. It was just plain hindsight behind doing things the way I did.
I used the AWS documentation for most of my work, as I do now with Openshift.

1. Architecture (Infrastructure requirements)
2. Tools – Platforms, tools, scripts and programs
3. Building, Testing and Automating

This is the simplest form of web deployment that is scalable, fault tolerant, and redundant, as well as stable.
Below is an architecture diagram of the basic AWS IaaS building blocks we need. Here I use Amazon RDS for MySQL, as it takes care of a lot of the headaches involved in maintaining a MySQL server, with backups and failover.

Amazon AWS web deployment, highly scalable and redundant

I currently have at least 3 deployments that have been in production for between one and three years, and they maintain an uptime of 99.9x% despite occasional DoS attacks and several database migrations.

For PHP5-backed applications:

App Servers – These EC2 instances can be plain web servers with Nginx + PHP-FPM and your source code. Static files and user data (uploaded files, avatar pictures, comments, etc.), i.e. anything that can change, should not be on the local drives here. They should be in S3 or another location that can be shared by any EC2 instance.
Note: Amazon Sales will tell you to use huge EBS drives and copy files over, etc. I do not understand their reasoning for this. All the app servers I have are instance-store machine images which contain source code/executables only. While boot time is slightly longer (as claimed by AWS Sales), I see no significant difference. I have also found EBS-backed instances to be more prone to failures, reboots and corruption, and they cost more. A badly configured EBS-backed instance can leave behind unused EBS volumes like bird shit.
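Getting user uploads off the instance can be as simple as a periodic sync to S3. A sketch (the bucket and paths are placeholders; it assumes s3cmd is installed and configured, and the DRY_RUN guard keeps it from touching anything real until you flip it):

```shell
# Sync user-generated content to S3 instead of keeping it on the instance.
# Bucket/paths are placeholders; assumes a configured s3cmd.
DRY_RUN=${DRY_RUN:-1}
sync_uploads() {
    local src="$1" bucket_path="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "s3cmd sync --acl-public $src s3://$bucket_path"
    else
        s3cmd sync --acl-public "$src" "s3://$bucket_path"
    fi
}
sync_uploads /var/www/uploads/ my-bucket/uploads/
```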


RDS: Always use Multi-AZ. Try to keep to one DB only for best performance; now you know why programmers used table prefixes. Do not use anything less than a small instance in production.
WARNING FOR MYSQL RDS USERS: NEVER USE MyISAM TABLES. IN FACT, ONLY USE InnoDB unless you have a damn good reason not to.
Keep snapshots active for at least 5 days. They are very useful when someone accidentally deletes your database; Multi-AZ will not save you, as it will happily delete your failover copy too. Yes, it happens in real life! If you want to build your own homegrown backup server, go ahead, but leave snapshots on. They are the fastest recovery mechanism and are virtually hands-free.
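Moving existing tables off MyISAM is one ALTER per table. A sketch that generates the statements from a list of table names; in practice you would pull the list from information_schema.tables WHERE engine='MyISAM' (the table names below are examples):

```shell
# Emit ALTER statements to move MyISAM tables to InnoDB; review the output,
# then feed it to the mysql client (table names are examples)
to_innodb() {
    for t in "$@"; do
        printf 'ALTER TABLE `%s` ENGINE=InnoDB;\n' "$t"
    done
}
to_innodb users posts comments
```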

I think the rest is self-explanatory for now. If there are questions, I can update the post.

In Part 2, I will cover tools like memcache and syslog, configuring an AMI with startup scripts, and automating stuff.