At Shutterstock we’ve been putting a lot of effort into rolling out an infrastructure-wide configuration management / provisioning system so that every server is built correctly every time. The system uses Cobbler / Puppet to put the appropriate packages and configuration on each of our server sets (thumb, web, DB, memcache, etc.), and we use some other cool tools like Fabric / MCollective to run bulk jobs across pools of servers. It’s been a lot of work, and it’s always nice to see some validation that it was worth it. I’ll write a larger post on some of these tools later on.
Below is a good example from a server that is not yet “puppetized”; it was built largely by hand to be a replica of the other three servers in its pool. This server was missing a simple “noatime” mount option (without it, the kernel writes an access-time update to disk every time a file is read). The fix itself was completely trivial, but finding the cause of the problem was something we discussed for quite some time. Not a ton of ops time was lost, but we spent some hours scratching our heads before one of our engineers decided to really sort it out. Check out the difference this made on load.
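A missing mount option like this is easy to spot once you know to look for it. As a minimal sketch (not our actual tooling), the Python below reads text in the `/proc/mounts` format and lists local filesystems mounted without `noatime`. The sample mount table, device names, the `/thumbs` mount point, and the set of filesystem types checked are all hypothetical.

```python
# Minimal sketch: flag local filesystems mounted without "noatime".
# Input is text in the /proc/mounts format; the sample below is hypothetical.


def parse_mounts(text):
    """Return {mount_point: (fs_type, set_of_options)} from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mounts[fields[1]] = (fields[2], set(fields[3].split(",")))
    return mounts


def missing_noatime(text, fs_types=("ext3", "ext4", "xfs")):
    """Return sorted mount points of the given filesystem types that lack noatime."""
    return sorted(
        mount
        for mount, (fs_type, options) in parse_mounts(text).items()
        if fs_type in fs_types and "noatime" not in options
    )


# Hypothetical mount table for a thumbnail server
SAMPLE = """\
/dev/sda1 / ext4 rw,noatime 0 0
/dev/sdb1 /thumbs ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""

if __name__ == "__main__":
    print(missing_noatime(SAMPLE))  # prints ['/thumbs']
```

On a real host you would pass in `open("/proc/mounts").read()` instead of the sample. Many newer kernels default to `relatime`, which already reduces access-time writes, so this check only tells you which mounts lack the explicit `noatime` option.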
There are a few wins here:
Performance – Major decrease in load on thumb02
Sexy – A graph that looks like the server set is scaling horizontally as we would expect
Validation – The warm fuzzy feeling of knowing that, with Puppet managing our hosts, a misconfiguration like this should never happen again (and if it does, we can always do a diff to find out what’s awry).
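The diff mentioned above can be as simple as comparing mount options host by host. Here is a minimal sketch, assuming you have already collected each host’s `/proc/mounts` contents (for example with a bulk job across the pool); the host names `THUMB01` / `THUMB02` and their mount tables are hypothetical.

```python
# Minimal sketch: diff mount options between a reference host and a suspect host.
# Both inputs are /proc/mounts-style text; the samples below are hypothetical.


def parse_mount_options(text):
    """Return {mount_point: set_of_options} from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) < 4:
            continue
        mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def diff_mount_options(reference, other):
    """Return {mount_point: (only_in_reference, only_in_other)} for mounts that differ."""
    ref, oth = parse_mount_options(reference), parse_mount_options(other)
    diffs = {}
    for mount in sorted(ref.keys() | oth.keys()):
        a, b = ref.get(mount, set()), oth.get(mount, set())
        if a != b:
            diffs[mount] = (a - b, b - a)
    return diffs


# Hypothetical mount tables: a correctly built host and a hand-built replica
THUMB01 = """\
/dev/sda1 / ext4 rw,noatime 0 0
/dev/sdb1 /thumbs ext4 rw,noatime 0 0
"""
THUMB02 = """\
/dev/sda1 / ext4 rw,noatime 0 0
/dev/sdb1 /thumbs ext4 rw 0 0
"""

if __name__ == "__main__":
    print(diff_mount_options(THUMB01, THUMB02))  # prints {'/thumbs': ({'noatime'}, set())}
```

Comparing against a known-good host in the same pool, rather than against a fixed checklist, also catches options nobody thought to check for.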