Faster TCP Slow Starts

At Shutterstock we’re obsessed with speed. Faster page loads mean happier customers, and we like happy customers.

Shutterstock’s customers are widely distributed around the globe. Our primary set of web servers is co-located in New York. We push a lot of our static assets (image thumbnails, JavaScript, CSS, and heck, let’s include DNS entries in that category) to globally distributed edge caches, but our servers still have to generate markup and push it over the long wire to web browsers far away from NYC.

After some advocating, hacking and testing (well reported over at LWN), a patch has landed in Linux 2.6.39 that raises TCP’s initial congestion window from at most 4 packets (the RFC 3390 limit) to 10 packets. It’s going to take some time for this patch to make its way into “the enterprise”, but we’d like our customers to benefit from this larger initial payload immediately. So we’ve applied the patch to our web server kernels and measured the benefit to our customers.
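To see why a larger initial window matters most on long links, here is a rough back-of-the-envelope sketch in Python. It is not how we measured anything. It assumes an idealized slow start in which the window doubles every round trip, a 1460-byte segment size, no packet loss, and no receive-window limit, and it ignores the connection handshake. The function name and the page sizes are made up for illustration.

```python
import math

MSS_BYTES = 1460  # assumed segment size, typical for Ethernet paths


def round_trips(payload_bytes, initcwnd, mss=MSS_BYTES):
    """Count the round trips needed to deliver a payload under idealized slow start.

    Assumes the congestion window doubles every round trip, with no loss
    and no receive-window limit. Real connections will behave less neatly.
    """
    segments_left = math.ceil(payload_bytes / mss)
    cwnd = initcwnd
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window this round trip
        cwnd *= 2              # slow start: window doubles per round trip
        trips += 1
    return trips


if __name__ == "__main__":
    # Hypothetical markup sizes, in KB
    for size_kb in (10, 30, 60):
        payload = size_kb * 1024
        old = round_trips(payload, initcwnd=4)
        new = round_trips(payload, initcwnd=10)
        print(f"{size_kb} KB: {old} round trips at initcwnd=4, {new} at initcwnd=10")
```

Under these assumptions each of the three sample sizes saves one round trip. One round trip costs very little inside New York, but on an intercontinental link it can be a noticeable slice of the total delivery time.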

The results are fantastic. Things stayed fast in New York, and got a lot faster around the globe.

(Update based on some feedback in these comments and on proggit):

The graph measures how long it takes us to push page markup over the wire to WebMetrics polling agents in the labeled cities. The X axis is days, numbered 1 to 7 from left to right. The Y axis is markup delivery time in fractions of a second, starting at zero. Pushing markup over the wire from our web servers to the WebMetrics polling agent in New York is fast, and stays slightly noisy but fairly constant over the week shown in the graph.

Pushing markup over the wire from our web servers to the WebMetrics polling agents in remote geographies gets significantly faster (closer to zero seconds) over the course of the 6th day on the graph, and stays faster on the 7th day.