We use open source software in just about every form it takes: programming languages, operating systems, web servers, databases… even firewalls. We try to release some of our own software, too. Open source software has all kinds of advantages, but one of my favorites is how easy it is to fix problems when they arise.
Earlier this year, we added autocomplete functionality to our search interface. Autocomplete is a simple concept that has some tricky implementation details, especially if you have a big data set. It requires a fast server-side lookup table, and the server response has to be lightning quick.
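To make the "fast server-side lookup table" idea concrete, here is a minimal sketch in Python. It is not our production code: the `PrefixIndex` class and its method names are made up for illustration, and it assumes the whole term list fits in memory. It keeps the terms sorted and uses binary search to jump to the first candidate, so each lookup costs a binary search plus a scan of at most `limit` entries.

```python
from bisect import bisect_left


class PrefixIndex:
    """Hypothetical in-memory autocomplete table (illustration only)."""

    def __init__(self, terms):
        # Normalize, deduplicate, and sort once at load time so that
        # every lookup afterwards is just a binary search.
        self._terms = sorted({t.strip().lower() for t in terms if t.strip()})

    def complete(self, prefix, limit=10):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.strip().lower()
        if not prefix:
            return []
        # In a sorted list, all terms sharing a prefix are contiguous,
        # and the first of them sits at the insertion point of the prefix.
        start = bisect_left(self._terms, prefix)
        results = []
        for term in self._terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            results.append(term)
        return results
```

A real deployment would likely rank suggestions by popularity rather than alphabetically, which this sketch leaves out.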
We’ve experimented with a lot of web servers at Shutterstock. Our mainstay is Apache, but over the years we’ve toyed with lighttpd, nginx, node, and others. But hooking logic into a full-featured webserver leaves you with a pretty bulky system, and we thought we’d be better off going with something lighter.
We looked around and found Feersum. Feersum is an event-based webserver (like nginx and node) that’s written in Perl (or more accurately, a combination of Perl and C) and is based on EV/libev (the same event loop that node uses). We whipped up a prototype with it and were impressed by its speed: 2,000 requests/sec at a 30ms mean response time, under 100 concurrent connections, on a lightweight box. That’s quick!
So we wrote an implementation of autocomplete with it and launched it. And it was a great success — when it worked. We noticed that sometimes it would simply fail on certain requests. The host servers seemed fine. The daemon was still listening and responding to requests. But for some reason we’d sporadically get “400 Bad Request” errors.
At first we assumed this was a problem with the client: our AJAX code must have somehow been buggy and passed in bad data. But we ruled that out pretty quickly, and soon isolated the problem to the daemon. We were able to reproduce the issue by sending simple and innocuous HTTP requests that would nonetheless come back as “400 Bad Request”. We scratched our heads a bit, and then did that glorious thing that open source software lets you do: we dove into the code.
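If you want to try this kind of reproduction yourself, a rough sketch in Python follows. It bypasses the browser and any HTTP client library and writes the request bytes straight to a socket, so you know exactly what the daemon received. The helper names, host, port, and path are all hypothetical, and the request shown is a generic well-formed GET, not the specific request that triggered our bug.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal, well-formed HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response):
    """Extract the numeric status code from a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {status_line!r}")
    return int(parts[1])


def send_raw(host, port, request, timeout=5.0):
    """Send raw bytes to host:port and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (assumes a daemon listening on localhost:5000; both are made up):
#   raw = send_raw("localhost", 5000, build_request("localhost", "/complete?q=sun"))
#   print(status_code(raw))
```

Running the same request in a loop and counting the non-200 status codes is a quick way to confirm that a failure is sporadic rather than tied to particular input.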
Here, life got more interesting. It turns out Feersum is based on another open source project, picohttpparser. That presented a challenge, both because it was slightly harder to isolate the problem and also because picohttpparser is meant to be lightning fast and is therefore written with a bunch of effective but obscure optimizations.
So we spent a weekend hacking away at it, adding sprintf calls (debugger, bah!) to every line we could to understand the problem. We got pretty close to figuring it out, but ultimately got tripped up by not knowing whether the problem was in how Feersum was calling picohttpparser, or in picohttpparser itself.
Happily, open source software gives you an easy next step: contact the author. So we gathered all the information we could about the problem, tried to sum it up as succinctly as we could, and posted an issue on GitHub. Within two days, the author had identified the error, patched it, and released a new version. Thanks, stash!
Delighted with the quick response, we installed the new version, ran some tests, and saw our daemon work flawlessly.
And check it out — we’re now humming along with Feersum serving a snappy response on every keypress of every image search! That’s way cool.