Every quarter, the tech team at Shutterstock holds CodeRage, a 24-hour hackathon where we’re encouraged to work on any project that can bring value to the company.
This quarter, one of the winning projects was called Projector. It’s a web app that lets you turn your webcam into a projector to show drawings and diagrams to other people. Here’s a quick demo of how it works:
Dave K, the lead engineer on our footage team, wrote the app. I interviewed him about the project.
What problem were you trying to solve?
I was onboarding one of our new developers in our Denver office, and I was in New York, and I wanted to show him how our different systems were set up and how they work together. I thought, “Wow, every time I onboard somebody I usually go to a whiteboard and sketch out how this works, because it’s so hard to talk about it without diagrams.” I really just wanted to give him a quick sketch of how our servers are set up. I couldn’t find any good online solutions for drawing with a touchpad, and if you point a webcam at a whiteboard it’s really hard for the person on the other side to see anything.
So a few days later I was thinking it’d be cool if we could use a webcam to show a facsimile of a piece of paper you have in front of you to someone across the country and be able to make changes in real time.
How did you approach the problem?
For this project, the main problem was how to detect where the piece of paper was. So I brainstormed a bit about that. But sometimes it doesn’t work out the way you expect. This was a perfect example: the initial plan I had didn’t work at all, and I had to reformulate it and sleep on it to find a better solution.
What was the first approach you tried?
Well, I needed to detect the edges of the piece of paper. Originally, I was going to have a setup where you made four black dots on a piece of paper. Then, I was going to try converting the webcam image to black and white, and then detect every shape on the page. Any shape that was touching the corner of the image, I’d delete from the shapes that I’m looking at. And then I’d try to discover the shapes that were closest to the corners of the frame because those would probably be my black reference dots.
Part of trying to solve that problem was to write a fill algorithm, and so I created a structure of every black pixel on the page and then I’d loop through the pixels and try to determine if it was part of a bigger shape based on neighboring pixels. I wrote it as a recursive function, and although it worked on a small scale — like a 10×10 pixel image — on a bigger image I was getting a stack overflow — it was just using too much memory.
So when that didn’t work out, I looked online for different fill algorithms, and one of them was a flood fill algorithm which was supposed to be more performant. I was able to tweak some open source code that I found to get that working, but on a big image it would still crash from using too much memory. It was kind of upsetting because I spent a whole night getting that to work — going down this one rabbit hole. So I thought, “I should just go home and go to sleep.” It was about midnight at the time.
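The crash he describes is the classic failure mode of a recursive fill on a large image: one stack frame per pixel. A common fix, and roughly what a stack-based flood fill looks like, is to manage the stack explicitly. This is a minimal sketch (not the project's actual code), assuming the image is a flat array of 0s (white) and 1s (black):

```javascript
// Stack-based flood fill over a binary image stored as a flat array of
// width * height values (0 = white, 1 = black). Using an explicit stack
// instead of recursion keeps memory bounded by the blob size rather than
// blowing the call stack on large images.
function floodFill(pixels, width, height, startX, startY) {
  const shape = [];                              // pixels in this blob
  const visited = new Uint8Array(width * height);
  const stack = [[startX, startY]];
  while (stack.length > 0) {
    const [x, y] = stack.pop();
    if (x < 0 || y < 0 || x >= width || y >= height) continue;
    const i = y * width + x;
    if (visited[i] || pixels[i] !== 1) continue; // skip seen/white pixels
    visited[i] = 1;
    shape.push([x, y]);
    // visit the four neighbors
    stack.push([x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]);
  }
  return shape;
}

// A 3x3 image with an L-shaped blob of four black pixels:
const img = [
  1, 0, 0,
  1, 0, 0,
  1, 1, 0,
];
console.log(floodFill(img, 3, 3, 0, 0).length); // 4
```

Collecting every blob this way, then ranking blobs by distance to the frame corners, would recover the four reference dots from the first approach.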
The last thought that entered my mind was, “Wait. Why make these dots black? What happens if we add a color in there?” Then you could do a simple color detection based on different quadrants of the image. And then I felt a little better going to sleep with that idea in my head. The next morning I woke up and just focused on that and it seemed to work pretty well.
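The quadrant idea turns an expensive shape-detection problem into a single pass over the pixels. A hypothetical sketch of it (the dot order, thresholds, and function names here are illustrative, not the project's code), assuming RGBA data laid out the way a canvas returns it:

```javascript
// Hypothetical quadrant-based dot detection. `data` is an RGBA byte array
// like the one canvas getImageData() returns: four bytes per pixel, row by
// row. For each quadrant we average the positions of the "green enough"
// pixels to locate one control dot.
function findGreenDots(data, width, height) {
  // sums[q] = [sumX, sumY, count] for quadrant q (0=TL, 1=TR, 2=BL, 3=BR)
  const sums = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4;
      const r = data[i], g = data[i + 1], b = data[i + 2];
      if (g > 150 && r < 100 && b < 100) {  // crude "green" test
        const q = (y < height / 2 ? 0 : 2) + (x < width / 2 ? 0 : 1);
        sums[q][0] += x;
        sums[q][1] += y;
        sums[q][2] += 1;
      }
    }
  }
  // centroid per quadrant, or null if no green pixels were found there
  return sums.map(([sx, sy, n]) => (n ? [sx / n, sy / n] : null));
}

// 4x4 test image with one pure-green pixel in each quadrant
const w = 4, h = 4;
const data = new Uint8Array(w * h * 4);
for (const [x, y] of [[0, 0], [3, 0], [0, 3], [3, 3]]) {
  const i = (y * w + x) * 4;
  data[i + 1] = 255;  // green channel
  data[i + 3] = 255;  // alpha
}
console.log(findGreenDots(data, w, h)); // one dot per quadrant
```

No recursion, no shape bookkeeping: one linear scan, four running averages.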
What third-party libraries did you use for the project?
A lot of this is reliant on these new, awesome features available in HTML5. One of the things that was really crucial was the getUserMedia() HTML5 function. That lets the browser get access to your webcam. Then I used some of the canvas manipulation tools. HTML5 lets you draw an image to the canvas, and then you can get RGBA values for every pixel on that canvas so you can determine what color something is. You basically have an array you can loop through, and that’s how I’m able to find the green pixels.
The other library I used is BinaryJS, which lets you send and receive streaming binary data over web sockets. It uses a compact serialization method to make that as efficient as possible. I also had to use a polyfill for the canvas toBlob() method, which turns an image into raw binary data so that BinaryJS can segment the packets. It’s not implemented in mainstream browsers yet, so the polyfill allowed me to use it in browsers that did not already have support.
I used ImageMagick for server-side image processing, and ran a threshold function on the image so everything that fell into the lighter 50% of a black and white image turned to white, and everything in the darker 50% turned to black. That makes it easier to create a facsimile of the image. The place I got the idea for that was from an app called JotNot Pro, which lets you scan documents by using the camera on your smartphone. It uses a similar approach to thresholds to make the scanned text clearer.
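In ImageMagick terms this is the `-threshold 50%` operator; the operation itself is simple enough to illustrate in a few lines. A minimal sketch on grayscale values (0–255), just to show the idea rather than the server-side pipeline:

```javascript
// 50% threshold on grayscale pixel values: anything in the lighter half
// becomes pure white (255), anything in the darker half pure black (0).
// ImageMagick's `-threshold 50%` does the same job server-side.
function threshold(gray) {
  return gray.map(v => (v < 128 ? 0 : 255));
}

console.log(threshold([10, 120, 128, 200, 255])); // [0, 0, 255, 255, 255]
```

Collapsing every pixel to pure black or white is what makes pencil lines on paper read clearly on the other end, since uneven lighting and paper texture all fall into the "lighter half" and wash out to white.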
The other thing I used ImageMagick for is perspective distortion. ImageMagick has a perspective distortion function that lets you take four points and map them to new coordinates, which is really neat because I can take those four control points (the green dots) and map them to the corners of the viewer to flatten the image.
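ImageMagick's `-distort Perspective` operator takes its mapping as pairs of "srcX,srcY dstX,dstY" coordinates. A hypothetical helper for building that argument from the four detected dots (the TL, TR, BL, BR dot ordering and the function name are assumptions of this sketch, not the project's code):

```javascript
// Builds the argument string for ImageMagick's `-distort Perspective`
// operator: four "srcX,srcY dstX,dstY" pairs mapping the detected control
// dots onto the corners of a flat output image. Assumes the dots arrive
// in TL, TR, BL, BR order.
function perspectiveArgs(dots, outWidth, outHeight) {
  const corners = [
    [0, 0],                 // top-left
    [outWidth, 0],          // top-right
    [0, outHeight],         // bottom-left
    [outWidth, outHeight],  // bottom-right
  ];
  return dots
    .map(([x, y], i) => `${x},${y} ${corners[i][0]},${corners[i][1]}`)
    .join('  ');
}

// The resulting string would be passed to something like:
//   convert frame.png -distort Perspective '<args>' flattened.png
console.log(perspectiveArgs([[12, 8], [630, 15], [20, 470], [615, 460]], 640, 480));
// 12,8 0,0  630,15 640,0  20,470 0,480  615,460 640,480
```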
What’s nice is that if I’m holding a piece of paper, no matter how I’m holding it, it keeps it in place so it doesn’t jump around. It also makes it so that it doesn’t look squashed.
Have you thought of open sourcing this project?
Yeah, I have to clean it up a bit and make it a little more practical to use, but then I think we could release it.