Shutterstock developers pay a lot of attention to the user experience of our website. We have a fleet of User Experience experts who help make sure the error states our web application shows to customers are useful and actionable.
But when we’re building backend APIs instead of HTML forms, that experience doesn’t translate. What’s the equivalent of a useful, actionable error state in an API?
The Shutterstock Contributor Team has been building our next-generation content-review system, so that we can scale our image-review operation. We’re building it in a service-oriented fashion, in Ruby, with DataMapper as an ORM.
As developers building backend APIs, it’s solely our responsibility to provide useful information to the developers who will use our services. A good error validation framework preserves the integrity of our applications’ data and empowers developers to integrate with a new API.
Rather than write custom validation for each API endpoint, we took a systematic approach to add validation to all of them. Now we can avoid many application crashes, while providing useful information to developers.
One of the first things the review system needs is to learn about new items needing review:
POST /items
{
  "domain": "shutterstock-photo",
  "owner": "81",
  "item": "3709",
  "item_type": "photo",
  "queue": "main"
}
This call puts the photo with item id 3709 and owner id 81 into the main review queue. The expected result is HTTP 201 Created with a Location: header giving the URL of the created item.
There are several other Shutterstock teams that will eventually integrate with this review service. Sometimes, when developers are still writing the software, they will post invalid data:
POST /items
{
  "domain": "shutterstock-photo",
  "owner": "81",
  "item": "3709",
  "item_type": "photo"
  // "queue": "main"
}
Whoops! This POST left out the queue name, so the review system doesn’t know who’s supposed to review it. Without data validation, our application will throw a 500 error:
500 Internal Server Error TypeError: expected String, got NilClass
It would be better if we told the programmer what they’ve done wrong. Also, we’d like to return HTTP 400 Bad Request instead of an internal server error.
Our team realized that there’s a tool to help us do this sort of thing: the json-schema Ruby gem, an implementation of the IETF JSON Schema spec. To use this, we’ll need to build up a schema. For the items route, it would look like this:
{
  "id": "http://review.shutterstock.com/items.schema",
  "type": "object",
  "required": ["domain", "item", "item_type", "owner", "queue"],
  "properties": {
    "create_time": {"type": "string"},
    "item": {"type": "string"},
    "domain": {"type": "string"},
    "item_type": {"type": "string"},
    "owner": {"type": "string"},
    "queue": {"type": "string"}
  }
}
Now we will make our review service pass the incoming POST data through json-schema’s JSON::Validator before doing anything else:
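# Parse the request body, then collect every schema violation at once.
rest_data = JSON.parse(request.body.read)
json_errors = JSON::Validator.fully_validate(
  schema,
  rest_data,
  :version => :draft4)
# Any violation means the client sent bad data: answer 400 with the details.
unless json_errors.empty?
  content_type 'application/json'
  halt 400, JSON.generate({:errors => json_errors})
end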
If there are any errors, the response looks like this instead:
400 Bad Request
{"errors": [
  "The property '#/' did not contain a required property of 'queue' in schema http://review.shutterstock.com/items.schema#"
]}
This message tells us that there’s a property missing in the JSON document root (#/). If more than one property is missing, the validator will identify them all. The validator does more than check for the existence of the required fields; it also checks the type of each field. If someone passes in a Hash instead of a string, like so:
POST /items
{
  "owner": "81",
  // eek, I'm not a string, I'm a Hash:
  "item": {"domain": "shutterstock-photo", "id": "3709"},
  "item_type": "photo",
  "queue": "main"
}
then they’ll get an error message about item. (Previously the application would have returned another Internal Server Error about a TypeError as soon as it tried to treat item as a string.)
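To make the two kinds of checks concrete, here is a minimal sketch of a required-field check and a string-type check written by hand. It is an illustration only, not how the json-schema gem works internally; the simple_errors helper and its message wording are hypothetical.

```ruby
# Illustrative stand-in for the two checks described above.
# This is NOT the json-schema gem; simple_errors is a hypothetical helper.
def simple_errors(schema, data)
  errors = []

  # Check 1: every required property must be present.
  schema.fetch('required', []).each do |name|
    errors << "missing required property '#{name}'" unless data.key?(name)
  end

  # Check 2: properties declared as strings must actually be strings.
  schema.fetch('properties', {}).each do |name, spec|
    next unless data.key?(name) && spec['type'] == 'string'
    errors << "property '#{name}' is not a string" unless data[name].is_a?(String)
  end

  errors
end

schema = {
  'required'   => %w[domain item item_type owner queue],
  'properties' => {
    'item'  => { 'type' => 'string' },
    'owner' => { 'type' => 'string' }
  }
}

# The request from the example above: "item" is a Hash, and "domain"
# only appears nested inside it.
data = {
  'owner'     => '81',
  'item'      => { 'domain' => 'shutterstock-photo', 'id' => '3709' },
  'item_type' => 'photo',
  'queue'     => 'main'
}

puts simple_errors(schema, data)
```

Both problems are reported in one pass, which is the behavior we want from fully_validate: the caller sees every mistake at once instead of fixing them one at a time.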
There’s just one problem. We have a variety of resource types to manage. It would be really great if we didn’t have to write a custom schema for all of them. It’s a fair amount of text to write; it’s easy to get wrong; the hand-written schema can fall out of sync with the actual code; and above all, it’s redundant! Most of that validation information is already encoded in our ORM layer, where it looks like this:
class Item
  include DataMapper::Resource
  property :id, DataMapper::Property::Serial
  property :create_time, DateTime,
    :default => lambda {|_,_| DateTime.now }
  property :external_id, String, :required => true
  belongs_to :domain
  belongs_to :item_type
  belongs_to :owner
  validates_uniqueness_of :external_id,
    :scope => :domain,
    :message => "Item must be unique to a domain"
  has n, :reviews
  has n, :queues, :through => :queue_items
  ...
It turns out that we can use this class definition to build our schema:
- figure out the class of the resource in question (we’ll call it resource_class)
- ask the resource_class for a list of its properties (resource_class.properties)
- ignore properties that our application can automatically populate (like the internal database id and create_time)
- figure out the data type for the remaining properties (property.primitive)
- ask the properties whether they’re required (property.required)
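The steps above can be sketched in plain Ruby. To keep the example runnable without a database, it uses a hypothetical Property struct as a stand-in for DataMapper's property objects; the schema_for helper, the AUTO_POPULATED list, and the JSON_TYPES mapping are likewise assumptions made for illustration, not our production code.

```ruby
require 'json'

# Hypothetical stand-in for a DataMapper property:
# its name, its Ruby primitive, and whether it is required.
Property = Struct.new(:name, :primitive, :required)

# Properties the application fills in on its own (assumed list).
AUTO_POPULATED = [:id, :create_time].freeze

# Assumed mapping from Ruby primitives to JSON Schema type names.
JSON_TYPES = { String => 'string', Integer => 'integer', Float => 'number' }.freeze

def schema_for(resource_name, properties)
  # Skip anything the application populates automatically.
  wanted = properties.reject { |prop| AUTO_POPULATED.include?(prop.name) }

  # Translate each remaining property's primitive into a JSON Schema type.
  schema_properties = {}
  wanted.each do |prop|
    schema_properties[prop.name.to_s] = { 'type' => JSON_TYPES.fetch(prop.primitive, 'string') }
  end

  {
    'id'         => "http://review.shutterstock.com/#{resource_name}.schema",
    'type'       => 'object',
    'required'   => wanted.select(&:required).map { |prop| prop.name.to_s },
    'properties' => schema_properties
  }
end

item_properties = [
  Property.new(:id, Integer, false),
  Property.new(:create_time, String, false),
  Property.new(:external_id, String, true),
  Property.new(:domain_id, Integer, true)
]

puts JSON.pretty_generate(schema_for('items', item_properties))
```

In the real service the list would come from resource_class.properties rather than a hand-built array.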
Once we’ve done that, we almost have enough information to build a schema. There are a few other wrinkles: our properties include things like domain_id as an integer instead of a string, and we want our consumers to specify shutterstock-photo instead of the internal database ID. So for those we:
- ask for the resource_class.relationships
- figure out the relationship.child_key
- replace the property matching that key with a string instead
Finally, we present all this data in the JSON Schema format.
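A minimal sketch of the relationship step follows. The Relationship struct is a hypothetical stand-in in which child_key is a single symbol (the real DataMapper object is richer), and renaming domain_id to domain is an assumption based on the field names in the items schema shown earlier.

```ruby
require 'json'

# Hypothetical stand-in for a DataMapper relationship: its name and the
# foreign-key property it relies on. child_key is one symbol here for simplicity.
Relationship = Struct.new(:name, :child_key)

# Swap each foreign-key property (e.g. an integer domain_id) for a
# string property named after the relationship (e.g. domain).
def expose_relationships(properties, required, relationships)
  props = properties.dup
  reqs  = required.dup

  relationships.each do |rel|
    key  = rel.child_key.to_s
    name = rel.name.to_s
    next unless props.key?(key)

    props.delete(key)
    props[name] = { 'type' => 'string' }
    reqs = reqs.map { |field| field == key ? name : field }
  end

  [props, reqs]
end

properties = {
  'external_id' => { 'type' => 'string' },
  'domain_id'   => { 'type' => 'integer' },
  'owner_id'    => { 'type' => 'integer' }
}
required = %w[external_id domain_id owner_id]
relationships = [
  Relationship.new(:domain, :domain_id),
  Relationship.new(:owner, :owner_id)
]

props, reqs = expose_relationships(properties, required, relationships)
puts JSON.pretty_generate({ 'required' => reqs, 'properties' => props })
```

Consumers then see string fields such as domain and owner, and the service remains free to translate those names into internal IDs after validation.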
That’s all the information we need to build schemas for all of our resource types. By computing and caching this at application load time, we can provide a basic schema for all POST and PUT requests.
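One way to compute the schemas once and reuse them is a small cache that is filled while the application boots. This is a sketch under assumptions: the SchemaCache class and the placeholder builder block are hypothetical, and the block stands in for whatever code generates a schema from a resource class.

```ruby
# Hypothetical cache: build each resource's schema once, then reuse it.
class SchemaCache
  # The block receives a resource name and returns that resource's schema hash.
  def initialize(&builder)
    @builder = builder
    @schemas = {}
  end

  # Build every schema up front, e.g. while the application loads.
  def preload(resource_names)
    resource_names.each { |name| fetch(name) }
    self
  end

  # Return the cached schema, building it on first use.
  def fetch(resource_name)
    @schemas[resource_name] ||= @builder.call(resource_name).freeze
  end
end

build_count = 0
cache = SchemaCache.new do |name|
  build_count += 1
  # Placeholder schema; a real builder would inspect the resource class.
  { 'id' => "http://review.shutterstock.com/#{name}.schema", 'type' => 'object' }
end

cache.preload(%w[items domains owners])
cache.fetch('items') # served from the cache; the builder is not called again
puts build_count
```

Freezing the cached hashes guards against a request handler accidentally modifying a schema that every later request will share.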
We may need to customize a generated schema for certain routes that are special cases. For instance, we’ve decided that the POST /items route calls its logical ID field item in the POST and external_id in the database. Such customization is straightforward to accomplish.
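As a sketch of what such a customization could look like, the hypothetical rename_field helper below returns a copy of a generated schema with one field renamed. The generated hash is a hand-written example, not output from our service.

```ruby
# Hypothetical helper: return a copy of a schema with one field renamed
# in both the "properties" map and the "required" list.
def rename_field(schema, from, to)
  properties = {}
  schema.fetch('properties', {}).each do |name, spec|
    properties[name == from ? to : name] = spec
  end
  required = schema.fetch('required', []).map { |name| name == from ? to : name }

  # merge returns a new hash, so the generated schema itself is left untouched.
  schema.merge({ 'properties' => properties, 'required' => required })
end

# Hand-written example of a generated schema.
generated = {
  'type'       => 'object',
  'required'   => %w[external_id domain],
  'properties' => {
    'external_id' => { 'type' => 'string' },
    'domain'      => { 'type' => 'string' }
  }
}

# The POST /items route exposes external_id under the name "item".
items_schema = rename_field(generated, 'external_id', 'item')
p items_schema['required']
p items_schema['properties'].keys
```

Working on a copy keeps the route-specific override separate from the generated schema, so other routes that use the same resource are unaffected.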
Our final realization was that once we had all the information about how a schema ought to look, we could make the schema available to our users. So now they can issue a request against http://review.shutterstock.com/items.schema (or domains.schema and owners.schema) and see for themselves exactly what fields the system is expecting to create a new resource. By providing a URL to the schema in the error message, we end up with a self-documenting API!