There’s an important piece of infrastructure lacking the appropriate level of automation. In fact, without this part you are not connected to the Internet. I’m talking about the network hardware that moves packets between your backend servers and your customers.
The network industry is far behind the server industry, at least in terms of software customization and the ability to “Do What You Want™”. This comes largely from the practice of each hardware vendor writing the software that runs on its own devices. For all intents and purposes, that software is Cisco’s IOS and Juniper’s JUNOS.
At Shutterstock, we use Juniper for all our networking needs. Juniper has done a decent job of letting you do anything you can do from the command-line interface (CLI) through an XML-based RPC interface that conforms to the NETCONF standard. Building on this, and with an understanding of the various stages of deployment, it is possible to automate your network. It just requires writing a lot of your own software, which, honestly, gives you the greatest flexibility to mesh your customer-facing application with your lower-level network stack.
Since no two applications or networks are designed the same, I’m going to describe some of the base components necessary to give you more control over your network.
Determine the protocol to interact with devices
A large part of the networking space has adopted the NETCONF protocol for running XML-based RPC commands. Thankfully, there is a decent collection of client libraries for this protocol written in many languages.
As a services-oriented company, we tend to break things up into their smallest components and add orchestration on top of those building blocks. As such, I’ve written a NETCONF proxy which can run commands concurrently across a provided list of devices.
Even if you’re not at the scale where high concurrency is necessary, having a simple API or library to interact with your network devices is key to taking control. For some great inspiration, I recommend taking a look at py-junos-eznc (Junos PyEZ), which makes it easy to gather data about your devices.
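To make the idea of running a command concurrently across a list of devices concrete, here is a minimal sketch using only the Python standard library. It is not the proxy described above. The `run_on_devices` function, the `runner` callable, and the hostnames are all hypothetical; in practice `runner` would wrap whichever NETCONF client library you choose.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_devices(hosts, command, runner, max_workers=20):
    """Run `command` on every host concurrently; return {host: result}.

    `runner(host, command)` is a hypothetical callable that opens a NETCONF
    session to one device and returns the reply as a string.
    """
    def safe_run(host):
        try:
            return host, {"ok": True, "output": runner(host, command)}
        except Exception as exc:  # one bad device should not stop the rest
            return host, {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_run, hosts))


def fake_runner(host, command):
    """Stand-in for a real NETCONF client, used only for illustration."""
    if host == "sw3.example.net":
        raise ConnectionError("connection refused")
    return f"<rpc-reply>{command} on {host}</rpc-reply>"


results = run_on_devices(
    ["sw1.example.net", "sw2.example.net", "sw3.example.net"],
    "get-software-information",
    fake_runner,
)
for host, result in sorted(results.items()):
    print(host, "ok" if result["ok"] else "failed: " + result["error"])
```

Collecting failures per device, rather than raising on the first one, is a deliberate choice here: when you touch many devices at once you usually want the full picture before deciding what to retry.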
Metadata storage for all devices
On the server software side of things, this is often referred to as a Configuration Management Database, or CMDB for short. If you’re lucky enough to already have one, all you have to do is extend it to support networking devices. If you do not have one, it’s worth the time and effort to get one in place.
At Shutterstock, we currently use a PostgreSQL database to store things like Facter facts, IP addresses, Ethernet interfaces, and other custom properties about individual nodes in our environment. We add an Elasticsearch index on top to expose Lucene search syntax over our data, making it easy to query for nodes based on any of the data we collect.
Even a collection of YAML or plain text files would be better than nothing. Having an authoritative list of network devices is critical for managing them in an automated fashion.
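As an illustration of the "better than nothing" end of the spectrum, here is a small sketch of a plain-text inventory and a query helper. The file format, the property names, and the hostnames are all made up for this example; it is not how our PostgreSQL and Elasticsearch setup works.

```python
def parse_inventory(text):
    """Parse lines like 'hostname key=value key=value' into a list of dicts."""
    devices = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        hostname, *pairs = line.split()
        device = {"hostname": hostname}
        for pair in pairs:
            key, _, value = pair.partition("=")
            device[key] = value
        devices.append(device)
    return devices


def find_devices(devices, **criteria):
    """Return the devices whose properties match every given key/value."""
    return [
        d for d in devices
        if all(d.get(key) == value for key, value in criteria.items())
    ]


INVENTORY = """
# hostname        properties (all hypothetical)
sw1.example.net   role=access  site=nyc  mgmt_ip=10.0.0.11
sw2.example.net   role=access  site=sfo  mgmt_ip=10.0.1.11
rt1.example.net   role=core    site=nyc  mgmt_ip=10.0.0.1
"""

devices = parse_inventory(INVENTORY)
nyc_access = find_devices(devices, role="access", site="nyc")
print([d["hostname"] for d in nyc_access])
```

The point is less the format than having one place that every tool reads from; the same `find_devices` interface could later sit in front of a real database.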
Discover new devices as they come online, then configure them
This one is a bit tougher, depending on your network vendor’s default configuration. In Juniper land, the first thing a device does is blast out DHCP requests on the management interface and the default VLAN, looking for an IP address. With the appropriate dhcpd setup, you can cause a device to pull a configuration from an HTTP server.
We follow the paradigm outlined by Jeremy Schulman to pull a base configuration and install a proper JUNOS version when a device comes online. Once a device reaches this stage, a separate process that runs periodically looks for IPs responding to SNMP on our management network. It then connects, collects SNMP data, and stores it in our metadata storage.
You get some serious benefit here as you start to scale up. If you have to manually provision 20 new racks’ worth of network equipment, you are bound to have copypasta problems, and even varying configurations if you have multiple engineers touching these things.
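The periodic discovery step can be sketched roughly as follows. This is an outline under assumptions, not our actual process: `discover`, the `probe` callable, the address range, and the returned facts are all hypothetical, and a real `probe` would issue an SNMP query using whatever SNMP library you prefer.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor


def discover(network, probe, known_ips, max_workers=50):
    """Return {ip: facts} for hosts that answer `probe` and are not yet known.

    `probe(ip)` is a hypothetical callable that performs an SNMP query and
    returns a dict of facts, or None when the address does not respond.
    """
    candidates = [
        str(ip) for ip in ipaddress.ip_network(network).hosts()
        if str(ip) not in known_ips
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        replies = pool.map(probe, candidates)
        return {ip: facts for ip, facts in zip(candidates, replies) if facts}


def fake_probe(ip):
    """Stand-in for a real SNMP query, used only for illustration."""
    responders = {
        "10.0.0.2": {"sysName": "sw1", "model": "ex4200"},
        "10.0.0.5": {"sysName": "sw2", "model": "ex4200"},
    }
    return responders.get(ip)


# The metadata store is modeled as a dict keyed by management IP.
metadata_store = {"10.0.0.2": {"sysName": "sw1", "model": "ex4200"}}
new_devices = discover("10.0.0.0/29", fake_probe, known_ips=set(metadata_store))
metadata_store.update(new_devices)
print(sorted(metadata_store))
```

Run on a schedule, a loop like this means a device that has pulled its base configuration shows up in the metadata store without anyone typing its address in by hand.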
Tools to orchestrate the automation
Tools are why you build all the other components. If you have your devices easily searchable, and a library to interact with them, you can build useful CLIs or web interfaces for others to use.
This is where you can easily do a mass configuration change, write monitoring against devices for configuration drift, or simply use your metadata store to generate a Nagios configuration. You can even schedule a configuration load for a specific time across multiple devices, and be on the beach while it happens.
These are the basics of learning how to control your own network. Feel free to comment below if you do something similar or know of any components that have been overlooked.
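As one example of such a tool, a configuration drift check can be as simple as diffing the configuration you intend a device to have against what it is actually running. The sketch below uses Python's standard `difflib`; the `config_drift` helper, the hostname, and the sample configuration lines are illustrative assumptions, and fetching the running configuration from a device is left out.

```python
import difflib


def config_drift(intended, running, name="device"):
    """Return unified-diff lines between intended and running config text."""
    return list(difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile=f"{name}/intended",
        tofile=f"{name}/running",
        lineterm="",
    ))


# Hypothetical intended configuration, e.g. rendered from the metadata store.
intended = """\
set system host-name sw1
set system ntp server 10.0.0.10
set snmp community monitoring authorization read-only
"""

# Hypothetical running configuration, as it might be fetched from the device.
running = """\
set system host-name sw1
set system ntp server 10.0.0.99
set snmp community monitoring authorization read-only
"""

drift = config_drift(intended, running, name="sw1.example.net")
if drift:
    print("\n".join(drift))
else:
    print("no drift")
```

An empty result means the device matches its intended state, which makes the check easy to wire into a monitoring system as a pass or fail.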