As the term “API” has become more widely recognized through its ubiquity in social media and other web services, its coolness factor has grown considerably, and APIs have become something frequently called for from government.
But does government really need to rush around and make APIs for all of its stuff? Peter Krantz argues that offering direct downloads of bulk data is a much more scalable, simple, and sane solution in most cases.
You should go read his article, rather than just our summary. But specifically, he makes a few points about offering an API instead of bulk data:
- A government API can suddenly and unintentionally become a piece of critical, high-demand infrastructure. Offering bulk data minimizes load, and forces an intermediate step between a client’s system and your own.
- Entrepreneurs and developers are limited to your API’s worldview of that data, which is necessarily limited to that agency’s worldview of their data.
- Consistent URL schemes to downloadable files go a long, long way and may be all you need to give entrepreneurs to allow them to automate the entire process and pick up new data over time. (This is why the House requiring predictable URLs for committee documents and data is such a big deal.)
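To make that last point concrete, here is a minimal sketch of what a predictable URL scheme lets a developer do. The URL pattern and the `data.example.gov` host are hypothetical, as are the function names; a real agency would publish its own scheme. The idea is only that when URLs follow a rule, a client can work out what to download without asking anyone.

```python
from datetime import date, timedelta

# Hypothetical URL pattern, for illustration only.
URL_TEMPLATE = "https://data.example.gov/bulk/{year:04d}/{month:02d}/{day:02d}.csv"


def daily_urls(start, end):
    """Return one bulk-file URL per day from start to end, inclusive."""
    urls = []
    current = start
    while current <= end:
        urls.append(
            URL_TEMPLATE.format(
                year=current.year, month=current.month, day=current.day
            )
        )
        current += timedelta(days=1)
    return urls


def urls_to_fetch(start, end, already_downloaded):
    """Return the URLs in the date range that have not been downloaded yet."""
    return [u for u in daily_urls(start, end) if u not in already_downloaded]
```

A nightly job could call `urls_to_fetch` with the set of files it already has and download whatever is left, picking up new data over time with no API involved.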
There are some cases where an API is justified, even needed. Chris Musialek from Data.gov asked about this in a post late last year to the Sunlight Labs mailing list. Our Labs director, Tom Lee, responded with a good summary of the distinction:
“My sense is that APIs are great, but except for a very few cases they should be adjuncts to bulk downloads, not substitutes. In particular, it may be appropriate to offer an API when the system’s source data changes very rapidly (I would put this at “faster than daily”), making staleness a concern; or when manipulating the data requires technical capabilities sufficiently advanced as to exclude many potential users of the information.
One example of this might be a dataset with very large storage requirements. Another is GIS data, where users need to know something about setting up the associated toolchain to use the source data: an API can make this information accessible for those who just need to query a bunch of lat/lon pairs and see what comes back.”
In addition to Peter’s and Tom’s points, I would add a couple of other lessons learned as a developer here:
- There’s no way to predict ahead of time the right data format and structure for every client who’s interested in your data. Expect clients to need to transform your data for their own requirements, and for that transformation to require clients to first obtain all of your data.
- Providing bulk access is several orders of magnitude less work on the part of the provider than building and maintaining an API. An API is a system you need to design, create, keep running, attend to, and worry about. Bulk data access is uploading some files, forgetting about it, and letting HTTP do the work. Ongoing automated bulk access may require some integration into existing workflows behind the scenes, but it’s going to be a lot less work than building a new system.
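As a rough illustration of the first point, here is the kind of transformation a client might run once it has the whole file. The sample rows, the column names, and the `totals_by_agency` function are all made up for this sketch; they stand in for a downloaded bulk file and for whatever reshaping a particular client happens to need.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for a downloaded bulk file.
# The columns (agency, year, amount) are hypothetical.
SAMPLE_BULK_CSV = """agency,year,amount
A,2011,100
B,2011,250
A,2012,75
"""


def totals_by_agency(csv_text):
    """Sum the amount column per agency across every row in the file."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agency"]] += int(row["amount"])
    return dict(totals)
```

A provider's API may never offer this exact roll-up, and it doesn't have to: with the full file in hand, the client builds its own view in a few lines.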
In many cases, we’ll develop an API simply because we need it for our own work. After all, the best way to create a good API is to dogfood it yourself as you build it, so you know what it’s like to use it. If the resulting API also provides a meaningful public service, then the cost of making it public is very low for us, since we’ve already decided that building and maintaining it is necessary.
In other cases, such as our Congress API’s geographic lookup service, we offer an API because it covers up a lot of complex calculation and allows geographic novices to build interesting services. It’s more than just a mechanism of transporting information.
To sum up, Sunlight is pro-API – we make our own, and we welcome them from the government when they enhance access to information (the FederalRegister.gov API is a particularly good example). However, the first step government should take, in nearly all cases, is to offer the data directly and in bulk. They’ll save themselves mountains of hassle, will better and more quickly serve entrepreneurs and developers, and will encourage the broadest possible use of their data by the public.