National Data Catalog API


The National Data Catalog went live last week. Now we would like to share a little bit about our API and how it fits into our platform.

The National Data Catalog (NDC) is an open source catalog for government data sets and APIs. Our goal is for it to encompass all data released by or about governments in the United States, at the federal, state, and local levels. The NDC will harness the community of users interested in open government data.

Web developers can take a look at our API documentation.

The NDC Platform

The NDC is an open source platform consisting of several components: an API, a Web application, and a set of importers.

  • At the center is the API. We say ‘center’ because it houses NDC’s data, so the other components go through the API to access that data. Technical users, such as software developers, are the most likely to interact with the API directly.

  • The richest user experience is available with the National Data Catalog web app. It is geared towards the general public, but with a focus on researchers, reporters, investigative journalists, and lovers of data far and wide.

  • Importers are a key way we populate the NDC with data. They are mostly automated: an importer’s job is to connect to an external data catalog, gather data from it, and upload that data to the NDC API. We currently have importers for data.gov, data.dc.gov, and utah.gov. The NDC also has curation tools that let Sunlight curators review the metadata and clean it up as needed.
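To make the connect, gather, upload cycle concrete, here is a minimal Ruby sketch. It is hypothetical, not the code of the actual NDC importers: the external record fields (`name`, `link`, `agency`), the shape of the resulting document, and the `SketchImporter` and `MemoryUploader` classes are all invented for illustration. The fetcher and uploader are passed in as objects, so the mapping logic can be exercised without network access; a real importer would call the external catalog and the NDC API over HTTP in their place.

```ruby
# Hypothetical importer sketch: the class names, field names, and
# document shape are illustrative and do not come from the NDC code.
class SketchImporter
  # `fetcher` responds to #call and returns an array of hashes from an
  # external catalog; `uploader` responds to #upload(doc).
  def initialize(fetcher:, uploader:, catalog_name:)
    @fetcher = fetcher
    @uploader = uploader
    @catalog_name = catalog_name
  end

  # Gather records, convert each one, and hand it to the uploader.
  # Records without a title are skipped. Returns the number uploaded.
  def run
    count = 0
    @fetcher.call.each do |record|
      doc = to_ndc_document(record)
      next if doc.nil?
      @uploader.upload(doc)
      count += 1
    end
    count
  end

  private

  # Map an external record onto a hypothetical NDC-style document.
  def to_ndc_document(record)
    title = record["name"].to_s.strip
    return nil if title.empty?
    {
      "title" => title,
      "url" => record["link"],
      "organization" => record["agency"],
      "catalog_name" => @catalog_name
    }
  end
end

# In-memory stand-in for the NDC API, collecting uploaded documents.
class MemoryUploader
  attr_reader :docs

  def initialize
    @docs = []
  end

  def upload(doc)
    @docs << doc
  end
end

# Stand-in for an external catalog: one usable record, one without a title.
fetcher = -> {
  [
    { "name" => " Sample Data Set ", "link" => "http://example.org/data", "agency" => "Example Agency" },
    { "name" => "", "link" => "http://example.org/blank" }
  ]
}
uploader = MemoryUploader.new
uploaded = SketchImporter.new(fetcher: fetcher, uploader: uploader, catalog_name: "data.dc.gov").run
puts uploaded                     # prints 1
puts uploader.docs.first["title"] # prints Sample Data Set
```

Keeping the mapping step separate from the transport means that supporting a new external catalog only requires a new fetcher and field mapping.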

API Architecture Benefits

Compared with the prototypical Rails Web application, the NDC architecture is highly modular. NDC applications (currently the Web app, API, and importers) are separate and communicate over HTTP. With the API at the center, each of these apps stays small and focused. The design also helps with scalability: if one component gets bogged down, we can start up another instance of it.

The modular architecture also helps promote an open ecosystem. Anyone is welcome (and encouraged) to build ‘third-party’ applications that consume our API. To obtain read access to the API, just sign up. Your profile page will list your API key.
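Because the components talk over plain HTTP, a third-party client mostly needs to send GET requests that carry its API key. The sketch below builds (but does not send) such a request with Ruby's standard `net/http` and `uri` libraries. The host `https://api.example.org`, the `/sources` path, and passing the key as an `api_key` query parameter are placeholders assumed for the example; the API documentation is the authority on real endpoint names and on how the key should be supplied.

```ruby
require "net/http"
require "uri"

# Placeholder values: substitute the real host, resource path, and your key.
BASE_URL = "https://api.example.org"
API_KEY  = "your-api-key"

# Build a GET request for a resource, attaching the API key as a query
# parameter (an assumption for this sketch) alongside any other params.
def build_read_request(path, params = {})
  uri = URI.join(BASE_URL, path)
  uri.query = URI.encode_www_form(params.merge("api_key" => API_KEY))
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/json"
  [uri, request]
end

uri, request = build_read_request("/sources")
puts uri            # prints https://api.example.org/sources?api_key=your-api-key
puts request.method # prints GET

# Sending the request would look like this (needs a reachable host):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
#   http.request(request)
# end
```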

Many APIs are read-only; the NDC API, however, also provides full write access to authorized users. It checks credentials carefully to protect data integrity. This means that trusted applications in our ecosystem can both consume data from the platform and add data to it. If you write an importer for an external data catalog, please let us know and we’ll set you up with credentials.
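One way to picture the read/write split is a role check that runs before each request is handled. This is a generic sketch, not the NDC or sinatra_resource implementation: the role names and the policy table (signed-up users may read, curators may create and update, admins may delete) are assumptions made for the example.

```ruby
# Hypothetical roles, ordered from least to most privileged.
ROLES = [:anonymous, :basic, :curator, :admin].freeze

# Hypothetical policy: the minimum role required for each HTTP method.
MINIMUM_ROLE = {
  "GET"    => :basic,   # any signed-up user with an API key
  "POST"   => :curator, # creating records requires trusted credentials
  "PUT"    => :curator, # so does updating them
  "DELETE" => :admin
}.freeze

# Returns true if a caller holding `role` may perform `http_method`.
# Unknown roles and methods not in the policy are denied.
def permitted?(role, http_method)
  required = MINIMUM_ROLE[http_method]
  rank = ROLES.index(role)
  return false if required.nil? || rank.nil?
  rank >= ROLES.index(required)
end

puts permitted?(:basic, "GET")     # prints true
puts permitted?(:basic, "POST")    # prints false
puts permitted?(:curator, "PUT")   # prints true
puts permitted?(:anonymous, "GET") # prints false
```

In a running service, the role would be looked up from the caller's API key, and a failed check would typically produce an HTTP 401 or 403 response rather than a boolean.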

For the technically inclined, our API embraces the Resource Oriented Architecture (ROA) as detailed by Leonard Richardson and Sam Ruby in their book “RESTful Web Services.”
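In a resource-oriented design, each piece of data is a resource with its own URI, and clients act on it through HTTP's uniform interface (GET, POST, PUT, DELETE) rather than through custom method names. The in-memory sketch below illustrates that mapping for a hypothetical `/sources` collection; the class, paths, and status codes follow common REST conventions and are not taken from the NDC API.

```ruby
# In-memory stand-in for a hypothetical /sources collection.
class SourceResource
  def initialize
    @items = {}
    @next_id = 1
  end

  # Route one request through the uniform interface.
  # Returns [status_code, body], where body is a Ruby hash, array, or nil.
  def handle(http_method, path, body = nil)
    if path == "/sources"
      collection(http_method, body)
    elsif (m = path.match(%r{\A/sources/(\d+)\z}))
      member(http_method, m[1].to_i, body)
    else
      [404, { "error" => "no such resource" }]
    end
  end

  private

  # The collection URI: GET lists members, POST creates a new member.
  def collection(http_method, body)
    case http_method
    when "GET"
      [200, @items.values]
    when "POST"
      id = @next_id
      @next_id += 1
      @items[id] = (body || {}).merge("id" => id)
      [201, @items[id]]
    else
      [405, { "error" => "method not allowed" }]
    end
  end

  # A member URI: GET reads, PUT replaces, DELETE removes.
  def member(http_method, id, body)
    return [404, { "error" => "not found" }] unless @items.key?(id)
    case http_method
    when "GET"
      [200, @items[id]]
    when "PUT"
      @items[id] = (body || {}).merge("id" => id)
      [200, @items[id]]
    when "DELETE"
      @items.delete(id)
      [204, nil]
    else
      [405, { "error" => "method not allowed" }]
    end
  end
end

api = SourceResource.new
status, created = api.handle("POST", "/sources", { "title" => "Example" })
puts status                                               # prints 201
puts api.handle("GET", "/sources/#{created["id"]}").first # prints 200
puts api.handle("DELETE", "/sources/1").first             # prints 204
puts api.handle("GET", "/sources/1").first                # prints 404
```

Because every operation is an HTTP method applied to a URI, clients need no API-specific verbs: the same methods work on any resource. For brevity, this sketch does not support creating a resource by PUT to a new URI, which resource-oriented designs can also allow.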

Technical API Details

The API is open source; its code is available from the sunlightlabs GitHub page. As I mentioned above, we also have API documentation online.

The NDC API is written in Ruby using sinatra_resource. SinatraResource is a framework that makes it easy to write RESTful Web Services on top of Sinatra and MongoMapper. In case you haven’t heard of it, MongoMapper is an object mapper for MongoDB that adds features such as validations and callbacks.

Ruby developers will want to take advantage of our Ruby wrapper. It makes using the API very easy. Just install the datacatalog gem and off you go.