When the government shuts down and takes most of its data with it, the public needs to have a backup plan.
The laws still apply to us. Politicians are still raising money. We still have a population, unemployment, and social problems. But if you want to look up some of those basic facts, well, Data.gov can’t help you, you’ll have to ask about those campaign donations later, and the Census will get back to you someday.
It’s not 1995 anymore. The government lives on the Internet, and so do we. We can’t just lose access to all the information they put online. We all understand why the Internet Archive keeps the Web on file for us. The only reliable way to preserve data online is to make copies — and the more copies, the better!
That’s why a government API will never be enough. It’s just so much easier to copy data when it’s directly downloadable in bulk. APIs can be extremely useful, but they also centralize control and form a single point of failure. Ultimately, APIs are optional. Data is a necessity.
Bulk data is just downloading files. The Internet was built for this. Downloading files is a much more straightforward backup process than constructing batteries of queries to APIs that weren’t designed for wholesale copying. Depending on the API, a full backup may be so difficult as to be infeasible.
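To make "just downloading files" concrete, here is a minimal sketch of a bulk backup, using only Python's standard library. The function name `mirror_files` and the idea of keeping a checksum manifest are assumptions made for illustration; they do not come from any agency's tooling.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def mirror_files(urls, dest_dir):
    """Copy each URL into dest_dir and return {filename: SHA-256 hex digest}.

    Hypothetical helper for illustration; not part of any official tool.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for url in urls:
        # Use the last path segment of the URL as the local filename.
        name = Path(urlparse(url).path).name
        target = dest / name
        # Stream the response to disk so large files never sit fully in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        # Record a checksum so later copies can be verified against this one.
        digest = hashlib.sha256()
        with open(target, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        manifest[name] = digest.hexdigest()
    return manifest
```

Given a list of file URLs (say, a hypothetical `https://example.gov/data/population.csv`), one call copies everything and leaves a manifest that anyone holding another copy can check theirs against. Doing the same through an API would mean paging through every query the service happens to support and hoping nothing falls between them.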
Just as importantly: hosting static files requires fewer people, smaller systems, and less technical expertise. It’s vastly simpler and cheaper than hosting a live, “smart” data service. In the face of hard funding decisions, that’s going to matter.
It’s safe to say that in the community of people who depend on government data, things feel different now. Now that it’s been made clear that this data can suddenly disappear with the political crisis of the moment, the open government community will take its role in preserving it very seriously.
- Code for America assembled a backup of much of the Census’ data.
- Cornell’s Legal Information Institute rescued various data verification files from the Library of Congress.
- The Library also almost withdrew access to the work of Congress, and Sunlight has the APIs and bulk data ready in case they ever do.
- Mark Headd, Philadelphia’s Chief Data Officer, wrote today extolling the virtues of community-operated data portals that the government doesn’t unilaterally control.
Once the federal government picks up the pieces and starts working on their open data strategy again, they’ve got to acknowledge that they may not always be there for you, much as they want to be.
That means federal agencies should:
- Publish downloadable bulk data before or concurrently with building an API.
- Explicitly encourage reuse and republishing of their data. (Considering public reuse of data a risk to the public is not recommended.)
- Document what data will remain online during a shutdown, and keep that documentation posted at all times. Don’t wait until the day before (or of) a shutdown.
- Link to alternative sources for their data. Keep these links up during a shutdown.
We’ve got to be able to keep a copy of our government’s data, just as we make backups of our photos, the Web, and everything else that’s truly important to us. Government information is too vital for it to be dished out only in narrow, complicated pipes.
For our government to be a platform, it needs to let the public hold it up.
Image via Ralf Muehlen.