Is Government a Data Wholesaler or Retailer?


Imagine if Costco announced that it was going to take the Costco experience to Manhattan and open convenience stores across the island. Further, imagine shopping at one of these new Costco bodegas, all of 500 square feet, with your giant cart, selecting from what little the Costco bodega can offer in that space! At your local Costco bodega you have to choose among 400 rolls of toilet paper, 70 lbs of dehydrated mashed potatoes, or a six-pack of giant boxes of cereal. That’s pretty much all the Costco bodega could keep in inventory, because there wouldn’t be room in 500 square feet for anything else. And good luck carrying all that home!

Sounds absurd, doesn’t it?

That’s exactly what the U.S. Federal Government often does with data. It is the only entity in the country that can provide this data in bulk (wholesale), yet it focuses almost entirely on building visualizations and websites for viewing that data. See, government has a monopoly on your data, but more often than not it builds web (retail) experiences first and treats providing data as a wholesaler as an afterthought.

Here’s a good example: the FEC provides data about campaign finance. It receives data from campaigns regarding their contributions and expenditures and has a mandate to put that data in front of the public. If you go to the FEC’s website right now, front and center are maps. But if the FEC were really focused on getting as many eyeballs on disclosure data as possible, so people could be their own watchdogs, then wouldn’t it make sense for the FEC’s pride and joy to be modern, machine-readable, accurate, downloadable data? After all, the FEC is the only organization that has that data, and anyone can build a map. The FEC would be focused on supporting good retail operations like OpenSecrets rather than building those operations in house. OpenSecrets usually gets 4x as much traffic as the FEC anyway.

I propose that Federal Agencies and Congress instead view themselves primarily as wholesalers, and build the retail experience only after they’ve built a wholesale operation. Mostly because retailers already exist and government is their sole supplier, meaning that if the FEC’s data is bad or inaccurate, so is OpenCongress’, and the Huffington Post’s, and the New York Times’. But I also think this is good for four other substantial reasons:

  1. As soon as anyone starts providing user interfaces on top of the data, a lens, for better or worse, gets put on that data. Designers and managers start making decisions about what is important and what isn’t. The closer you can get to bulk, raw data, the closer you get to Truth.

  2. When government’s intent is to mandate that data be disclosed, part of that is an implicit assumption that Government wants eyes on the data as well. To get the most eyes on that data, that data ought to be put into the hands of multiple parties so that retailers (like the Sunlight Foundation) can easily shed light on the data for their constituencies.

  3. It saves the taxpayer money. If Government relies, even just a little bit, on outside organizations to build engagement around data, and focuses instead on providing good, accurate, modern, machine-readable data, then the taxpayer can save millions.

  4. Government has a monopoly on this data. The Recovery board’s site is another great example of what I mean. If you take a look at page 12 of the recently released proposal to the Recovery board, you see this image: [image: stories of stimulus success]

I’ve obviously circled the part that’s important and added my own editorial flair to the image. Once you start editorializing data, well, you’ve lost accountability, transparency, and credibility.

This isn’t to say that Government should only be serving watchdog groups, media organizations and freelance developers, of course. But rather, I’m suggesting an order of operations. To be specific:

  1. Government should FIRST provide wholesale access to data.
  2. Government should NEXT provide APIs on top of that data for developers and outside organizations to use.
  3. Government should LAST provide a retail experience to citizens that is built using ONLY the API and bulk data that it provides.
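To make that ordering concrete, here is a minimal sketch of the three layers stacked on top of one another. Everything in it is hypothetical: the sample rows are made up, and the names `load_bulk`, `api_total_by_committee`, and `retail_summary` are illustrative, not any agency’s actual system. The point is only that the retail view never touches anything except the API, and the API never touches anything except the bulk file.

```python
import csv
import io

# Step 1 (wholesale): the bulk file is the single source of truth.
# These rows are made-up sample data, for illustration only.
BULK_CSV = """committee,recipient,amount
Committee A,Vendor X,1200.50
Committee A,Vendor Y,300.00
Committee B,Vendor X,75.25
"""


def load_bulk(text):
    """Parse the bulk download into a list of dict records."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount"] = float(row["amount"])
    return rows


# Step 2 (API): a thin query layer that reads only the bulk records.
def api_total_by_committee(records):
    """Return total spending per committee, computed from bulk records."""
    totals = {}
    for record in records:
        name = record["committee"]
        totals[name] = totals.get(name, 0.0) + record["amount"]
    return totals


# Step 3 (retail): the citizen-facing view calls only the API.
def retail_summary(records):
    """Format the API's totals as human-readable lines."""
    totals = api_total_by_committee(records)
    return [f"{name}: ${total:,.2f}" for name, total in sorted(totals.items())]


if __name__ == "__main__":
    for line in retail_summary(load_bulk(BULK_CSV)):
        print(line)
```

Because the retail layer is built from the same bulk file and the same API that outsiders get, any error in the wholesale data shows up on government’s own site too, which gives the agency a direct incentive to fix it.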

Too often, now, these are the steps that Government takes:

  1. Government creates a retail experience
  2. Watchdog groups, media organizations and others complain about not having access to data
  3. Government worries where it is going to find the budget to create an API or create bulk access to the data.
  4. Government finally does provide one or the other, or sometimes both, but as an afterthought, or at least as a second priority to the retail experience. And generally it releases some new, “vetted” or “cleaned” dataset to the public while using its own internal dataset for its own purposes.

If Government first takes on the responsibility of being a wholesaler, and builds its own retail operation on top of its own wholesale data, you’ll see the quality of that wholesale data go up, more retail experiences being created outside of government, and the overall quality of the retail experiences going up as well.

So let’s start pushing government in this direction. For government to become part of the new web, and to leverage the cost savings that we developers can provide, we’ve got to push it to start with the wholesale and end with the retail, rather than the other way around.