Forgive me for slacking off on the blogging this week. I’ve spent the last three days “inside” a new database – and you’re going to like it when it’s released in a couple of weeks.
Okay, I’ll admit to being a little weird when it comes to databases. I’ve always enjoyed digging into data, and for someone with such propensities there’s no greater thrill than the feeling that you’re looking over information that no one else has ever seen before. It’s like laying down a fresh set of footprints on an island or a continent that nobody knew was there. Well, except the “undiscovered” people who lived there before.
In the case of government databases – which this one was – we know of course that some people have seen it before. Someone inside the government put it together, after all. And the data inside it reflects real government actions. In this case, the database shows federal government grants over the past six years: the recipients, the amounts, the agencies involved, even the congressional district that got the money.
But all that was for an inside audience. People receiving the checks knew about it. The ones writing the checks certainly knew about it. So did the galaxy of insiders who made it happen – including members of Congress, lobbyists, and a small army of government staffers.
But even they only saw fragments of the whole picture. The database shows everything. That means that outsiders to the process – news reporters, curious citizens, all those do-it-yourself journalists in the blogosphere – everyone will soon be able to search, sort and subtotal the data on government grants in ways that even the insiders couldn’t do before.
All the heavy lifting to make this happen and put it on the web is being done by a crack team of database pros at OMB Watch, funded by a Sunlight Foundation grant. The only role I had was standardizing the names of the agencies, universities and others that received the grants. That actually turned out to be quite a job, since lots of the agencies have long names and they were entered in the database under a bewildering array of variations.
So for three days I’ve been cleaning it up, turning entries like “NC ST DEPARTMENT OF HEALTH & HUMAN SERVIC” into “NORTH CAROLINA DEPT OF HEALTH & HUMAN SERVICES.” When the names are standardized it’s much easier to do searches and generate meaningful subtotals for each recipient.
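For readers curious what that cleanup looks like in code, here is a minimal sketch in Python. It is not the process actually used on this database: the lookup table, the function names, the second name variant, and the dollar amounts are all made up for illustration. The idea is simply that each messy variant maps to one standard name, so subtotals land in a single bucket.

```python
from collections import defaultdict

# Hypothetical lookup table mapping raw variants to one standard name.
# Only the first variant comes from the post; the second is invented.
CANONICAL_NAMES = {
    "NC ST DEPARTMENT OF HEALTH & HUMAN SERVIC":
        "NORTH CAROLINA DEPT OF HEALTH & HUMAN SERVICES",
    "N C DEPT OF HEALTH AND HUMAN SERVICES":
        "NORTH CAROLINA DEPT OF HEALTH & HUMAN SERVICES",
}


def standardize(raw_name):
    """Uppercase, collapse runs of whitespace, then look up a standard name.

    Names with no entry in the table are returned in their tidied form.
    """
    key = " ".join(raw_name.upper().split())
    return CANONICAL_NAMES.get(key, key)


def subtotal_by_recipient(grants):
    """Sum grant amounts per standardized recipient name."""
    totals = defaultdict(int)
    for raw_name, amount in grants:
        totals[standardize(raw_name)] += amount
    return dict(totals)


# Made-up sample rows: (recipient name as entered, grant amount in dollars).
sample_grants = [
    ("NC ST DEPARTMENT OF HEALTH & HUMAN SERVIC", 1000),
    ("n c dept of health and human  services", 2500),
    ("NORTH CAROLINA DEPT OF HEALTH & HUMAN SERVICES", 500),
]

totals = subtotal_by_recipient(sample_grants)
for name, amount in totals.items():
    print(name, amount)
```

Without the lookup step, those three rows would show up as three different recipients. In practice the hard part is building the table itself, which is the painstaking bit a script can't do for you.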
It’s dry, painstaking work, and after three days of it – take it from me – you get into a zone that’s not completely unlike a hypnotic trance. But it’s also fun, in an odd sort of way. It’s like solving a difficult crossword puzzle – something I love doing in my spare time. But it has the added bonus, unlike a crossword, of being useful to the wider world.
I’ll leave it at that and you’ll have to wait another couple of weeks to play with it yourself. (Don’t worry, we won’t be shy about letting you know when it’s coming out.) But take it from someone who’s been digging around inside it: this is good stuff. I don’t know yet what kind of news stories, and maybe scandals, will rise to the surface when a larger audience starts examining this information, but I know it will happen.
That’s the great thing about putting new databases on the web. It’s not just the database builders who get all the joys of discovery – it’s the users who start typing in questions and digging up answers. Once the data’s up there, the whole world can share in the same thing I’ve been living with for the past three days: that richly satisfying “aha!” when you find something interesting that nobody ever noticed before.