Today, a coalition of groups and individuals concerned with open government urged the Senate Committee on Homeland Security and Governmental...
This past Saturday was the second annual International Open Data Hackathon, a globally coordinated day for people to gather and hack on open public data from the world's governments. As part of this, POPVOX hosted an Open Data event here in DC at the MLK Memorial Public Library.
Several Sunlighters showed up, and we had a pretty great time. Andrew and I came expecting to work alone on our project, an ambitious attempt to bridge the data gap between legislation and the regulations it generates, which we're tentatively titling Crosslaws. Instead, after we (and everyone else) described our project to the room at the start of the day, six people came to our table and asked how they could help, five of whom weren't developers at all.
Although Andrew and I had no obvious tasks to hand out, once we explained the finer points of the work, everyone found valuable research and development of their own to do for the entire course of the day, from scholarly articles to actual parsing code. You can find some of our group's notes on the Crosslaws wiki, as well as an overview of what's left to be done (there's a lot!).
Drew and Daniel went to the hackathon to work on their statistical analysis of USASpending data, using Benford's Law. They were hoping to find a stats wizard to help rigorously test the findings, and while they weren't able to find one, their search was still fruitful. The project did attract interest from a handful of very thoughtful people, and they had a long discussion that helped refine the goals of the project. Drew was very thankful for that, as he came away from the hackathon better focused on a concrete goal. At the end of the day, they had the parser and downloader written, but weren't able to download enough data to test them thoroughly. You can find Drew's team's code on GitHub.
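For readers who haven't run into it, Benford's Law predicts that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with a 1. The sketch below is not Drew and Daniel's code; it is a minimal illustration, assuming you already have a plain list of dollar amounts, and the function names are made up for this example. It computes a chi-square statistic comparing observed leading digits against the Benford expectation.

```python
import math
from collections import Counter


def first_digit(amount):
    """Return the leading nonzero digit of a finite number, or None for zero."""
    amount = abs(amount)
    if amount == 0:
        return None
    # Scientific notation always starts with the leading digit,
    # e.g. 1234.5 -> '1.234500000000000e+03'.
    return int(f"{amount:.15e}"[0])


def benford_expected(digit):
    """Probability that Benford's Law assigns to a leading digit 1-9."""
    return math.log10(1 + 1 / digit)


def benford_chi_square(amounts):
    """Chi-square statistic of observed leading digits vs. Benford's Law.

    Assumes `amounts` is an iterable of finite numbers (hypothetical input;
    real spending records would need parsing and cleaning first).
    """
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    total = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = total * benford_expected(d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2
```

With nine possible digits there are 8 degrees of freedom, so a statistic above roughly 15.51 would be unusual at the 5% level if the data really followed Benford's Law. A high value is only a prompt to look closer, not evidence of a problem by itself, which is presumably why a stats wizard would have been welcome.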
In general, it was a fantastic crop of people who showed up on a Saturday morning at the MLK Library, from awesome self-directed policy people to talented folks from the DC and federal governments. My project got real momentum from it, and we'll be capitalizing on that momentum with more work over the next couple of months. Given all that, the hackathon felt like a real success to me, and I'm looking forward to next year's.
Today we're launching 6° of Corporations, a new micro-site that provides some insight into the complicated area of corporate identity. It may sound trivial, but uniquely identifying a corporate entity is not easy. In federal contracting data (like that in USASpending.gov), DUNS numbers are used to (supposedly) uniquely identify a contractor. However, there are problems not only with how DUNS numbers are issued and maintained, but also with agencies' use of DUNS numbers. To help illustrate this, we’ve created a visualization that shows the relationship between company names and company DUNS numbers in USASpending.gov.
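To make the identification problem concrete, here is a rough sketch of how one might look for it in contract records. This is not the code behind the micro-site; it assumes records have already been reduced to (DUNS, company name) pairs, and the function names, the name normalization, and the sample records are all invented for illustration.

```python
from collections import defaultdict


def map_names_and_duns(records):
    """Index (duns, name) pairs in both directions.

    Names are upper-cased and whitespace-collapsed so trivial
    formatting differences are not counted as distinct names.
    """
    names_by_duns = defaultdict(set)
    duns_by_name = defaultdict(set)
    for duns, name in records:
        normalized = " ".join(name.upper().split())
        names_by_duns[duns].add(normalized)
        duns_by_name[normalized].add(duns)
    return names_by_duns, duns_by_name


def ambiguous(mapping):
    """Keep only the keys that map to more than one value."""
    return {key: values for key, values in mapping.items() if len(values) > 1}


if __name__ == "__main__":
    # Made-up sample records, not real USASpending.gov data.
    sample = [
        ("123456789", "Acme Corp"),
        ("123456789", "ACME  CORP"),
        ("123456789", "Acme Corporation"),
        ("987654321", "Acme Corp"),
    ]
    names_by_duns, duns_by_name = map_names_and_duns(sample)
    print("DUNS numbers with several names:", ambiguous(names_by_duns))
    print("Names with several DUNS numbers:", ambiguous(duns_by_name))
```

If DUNS numbers worked as a true unique identifier, both of those results would be close to empty. Any key that survives the filter is a case where names and numbers do not line up one-to-one.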
We’ve often looked at the macro perspective with this data, but what if we followed the transactions of a single program? Would we be able to understand and follow the data easily and provide citizen oversight, as was the intent of the legislation behind USASpending.gov?
In the last few weeks there’s been a whirlwind of news and speculation about what will happen to the federal...
Those interested in the business potential of government data will definitely want to check out Washingtonian's story about Bloomberg Government. It's a good introduction to what really does seem to be the D.C. media landscape's newest 800 lb. gorilla (albeit a very quiet and well-behaved one so far).
Readers of this site will probably be most intrigued by these two paragraphs:
[...] BGov subscribers, of whom there are currently fewer than 2,000 individuals, get something potentially more valuable than news. BGov’s “killer app”—the feature that sets it so far apart from its competition that prospective customers will feel compelled to buy it—is a database that lets users track how much money US government agencies spend on contracts, something no other media organization in Washington offers. Users can break down the spending by agency, company, amount, or congressional district; they can track the money over time; and with a single mouse click, they can call up news associated with the companies and the type of work they do. They can also see which contractors are giving money to elected officials.
All that information is extraordinarily hard to gather, largely because the government doesn’t store it in one place. But when it’s collected, and explained by journalists, the data has the potential to give businesses an inside track on winning government deals. It shows where spending trends are heading and thus where the next business opportunity lies.
Data quality problems aside, this is true as far as it goes -- I've seen a demo of the BGov interface, and it really is quite impressive. But in fact the data isn't that spread out. Between Sunlight's APIs, bulk data from USASpending.gov, GIS data from Census and the admittedly hard-to-scrape Regulations.gov, any startup with enough time and technical talent could replicate the majority of the site's functionality (the business intelligence data provided by Bloomberg Financial is a tougher nut to crack). That's the great thing about public sector information: it's there for the taking. Anyone can use it.
I've written about this before, and generally argued that government data is a tough thing to create a business around because there's no way to prevent competitors from undercutting you. But there's money to be made in the undercutting. Mike Bloomberg thinks it's worthwhile to bet $100 million on reselling government data. He's made some pretty good business decisions in the past. A smart startup might want to take the hint.
A roundup of what we’re noticing in the Reporting Group as we dig into government data and disclosures: By the...
Today we're launching Clearspending -- a site devoted to our analysis of the data behind USASpending.gov. Ellen's already written about this project over on the main foundation blog, and you should certainly check out her post. But I wanted to talk about it a little bit here, too, because this project is near & dear to my heart, having grown out of work that Kaitlin, Kevin and I did together before I stepped into the role of Labs Director.
The three of us had been working with the USASpending database for a while, and in the course of that work we began to realize some discouraging things. The data clearly had some problems. We did some research and wrote some tests to quantify those problems -- that effort turned into Clearspending. The results were unequivocal: the data was bad -- really bad. Unusably bad, in fact. As things currently stand, USASpending.gov really can't be relied upon.
You can read all about it over at the Clearspending site, and I hope you will -- in addition to an analysis that looked at millions of rows of data and found over a trillion dollars' worth of messed-up spending reports, we spent a lot of time talking to officials at all levels of the reporting chain. I don't think you're likely to find a better discussion of these systems and their problems.
And make no mistake, these systems are important.
USASpending.gov got a face-lift on Wednesday evening, and it brought with it a raft of new features. Some of these are great; others are either not very useful, or an actual step backward. Let's run through them -- not only to highlight the features and shortcomings, but to examine what they can tell us about how government should be opening its data.