Former (and future) Sunlighter Zack Maril discusses his initial forays into advanced entity deduplication approaches.
Sunlight’s Priorities for the Next Administration
Regardless of who wins the presidential election, the next administration will have enormous power to say how open our government will be. We have organized our priorities for the next administration below, to share where we think our work on executive branch issues will be focused, in advance of the election results. From money in politics to open data, spending, and freedom of information, we'll be working to open up the Executive Branch. We'd love to hear any suggestions you might have for Sunlight's Executive Branch work; please leave additional ideas in the comments below. (We'll also be sharing other recommendations soon, including a legislative agenda for the 113th Congress, and a suite of reform proposals for the House and Senate rules packages.) Sunlight Reform Agenda for the Next Administration:
The Consequences of the e-Gov Cuts
If you haven't already, please be sure to check out my colleague Daniel Schuman's post over at the main Sunlight Foundation blog, where he details the consequences of the cuts to the e-Gov fund. The short version: in a letter to Sen. Carper, federal CIO Vivek Kundra is reporting that the cuts will negatively affect upgrades to a broad variety of executive branch transparency- and good-government-related websites; lead to the cancellation of FedSpace and the Citizen Services Dashboard; and hinder efforts at improving data quality.
There's no doubt this is bad news -- that the administration is already making excuses for not following through on fixing data quality is particularly discouraging. But there's also no question that things could have been worse. This fight isn't over yet, but our community has already made a big difference.
So thanks for your help, and for sticking with us as we try to ensure that our government doesn't stagger backward from its early, tentative steps into the online era.
House Oversight Subcommittee Discusses Problems with USASpending.gov Data
On Friday, Ellen testified in front of the Subcommittee on Technology, Information Policy, Intergovernmental Relations and Procurement Reform, a subcommittee...
USASpending.gov Data Quality — Still Bad?
We at the labs have written about USASpending.gov several times now. We’ve recently been able to make use of their bulk data downloads to regularly populate some of our webapps with federal grants and contracts data. However, we also have an old snapshot of the data that we received in April of 2010. This snapshot was received on a hard drive that we shipped to USASpending engineers -- before the bulk data downloads existed. Thankfully, we don’t have to go through that process anymore. I wondered how the data has changed over the past year. Last year, the USASpending team took a lot of flak for their data quality issues. Has it been improved? I thought I’d take a look back and see how two data snapshots from April 2010 and December 2010 compare.
Carrots and Sticks
The response to Clearspending has been overwhelmingly positive. People seem to care about government spending data quality to an extent I never would have anticipated. It's encouraging, and it makes me think we have a real shot at getting these problems fixed.
But there are some people with a different perspective. One of them is Gunnar Hellekson, who wrote a thoughtful blog post about why he disagrees with our approach. Naturally I don't plan to write responses to everyone who disagrees with us. But we really like and respect Gunnar, and he raised some important points in his post. To wit:
Government Data Sets – Managing Expectations
US Open Government plans were released today. As part of this process, federal agencies are beginning to release data sets publicly in ways they never have before. Some substantial and thought-provoking blog posts over the last few weeks have discussed how government can do open data well.
There are significant cultural and social sticking points that have yet to be addressed in releasing data openly. A discussion with a colleague from NASA last week confirmed how far away most agencies are from the luxury of considering the innovative ideas for data set management available to them. Here's why:
Vice President Biden issues new memo on Recovery Act reporting
Yesterday, Vice President Biden announced a new memorandum seeking to tighten reporting of stimulus dollars spent under the American Recovery...
Quantifying Data Quality
You've already heard me complain about data quality -- how it's a bigger problem than most people realize, and a harder problem than many people hope. But let's not leave it there! Perfect datasets mostly exist in textbooks and computer simulations. We need to figure out what we can do with what we have. In this and other posts, I hope to give the developers in our community some idea of how they can deal with less-than-perfect data.
The first step is to figure out how bad things actually are. To do that, we'll use some simple statistics -- those of you with a strong stat background can skip to the next entry in your RSS reader (or better yet, correct my mistakes in comments).
Data Quality Deserves to be Tackled on Its Own
Last week Clay wrote about how we'll be evaluating /open pages released under the OGD. The post ended with a series of considerations that we think are important: completeness, primacy, timeliness, accessibility, machine readability, availability without registration, being non-proprietary, freedom from licensing restrictions, permanence and obtainability.
One thing is conspicuously missing from the list, though: quality.