Data.gov Clarity

We shouldn’t be surprised that people are often confused by Data.gov. It is new, and represents something complicated.

When the current budget cuts were revealed to include cuts to the e-government fund that supports Data.gov, everyone started questioning Data.gov’s value.

Comments have tried to defend, or sometimes to cast doubt on, Data.gov’s value through a few particular lines of questioning.

Sunlight hasn’t been shy about criticizing this administration, and we’ve certainly been critical of Data.gov in the past. But the current budget fight throws a little clarity on what it is we could lose if Data.gov were to go dark.

I hear arguments that someone needs to define the primary audience, that Data.gov’s primary purpose must be established, and that there hasn’t been enough study on transparency’s value.

There is sufficient confusion around each of these questions that further discussion would be useful. But for the most part, these arguments are people projecting their priorities onto a website with a broader purpose. This happens to Data.gov a *lot*, probably because its purpose is so broad. Our clearinghouse for federal bulk data access is also a cultural symbol, and that leads to all sorts of questions focusing on it. But we shouldn’t confuse those smaller questions with the bigger point — Data.gov should continue to exist.

Data.gov’s audience, users, goals, and value are as broad as the challenge Data.gov is intended to address: data access for all public federal data. That’s a huge and complicated corpus that we’re aiming at, and it would be silly for some Data.gov employee to sit down and try to define a single user.

I have even less doubt about the goals and value behind the site.

Data.gov serves an enormous variety of purposes, goals, and communities, and they don’t have to be considered separately. We don’t have to prioritize between health, GIS, or video archives; that’s the whole point of a clearinghouse. It doesn’t really matter whether we talk about “data” or “information” online. Data.gov doesn’t have to choose between “good government” and “Gov 2.0,” and we don’t need to choose whether data portals are for elite users or the general public.

Data.gov can do better or worse at all of these things. It can have a great community, or go relatively unappreciated. Have tons of meaningful data, or a meager selection. The site could serve just academics, or primarily as the subject of contests for CS students. Management, design, and marketing are all important, but they aren’t the hills that Data.gov should ever die on.

We’ve got two huge forces at work here: the public data of the American federal government, and the entire online public. Data.gov exists to connect the two, through direct access. In Data.gov, the administration has stated as clearly as possible “public reuse of national data is a public good.”

We take that for granted. In many EU countries, you have to apply for a license to reuse public sector information. Seriously, a license. Everyone clamoring for ROI studies should keep that in mind. It’s easy to forget that the US has public sector information freedom that is the envy of the rest of the world. Our government is jumping through hoops to encourage us to reuse their information, with (generally) zero terms of use. That is a gift horse into whose mouth we should not look.

Now, we still have plenty to gripe about. There should be *far* more information available on Data.gov, and better information. But the people behind the site clearly agree with that sentiment. Go check out the “metrics” tab of Data.gov. Spend a little time there.

This site was clearly built by someone interested in metrics and performance. You can see how many datasets each agency has registered on Data.gov per month, how many times each was downloaded, and even how many visits per month are coming from foreign countries. If *any* other federal website comes close to being this transparent, I’d love to see it.

These stats aren’t done for their own sake, either. Data.gov, like ITDashboard.gov, is intended to be a behavioral tool. The site posts stats on agency participation because OMB is trying to get more agencies to post more data.

To me, that’s the biggest overlooked aspect of Data.gov, and the one whose loss I’d fear the most. Data.gov is about responsibility. Forcing agencies to post information is nearly impossible; even FOIA often fails. But in Vivek Kundra, we’ve got someone willing to fight that fight. In Data.gov, the OMB is taking some responsibility for how agencies share their data. Harlan Yu smartly refers to this as something like the procedural infrastructure. I’d go even further, and say that Data.gov represents our national commitment to national data openness.

That’s not something we should lose. If it’s still lacking overall in any of the categories I described above, those are issues that can be fixed. And they will be. Congress is shedding its digital haze and starting to assert itself more, and President Obama’s Memo on Regulatory Compliance Data is forcing agencies to ask important questions.

Data.gov may be imperfect, but it has already, and convincingly, earned a spot as a big part of the future of American open government.