transparency

 

Improvements Needed For High Value Datasets On Data.gov

This morning Sunlight and a number of other organizations -- POGO, OMB Watch, CREW, National Security Archive, the Center for Democracy and Technology and the Open The Government coalition -- sent a letter to Vivek Kundra, Federal CIO, about improvements needed to the release of High Value Datasets on Data.gov. The core recommendations are included below. Please tell us what you think in the comments.

As advocates for government openness, we support the Administration’s efforts to provide the public with access to information through Data.gov. We are eager to work with you to ensure the success of Data.gov and, in that spirit, write to raise our concerns with the datasets submitted by agencies to fulfill their requirement under the Open Government Directive to post three high value datasets by January 22, and to offer constructive suggestions for improving their usefulness. As an overall recommendation, we urge you to add public representatives to the Open Government Initiative interagency working committee and ask the committee to address the problems and recommendations identified below.

Release Format and Usability by the Public

We understand one of the primary purposes of Data.gov is to enable the technology community and transparency advocates to most effectively use the data to make a direct impact on the daily lives of the American people. The format of the data plays a key role in its usability; many within the community of advocates who re-use and repackage government data would prefer data in CSV format, rather than the XML format in which many of the posted databases are provided. Accordingly, we recommend that you strike an appropriate balance between formats (such as XML) that serve the coding community and web-based presentations by agencies that can be used and understood by the general public.

In addition, some of the currently posted files are quite large, ranging upward to several hundred megabytes. Their large size undermines their usefulness for most people or organizations. The large number of currently posted datasets also makes it difficult to find a particular database of interest. We therefore recommend that if a Data.gov dataset is available from an agency through a web-based interface, Data.gov link to that interface on the dataset's Data.gov landing page.
For a consumer looking for information on a car seat, for example, it would be far easier to search the Department of Transportation's online database rather than scrolling through screen after screen of raw data in XML format. Additionally, as agencies continue to post datasets to Data.gov, efforts should be made to identify those of greatest public interest that lack such interfaces and develop web interfaces that allow the data to be explored online.

Further, while we agree there is value in aggregating government data in a single site, it is questionable how much the collocation of the currently posted information on Data.gov actually benefits the public. The site is not searchable by topic and does not provide any way to bring together data from different sources on similar topics. As an enhancement to the organization of the site, we recommend that you use tagging or metadata to enable the public to bring together information on a topic. The thesaurus that USA.gov uses provides a useful example of the needed vocabulary.

Value of Data

The release of the datasets also has prompted discussions about the value and the quality of the released data, and the additional value provided by access to existing data in a new format. We believe repackaging old information is of marginal value, yet that is what many agencies have done with their recent postings on Data.gov. According to the Sunlight Foundation, of 58 datasets posted by major agencies, only 16 were previously unavailable in some format online. This leaves the impression that agencies posted easily available data, the proverbial low-hanging fruit, rather than seriously considering which of their datasets truly are of high value. While these initial postings can be considered a test run, more attention needs to be directed toward ensuring the overall quality and usefulness of the data.
In addition, sustained attention should be paid to the possibility of making some of the datasets available as feeds that are constantly up to date, rather than as static datasets that are pulled down and then reposted on an occasional basis.

We recommend that agencies be required to explain why the data is high value by having them designate which of the “high value criteria” the data meets: information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. Similarly, we recommend requiring agencies to indicate whether a high value dataset was previously unavailable, available only with a FOIA request, available only for purchase, or available, but in a less user-friendly format. Going forward, this will make it much easier to track how agencies are complying with the other requirements of the Open Government Directive.

While we appreciate the value of data that furthers the mission of an agency, we believe it is equally important to make available to the public data that holds an agency accountable for its policy and spending decisions. We hope to see more datasets of this type available in the near future.

Quality

As is to be expected in efforts of this type, there were a number of glitches--datasets that could not be downloaded or, once downloaded, could not be opened (the Central Contractor Registration FOIA extract from the General Services Administration seems to have caused several users problems). Additionally, some datasets were incomplete (the Hazard Grant Mitigation Program data released by FEMA is missing 23 years of data between 1966 and 1989). Even more troubling, some did not have header rows, and for those that did, their Data.gov pages did not always link to code sheets explaining what those header rows meant.
Without this information, the data cannot be used. We therefore urge the implementation of a responsive feedback mechanism that allows the public to alert an agency that a specific dataset is not working, lacks information, or is missing explanatory material and provides a response to the concerns within a specified time. One way to address this may be to include an agency contact with the ability to resolve any database problems or provide information about the database. The interagency working group could sample the quality of these agency-specific dialogues to ensure that they are having an impact and to develop recommendations on best practices to improve the responsiveness. Additionally, we strongly recommend that all datasets on Data.gov be directly associated with their code sheets.

Finally, we are concerned with the current lack of public notice when data is removed from the site. We respectfully urge you to note all raw tools and data that are removed from Data.gov, and to provide an explanation for their removal.

Many of the concerns outlined above apply across all or many of the agencies’ datasets. Accordingly, we think that standards for handling these types of problems can easily be addressed through the interagency working group and then disseminated amongst the agencies.

Transparency Reforms on List of President's Priorities

Copies of the State of the Union (SOTU) speech are now circulating, and there are several things in it that Sunlight is extremely happy about.

First, the President will call for the establishment of a single Congress-wide database so that all of us can track earmarks. A state-of-the-art, user-friendly online database, one that allows users to search, sort, and download machine-readable data, will spur more citizen interest and involvement -- and accountability -- in federal budgetary questions.

Sunlight has long advocated transparency to ensure that earmarks reflect the public interest. There is a long history of members abusing earmarks, requesting funding to build bridges to nowhere, to reward political allies and family members, and even for personal enrichment. These abuses were most prevalent when there was little transparency in the process. Until 2007, members did not disclose which earmarks they requested, recipients were not named, and individual earmarks were scattered throughout a dozen or more congressional committee documents that totaled hundreds of pages.

While the last two Congresses have improved earmark disclosure, it’s still impossible for a citizen to find, in a single place, all the relevant information about the projects their elected lawmakers request before votes are taken on them. What the President is requesting -- a centralized database with information posted before final decisions are made -- is a much-needed change.

Second, the President is calling for more complete disclosure by lobbyists when they are lobbying the White House or Congress. Under his plan, each contact would be reported, presumably with enough specificity to be meaningful. Sunlight believes strongly that such disclosures should be made electronically, published promptly and maintained online in a downloadable, searchable, sortable format. We believe that disclosure should include all legislation and regulations discussed and all requests for specific services or government funding. Legislative contacts should be reported within 24 hours of any meeting. In addition, the requirement that contributions by registered lobbyists be reported semiannually should be amended to require that contributions be reported within 24 hours of being made.

And third, the President calls for fixes to the campaign finance system in the wake of the Citizens United Supreme Court decision. We believe that this decision calls for an immediate update to the entire campaign finance disclosure law regime -- covering everything from who has to disclose, what is required to be disclosed, how often, and in what form -- whether the spending comes directly from corporations’ or unions’ treasuries, from lobbyists, political parties or the candidates themselves. Clearly, now more than ever, our entire system of public disclosure of election-related contributions and expenditures needs to be upgraded to keep pace with the influences it is designed to track. And with the technical capacity we now have in this 24/7 world, this means that disclosures must be filed online, in real time.

We applaud the President for proposing these new initiatives and stand ready to consult with Congress and the administration to find the best technical means to accomplish these goals.

Excerpts from the Speech below:

Rather than fight the same tired battles that have dominated Washington for decades, it's time for something new. Let's try common sense. Let's invest in our people without leaving them a mountain of debt. Let's meet our responsibility to the people who sent us here.

To do that, we have to recognize that we face more than a deficit of dollars right now. We face a deficit of trust - deep and corrosive doubts about how Washington works that have been growing for years. To close that credibility gap we must take action on both ends of Pennsylvania Avenue to end the outsized influence of lobbyists; to do our work openly; and to give our people the government they deserve.

That's what I came to Washington to do. That's why - for the first time in history - my administration posts our White House visitors online. And that's why we've excluded lobbyists from policy-making jobs or seats on federal boards and commissions.

But we cannot stop there. It's time to require lobbyists to disclose each contact they make on behalf of a client with my Administration or Congress. And it's time to put strict limits on the contributions that lobbyists give to candidates for federal office. Last week, the Supreme Court reversed a century of law to open the floodgates for special interests - including foreign companies - to spend without limit in our elections. Well I don't think American elections should be bankrolled by America's most powerful interests, and worse, by foreign entities. They should be decided by the American people, and that's why I'm urging Democrats and Republicans to pass a bill that helps to right this wrong.

I'm also calling on Congress to continue down the path of earmark reform. You have trimmed some of this spending and embraced some meaningful change. But restoring the public trust demands more. For example, some members of Congress post some earmark requests online. Tonight, I'm calling on Congress to publish all earmark requests on a single Web site before there's a vote so that the American people can see how their money is being spent.