OpenGov Voices: Data.gov relaunches on open source platform CKAN

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Irina Bolychevsky is the Product Owner of CKAN (@CKANproject), the leading open source data management platform, at the Open Knowledge Foundation (@OKFN). CKAN makes data accessible by providing tools to streamline publishing, sharing, finding and using data. She led and managed the new release of data.gov from the CKAN team and previously managed the relaunch of data.gov.uk. Follow her on Twitter: @shevski.

A huge milestone was reached yesterday with the relaunch of the U.S. government data portal on a single, open source platform. A joint collaboration between a small UK team at the Open Knowledge Foundation and data.gov, this was an ambitious project to reduce the numerous previous catalogs and repositories into one central portal for serious re-use of government open data.

Catalog.data.gov brings together both geospatial and “raw” (tabular or text) data under a single roof in a consistent, standardised and beautiful interface that can be searched, faceted by format, publisher, community or keyword, and filtered by location.

Users can quickly and easily find relevant or related data (no longer a metadata XML file!), download it directly from the search results page or preview spatial map layers or CSV files in the browser.

Of course, there is still work to do, especially on improving data quality. Nonetheless, a vast amount of effort went into cleaning up metadata, hiding records with no working links and adding a flexible, distributed approval workflow that allows harvested datasets to be reviewed before publication.

This launch is a key part of the government’s commitment to the newly announced Open Data Policy and marks data.gov’s first major step into open source. All the code is available on GitHub, and data.gov plan to make their CKAN/Drupal set-up reusable for others as part of OGPL. Anyone in the community can take advantage of this mature software to bring together datasets for a hackathon or local data for developers, journalists and citizens to re-use. Open source is a powerful tool for transparency. It also enables high quality software through collaborative improvement and ensures good value for money, as there is no lock-in to a single proprietary software vendor.

The CKAN catalog is also supporting the requirements being outlined in Project Open Data. Agencies can maintain their data sources individually, publishing standard machine readable formats, and then schedule regular refreshes of the metadata into the central repository at data.gov. This means that data can be managed in a distributed way with CKAN doing all the hard work to federate, validate and parse the metadata into a standard browsable interface.

As part of this work, there have been many additions to CKAN’s geospatial functionality, most notably a fast and elegant geospatial search.

We have added robust support for harvesting FGDC and ISO 19139 documents (government and geospatial metadata schemas) from Web Accessible Folders (WAFs), single spatial documents, CSW endpoints, ArcGIS REST endpoints, Z39.50 databases, ESRI Geoportal Servers (popular protocols for accessing geodata) as well as other CKAN catalogs. This is available for re-use as part of our harvesting and spatial extensions.
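For developers who want to try the geospatial search from code, here is a minimal Python sketch. It assumes the catalog runs the spatial extension with its search feature enabled, and that the extension accepts a bounding box through an `ext_bbox` query parameter on the `package_search` action, which is our understanding of how the spatial extension exposes it. The function name `bbox_search_url` and the example coordinates are purely illustrative.

```python
from urllib.parse import urlencode


def bbox_search_url(base_url, min_lon, min_lat, max_lon, max_lat, query=""):
    """Build a package_search URL restricted to a bounding box.

    Assumption: the target CKAN site has spatial search enabled and
    reads the box from `ext_bbox` as "min_lon,min_lat,max_lon,max_lat".
    """
    # Reject boxes that are out of range or inverted. Boxes that cross
    # the antimeridian are deliberately not handled in this sketch.
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")

    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    params = {"ext_bbox": bbox}
    if query:
        params["q"] = query
    return "{}/api/3/action/package_search?{}".format(
        base_url.rstrip("/"), urlencode(params)
    )


if __name__ == "__main__":
    # Illustrative box only, roughly around the San Francisco Bay Area.
    print(bbox_search_url("https://catalog.data.gov", -123.0, 37.0, -121.5, 38.5, "parks"))
```

The printed URL can be pasted into a browser or fetched with any HTTP client. Whether a given site honours the parameter depends on how its spatial extension is configured.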

CKAN, a project of the Open Knowledge Foundation, started in 2007, and is also used by the UK, Austrian, Australian, Brazilian, German, Norwegian and soon Canadian federal governments – as well as the European Commission and numerous other national, local and community users.

Built with a flexible and extensible architecture to allow customization and contributions, CKAN has a powerful JSON API that gives access to all web functionality, including search queries and downloads, while respecting user and publisher permission settings. CKAN is fully open source, and we have a very active mailing list, new documentation for installing CKAN and ways to contribute to the code.
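To give a flavour of the JSON API, here is a small Python sketch of a dataset search. It assumes the site exposes CKAN's action API at `/api/3/action/package_search` and that `res_format` and `organization` are available as filter and facet fields. The base URL, the function names and the sample response are illustrative and hand-written, not real data.gov output.

```python
import json
from urllib.parse import urlencode

# Assumed base URL: the post names catalog.data.gov but not its API path.
BASE_URL = "https://catalog.data.gov"


def build_search_url(query, res_format=None, rows=10, base_url=BASE_URL):
    """Build a URL for a package_search request with optional format filter."""
    params = {
        "q": query,
        "rows": rows,
        # Facet field names are assumptions about the site's search index.
        "facet.field": json.dumps(["res_format", "organization"]),
    }
    if res_format:
        # fq narrows the results without changing the free-text query.
        params["fq"] = 'res_format:"{}"'.format(res_format)
    return "{}/api/3/action/package_search?{}".format(base_url, urlencode(params))


def summarize(response_text):
    """Return (total match count, list of dataset titles) from a response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    titles = [d.get("title") or d.get("name") for d in result["results"]]
    return result["count"], titles


if __name__ == "__main__":
    print(build_search_url("water quality", res_format="CSV", rows=5))
    # Hand-written sample in the shape the action API returns.
    sample = json.dumps({
        "success": True,
        "result": {"count": 1, "results": [{"name": "example-dataset", "title": "Example Dataset"}]},
    })
    print(summarize(sample))
```

Fetching the URL with any HTTP client and passing the response body to `summarize` would complete the round trip. Requests made without an API key see only what an anonymous visitor to the site would see.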

Check out details of our latest release, get involved in open data and open content, ask questions or tell us about your latest site so we can add it to our wall of awesome instances!

Notes:

You can also read about the launch in my OKF blog announcement, the data.gov blog, and the White House recap.

The Open Knowledge Foundation is a global non-profit organization which builds communities and tools to help people open up knowledge (data and content) and make it used and useful. The Open Knowledge Foundation operates in many open data and content domains, including public information, science, research, culture and more, has local groups around the world, and has been pioneering open source tools and open data since its inception in 2004. Find out more and get involved!

Interested in writing a guest blog for Sunlight? Email us at guestblog@sunlightfoundation.com