OpenGov Voices: Transit data – a major success story for common data standards

Greg Jordan-Detamore is a data, GIS and design intern at DataSpark.

2016 brings us one of the largest open data success stories to date: the creation of national and global tools for exploring public transit schedule data.

Transit data has been light years ahead of many other open data sets. In other domains, half the struggle is getting governments to release their data at all, let alone in a machine-readable format. An important layer on top of this is a common standard for the structure of the data, so that the public can easily combine data from multiple governments or agencies.

In 2005, Portland, Oregon’s transit agency collaborated with Google to create an online transit directions service called Google Transit. This service then expanded to other cities, along with a standardized format for structuring schedule data. The format — known as the General Transit Feed Specification (GTFS) — has become the de facto global standard for transit schedule data.
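To make the idea of a standardized format concrete: a GTFS feed is a set of comma-separated text files, one of which (stops.txt) lists stops with fields such as stop_id, stop_name, stop_lat and stop_lon. The following is a minimal sketch in Python of reading such a file. The sample rows and the read_stops helper are made up for illustration and do not come from any real agency's feed.

```python
import csv
import io

# Hypothetical stops.txt content (illustrative rows, not from a real feed).
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,45.5200,-122.6800
S2,Central Station,45.5300,-122.6700
"""

def read_stops(text):
    """Parse stops.txt content into a list of dicts with numeric coordinates."""
    stops = []
    for row in csv.DictReader(io.StringIO(text)):
        stops.append({
            "stop_id": row["stop_id"],
            "stop_name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        })
    return stops

stops = read_stops(SAMPLE_STOPS_TXT)
for stop in stops:
    print(stop["stop_id"], stop["stop_name"], stop["lat"], stop["lon"])
```

Because every agency that publishes GTFS uses the same column names, the same few lines work on any feed without agency-specific parsing.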

As the icing on the cake, a website called GTFS Data Exchange acts as a global compendium of such data, with about 1,000 agencies represented. This is a drastic improvement over having to go to each individual agency’s website to try to find GTFS data.

Yet, even GTFS Data Exchange is being surpassed. It is currently in the process of shutting down, as better tools have come to replace it. One, Transitland, has several components including a single structured repository of transit data that allows developers to access it through an API. (Does that remind you of any other organization?) Another, TransitFeeds, is similar to GTFS Data Exchange except it has a more open contribution process on GitHub.

Making an impact

Many metropolitan areas in the United States have multiple public transit providers, and tools like these finally make it possible to show the public aggregated data from multiple agencies. All of this data is incredibly useful for creating apps and websites to help people navigate transit. Furthermore, this data can be used to inform public policy.
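As a rough illustration of why a shared format makes aggregation easy, here is a minimal Python sketch that merges stop lists from two agencies into one table. The agency names, the sample rows and the combine_stops helper are hypothetical. The one real wrinkle it handles is that stop IDs are only unique within a single feed, so each ID is prefixed with its agency name.

```python
import csv
import io

# Hypothetical stops.txt content for two agencies in one metro area.
FEEDS = {
    "agency_a": "stop_id,stop_name,stop_lat,stop_lon\n"
                "1,Main St,45.52,-122.68\n",
    "agency_b": "stop_id,stop_name,stop_lat,stop_lon\n"
                "1,Harbor Rd,45.60,-122.70\n"
                "2,Airport,45.59,-122.59\n",
}

def combine_stops(feeds):
    """Merge stops from several feeds, keyed by 'agency:stop_id' to avoid ID clashes."""
    combined = {}
    for agency, text in feeds.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = f"{agency}:{row['stop_id']}"
            combined[key] = {
                "agency": agency,
                "stop_name": row["stop_name"],
                "lat": float(row["stop_lat"]),
                "lon": float(row["stop_lon"]),
            }
    return combined

all_stops = combine_stops(FEEDS)
for key, stop in all_stops.items():
    print(key, stop["stop_name"])
```

Both agencies use stop ID 1 here, yet the merged table keeps them apart. Without a common standard, each additional agency would need its own custom parser before any of this could happen.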

In March of this year, US Secretary of Transportation Anthony Foxx wrote a letter to transit agencies nationwide asking them to share their GTFS data with the US Department of Transportation so that it can create a National Transit Map, with a first release targeted for this summer. This map will provide a snapshot of existing service, which can then be assessed by policy-makers and advocates. “The data is good not only for individual users, but for people who are trying to hold their system accountable,” said Carlos Monje, an official at the transportation department.

Last month, the Center for Neighborhood Technology (with funding from TransitCenter) released a tool called AllTransit, which uses a national database of transit data to allow users to enter a location and see metrics on topics like job access, mobility, equity, and health. AllTransit allows the public to assess the value of transit in their communities, rather than just seeing where it exists.

AllTransit uses data from over 800 agencies to show the socioeconomic benefits of mass transit. (Photo credit: CityLab)

The way forward

All of these amazing transit data tools are made possible by the use of GTFS as a common data standard. How can we expand common data standards to other domains?

There are three primary types of leadership that governments can provide:

Leadership by example: This takes the form of a government agency creating standards for a certain type of data and then trying to get peer agencies in other places to adopt the same standard.

Leadership by request: This means the state or federal government asking lower-level governments to release their data using a specific format, like the National Transit Map example.

Leadership by mandate: This means the state or federal government mandating the use of a certain data standard by the relevant governments or agencies that it has power over. In the short term, this is probably a less desirable option, since most domains don’t yet have an established standard. It’s important for standards to see broad use and acceptance before they are mandated. (Note, however, that a mandate to release data in one format does not stop governments from releasing it in additional formats.)

Working across governments, domain-specific groups of public officials can help develop specific standards for their areas of expertise. For example, the National Association of Secretaries of State could help develop data standards for reporting election results and poll locations.

Developers, policy analysts, journalists and advocates are some of the primary users of open data. They can act as champions for open data and common standards, and place pressure on governments.

Companies have a role too. In the case of transit data, Google played a huge role simply because it wants data in GTFS format for Google Maps directions, and most transit providers would like their services to appear in online directions. Yelp has worked with city governments to develop an open data standard for restaurant inspections. More generally, companies can be a part of the chorus requesting and promoting common data standards, and they can help provide funding for open data efforts. (Transitland is a project of Mapzen, which is based out of the Samsung Accelerator.)

The advanced ecosystem of transit data tools is a prime example of what can be accomplished when open data follows a common format. If we can get governments to develop and adopt common data standards for all sorts of data sets, then we can unleash a new era for civic technology and government transparency.
