What information do citizens want? Results from analyzing public record requests

In our last blog post, we discussed how adopting an open data program significantly decreases the volume of public record requests (PRRs) cities receive, saving cities time and money and giving citizens easier access to the public information they need. But we also found that, to get these benefits and see a decrease in PRRs, cities need to publish datasets that are of high interest to citizens.

To help cities do that, we analyzed which types of information are requested most often.

Cities can use PRRs as a guide to prioritize release of data that is of greatest interest to residents, which enables cities to see a higher rate of return on proactive publication of data through decreases in PRRs. But to achieve these efficiency gains, those inside City Hall tasked with implementing open data programs need to be able to answer the pivotal question: what types of information are citizens requesting?

To answer this question, we analyzed the content of 110,063 PRRs from 33 cities. We used a machine learning algorithm to group the raw PRR text into 60 coherent categories. We then mapped these 60 topics to the type of data that the city would provide to satisfy the request. This process revealed 19 data types, ranked by popularity, outlined below (controlling for cities with a disproportionately large number of requests).
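As a rough illustration of the last two steps (mapping topics to data types, then ranking them without letting high-volume cities dominate), here is a minimal Python sketch. The topic names, the `TOPIC_TO_DATA_TYPE` mapping, and the choice to average each city's share of requests are all assumptions made for this example; they are not the exact method or labels used in our analysis.

```python
from collections import Counter, defaultdict

# Hypothetical topic-to-data-type mapping (illustrative labels only).
TOPIC_TO_DATA_TYPE = {
    "incident report copy": "police incident reports",
    "case number request": "police incident reports",
    "building permit history": "parcel records, permits, and plans",
    "site plan": "parcel records, permits, and plans",
    "collision report": "auto collision reports",
}


def rank_data_types(requests):
    """Rank data types by their mean share of requests across cities.

    `requests` is an iterable of (city, topic) pairs. Averaging per-city
    shares (an assumed normalization) means every city counts equally,
    however many requests it received.
    """
    per_city = defaultdict(Counter)
    for city, topic in requests:
        data_type = TOPIC_TO_DATA_TYPE.get(topic, "other")
        per_city[city][data_type] += 1

    mean_share = Counter()
    for counts in per_city.values():
        total = sum(counts.values())
        for data_type, n in counts.items():
            mean_share[data_type] += n / total / len(per_city)

    # Highest mean share first; ties broken alphabetically.
    return sorted(mean_share.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up example: city_a files many collision requests, the others few.
example_requests = (
    [("city_a", "collision report")] * 8
    + [("city_a", "incident report copy")] * 2
    + [("city_b", "incident report copy"), ("city_b", "site plan")]
    + [("city_c", "case number request"), ("city_c", "building permit history")]
)
ranked = rank_data_types(example_requests)
```

In this made-up example, collision reports have the highest raw count (8 of 14 requests), but all of them come from one city, so they rank last once each city is weighted equally.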

Though Crime and Property were the biggest areas where residents requested public information, a closer look shows that police incident reports and parcel records, permits, and plans were the most highly demanded types of data. The next most highly demanded were criminal record checks, auto collision reports, and requests that could not be categorized. It’s possible that private companies or residents request these types of information to file or review insurance claims.

Below is the list of the most popular data types requested in the cities we reviewed, grouped by the level of demand for that data.

On a national scale, cities might proactively release police incident reports and parcel records, permits, and plans to reap the benefits of open data. For example, the city of San Francisco recently developed a new system for releasing police incident data that balances citizen privacy with timely data release. The new system aimed to reduce lag time from 2 weeks to 1 day and included homicide incidents.

These findings also reveal the importance of understanding demand at a granular level: our analysis found that crime statistics and other police department administrative data were demanded far less frequently than individual police reports. This tells us that it is not enough to know that crime data is of interest; cities need to know what kind of crime data people are requesting to accurately respond to demand.

However, the fact that a topic does not come up frequently in PRRs does not necessarily mean that citizens are not interested in it.

Open data and FOIA serve complementary, important roles in information access

There are a number of datasets that we expected to see commonly appearing in PRRs, but that were not identified above as the “most popular datasets” residents were requesting. For example, while budget data is a very popular open dataset, it did not appear in our review of common PRRs. This may be because cities routinely publish budget information even in the absence of a formal open data program, often in lengthy PDF documents. The same is true of aggregate crime statistics.

Analyzing PRRs isn’t the only way cities should try to understand and meet demand for open data. For certain communities, publishing usable and machine-readable data can transform how they use open data. For example, our research didn’t find particularly high PRR demand for transportation-specific categories, but transportation is the second most accessed category of open data. One possible explanation is that cities’ decision to begin publishing GTFS-standardized open transit data spurred the formation of a vibrant new user community.

PRRs will continue to play a critical role in providing citizens access to government information that cannot always be published as open data. Some of the information that is frequently requested via PRRs, such as Police Department audio and video, would be difficult to publish as open data. Even as adopting an open data program decreases the volume of PRRs a city receives, PRRs remain a critical complement to open data to ensure robust access to information.

While cities can look to and learn from nationwide studies such as this one, there is still no substitute for targeted user research, to ‘ground-truth’ national trends and understand the local specificities of demand. We found significant variance in the most popular data types and topics when considering results for individual cities. This variance is starkly illustrated by the demand for accident reports for insurance claims across cities, which was the overall most popular single topic in several of the approaches we tested.

When we looked at the individual city results, we found that this result was driven entirely by cities in Washington State. Indeed, when we used the dampened popularity metric, which reduces these city-specific effects, this topic fell to number 34 in the rankings. We saw similar variance in demand across cities for other topics generated by our model.
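To show why dampening can move a topic so far down the rankings, here is a small Python sketch comparing raw request counts with a log-dampened score. The post does not spell out how the dampened popularity metric is computed, so the `log(1 + n)` formula here is only an assumed stand-in, and the city names and counts are invented for illustration.

```python
import math
from collections import defaultdict


def raw_popularity(counts):
    """Sum requests per topic. `counts` maps (city, topic) -> request count."""
    totals = defaultdict(float)
    for (_city, topic), n in counts.items():
        totals[topic] += n
    return dict(totals)


def dampened_popularity(counts):
    """Sum log(1 + n) per topic (assumed formula, not the study's).

    The logarithm shrinks very large per-city counts, so a topic needs
    demand in many cities, not heavy demand in a few, to score highly.
    """
    totals = defaultdict(float)
    for (_city, topic), n in counts.items():
        totals[topic] += math.log1p(n)
    return dict(totals)


def ranking(scores):
    """Topics ordered from most to least popular; ties broken alphabetically."""
    return sorted(scores, key=lambda topic: (-scores[topic], topic))


# Invented counts: accident reports are concentrated in two cities,
# incident reports are spread across five.
example_counts = {
    ("city_1", "accident reports"): 5000,
    ("city_2", "accident reports"): 3000,
    ("city_1", "incident reports"): 400,
    ("city_2", "incident reports"): 300,
    ("city_3", "incident reports"): 350,
    ("city_4", "incident reports"): 300,
    ("city_5", "incident reports"): 250,
}
raw_rank = ranking(raw_popularity(example_counts))
dampened_rank = ranking(dampened_popularity(example_counts))
```

With these invented numbers, accident reports lead on raw counts but fall behind incident reports once each city's contribution is dampened. This is the same kind of shift we saw for the Washington State results.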

We set out to answer the question of whether open data and FOI are competitors or complements and ultimately conclude that they are both: open data can provide a substitute for many PRRs (especially when data release aligns with citizen demand), while FOI remains a key channel to ensure access to some types of highly demanded information that cannot be readily released as open data.

We encourage cities to use their PRRs as a tool to design demand-driven open data programs by reading the detailed methodology in our white paper and replicating our analysis (which can be found in full on our GitHub).