OpenGov Voices: How California’s putting big local government data online

Marc Joffe, founder of Public Sector Credit Solutions. Image credit: Public Sector Credit Solutions

Many state governments in the U.S. collect and report local government data. These data compilation efforts are often many decades old and deeply rooted in the hardcopy delivery methods that prevailed before the digital age. Recently, the California State Controller’s Office (SCO) has started to replace its traditional reports with big, open, machine-readable and visualizable data sets.

In California, state reporting on local government can be traced all the way back to a 1911 law that required cities to file financial reports with the state controller. Each year, the controller’s office compiled this data into a book. The first edition of the “Cities Annual Report” was recently digitized by Google and is available online. In later years, the California state government began requiring counties, special districts and pension plans to submit analogous reports to SCO, also for hardcopy publication.

Over the past decade, SCO published these volumes as PDFs on its web site. As Controller’s Office spokesman Jacob Roper told me, this publication method has a number of limitations. Aside from the obvious problem of the PDF format making analysis of the underlying data difficult, conversion to a book-style presentation delayed the release of the information and also limited the number of fields that could be included. The PDF version of the Cities Annual Report only provides a small fraction of the data SCO collects from reporting cities.

In September, SCO made the transition to open data by publishing the city, county and pension system data sets on Socrata’s Open Data Portal. SCO’s open data website is called “ByTheNumbers.” In early December, SCO added special district data and refreshed the city and county data sets with fiscal year 2014 statistics. This update is especially important because, through fiscal year 2012 (which ended on June 30, 2012), city and county data were not published until at least 14 months after the end of the fiscal year. This delay has now been reduced to five months, most of which is accounted for by the time given to local governments to submit their reports to SCO after year end.

Bulk data sets available on the site go back to fiscal year 2003, enabling users to evaluate relatively long time series. Roper hopes to extend the data back further, recalling a recent incident in which a researcher requested county data from 1978. The request triggered a mad search at SCO for a dusty old volume, which fell apart when it was placed on a flatbed scanner. If all of these old books were scanned and digitized, the full time series would be available to anyone on demand. I suspect that Google could help SCO backfill the data set; if not, existing optical character recognition (OCR) and table recognition tools might come in handy.

ByTheNumbers is the latest of three open data web sites SCO has created. The first, PublicPay, provides compensation data for state and local employees. State Controller John Chiang decided to build the site after learning about a scandal in Bell, Calif. The Los Angeles suburb, with about 38,000 residents, was paying its Chief Administrative Officer $787,637 annually, about 30 times the city’s per capita income. Since its launch, the site has received eight million hits. While PublicPay anonymizes employees (showing employer and title, but not employee names), a privately managed web resource, Transparent California, adds names to the salary and benefits data.

Chiang’s second foray into data transparency came after California voters passed Proposition 30 in 2012. This measure imposed a temporary tax increase, with proceeds largely earmarked to restore previous cuts in K-12 education and community college funding. On the resulting website, users can see how much Proposition 30 revenue has been allocated to each school district and how the funds are being spent. The site also provides links to community college district audited financial statements; links to K-12 school district audits will be added in the coming months.

In a previous Sunlight post, I explained the value of these audited financial statements and called on the federal government to publish those that it collects. It is nice to see California state government setting an example in this regard. I hope that SCO will add city, county and special district financial statements to the universe of audits it publishes. In the meantime, California readers can find many city and county audited financials on my website, which was built, in part, with a Sunlight OpenGov Grant.

As the Controller’s office shifts from John Chiang, who becomes State Treasurer in January, to Betty Yee, SCO staff is reaching out to the public for ideas on how to leverage its open data. Last week, SCO announced a “build-a-thon” (basically a month-long, off-site hackathon) in which participants are being asked to build apps on top of the newly released special district data set. Prospective hackers can find the rules and the data set (an Excel 2007-13 workbook, 76.2 MB) online.

With the help of Socrata’s Open Data Portal, the state of California has taken an enormous step toward making local government data readily available to the general public. I hope SCO continues to build out this capability and that other states follow California’s lead.

Interested in writing a guest blog for Sunlight? Email us at