The mind-boggling decentralization of education data

A screenshot from the open data portal of Montgomery County (Md.) Public Schools. (Image credit: MCPS)

When we wrote about a lack of bulk access in our previous post comparing releases of education data, we hinted at another problem: the mind-boggling decentralization of education datasets.

It is bad enough that data about individual schools often appear in separate files rather than in one combined file (for instance, here’s one good example of bulk data). To make matters worse, different datasets are frequently found only on different web pages, sometimes hundreds of them. Instead of being compiled on a single site that’s easy to navigate, education data may be scattered across pages managed by local principals, districtwide offices and state authorities. As a result, education data can be far harder for people to find or use effectively.

To be fair, this is far less of a problem for some of the more data-savvy school districts. New York, Montgomery County and (for a handful of datasets) Philadelphia schools go to third parties to manage centralized open data sites. The school districts covering Washington, D.C., Los Angeles, Baltimore, New York, San Francisco and Chicago also have main “data” or “transparency” pages of their own with links to popular education datasets. Still, the latter pages commonly have one of two problems: They aren’t extensive enough themselves, or they simply link to data elsewhere without actually hosting files. The first problem may be worse because it forces potential data users onto a more difficult search for the desired information.

Take the District of Columbia Public Schools (DCPS) as an example. DCPS’s “Student Data” page gives both districtwide and school-level data on enrollments, graduation rates and testing. Say, though, that you’re curious about student attitudes in schools with more disciplinary measures. To find survey information, you might first jump over to “the DCPS Stakeholder Surveys web page” and scan page 37 of the latest “survey report.” To find discipline information, you might then head over to the DCPS “School Profiles” site, click on a school name, go to the “scorecard” tab, and then scroll down to click on “safe and effective schools,” with data on truancy, safety perceptions and suspensions. Since DCPS doesn’t show discipline statistics in bulk, you would have to repeat that last process for as many schools as you want to check, or perhaps all 113 of them. If you want to add individual school budget allocations into the picture, the search gets even more complicated.

That’s a lot of trouble for a search for basic education data points, and the trouble has a further cost: Making data harder to get costs us the benefits of more informed debate and more informed decisions about our schools.
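To make the contrast concrete, here is a minimal sketch in Python of what the same question could look like if survey and discipline figures were both published as bulk, school-level files. Everything in it is an assumption for illustration: DCPS does not publish these files, and the column names, school names and numbers are made-up placeholders rather than real data.

```python
import csv
import io

# Hypothetical bulk files. These are NOT real DCPS releases; the column
# names and values are placeholders invented for this illustration.
SURVEY_CSV = """school_id,school_name,pct_students_feel_safe
101,Example Elementary,82
102,Sample Middle,64
103,Placeholder High,71
"""

DISCIPLINE_CSV = """school_id,suspensions_per_100_students
101,3
102,19
103,11
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per school."""
    return list(csv.DictReader(io.StringIO(text)))


def join_by_school(survey_rows, discipline_rows):
    """Combine the two datasets on school_id, keeping schools found in both."""
    discipline = {row["school_id"]: row for row in discipline_rows}
    combined = []
    for row in survey_rows:
        match = discipline.get(row["school_id"])
        if match is not None:
            combined.append({**row, **match})
    return combined


combined = join_by_school(read_rows(SURVEY_CSV), read_rows(DISCIPLINE_CSV))

# Put the schools with the most suspensions first.
combined.sort(key=lambda r: int(r["suspensions_per_100_students"]), reverse=True)

for r in combined:
    print(r["school_name"],
          r["suspensions_per_100_students"],
          r["pct_students_feel_safe"])
```

With bulk files and a shared school identifier, one short join replaces the school-by-school clicking described above. The shared identifier is the key design assumption here; without one, matching schools across datasets becomes its own chore.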

Endlessly searching out datasets not included in a central site may sound frustrating, but linking to datasets hosted elsewhere can also cause problems for would-be data users. As the previous post indicated with attendance information on New York’s open data portal, links to data hosted on different sites may break or fall out of date, unnoticed by site administrators yet requiring inter-office hassles to fix. Data hosted elsewhere can also be less than “open,” lacking, for instance, the machine-readable formats that a school district data center might otherwise provide.

As the Los Angeles Unified School District data site suggests, there are sometimes risks to relying on data hosted elsewhere. (Image credit: LAUSD)

Of course, groups completely independent from school districts could take education data releases into their own hands. Apples 2 Apples does a brilliant job working with data from Chicago Public Schools — republishing datasets, interpreting and discussing them, and presenting them with useful visualizations. However, independent groups shouldn’t have to do all of this by themselves, all while trawling through many pages to find the datasets they need.

Apples 2 Apples left a comment a few months ago about Chicago Public Schools’ increasingly open data: “I’m seeing more Excel spreadsheets … [and there now] is easier access to data at the school level than … when I started looking at data in 2011. Especially budget data. Progress.” That’s a good improvement, but here’s a simple tip for school districts to make their datasets vastly easier to access: put as much data as possible in one place. Leaving different datasets stranded between different offices and different schools limits accessibility and creates unnecessary hassles in managing data releases. Centralized open data sites and portals have already worked well for plenty of city governments across the country. School districts should be learning from their example.