12 Days of Open Data to Get You Through the New Year

Photo of gift-wrapped packages with Christmas lights
Photo credit: Frank Tellez/Flickr

In the weeks since Nov. 30, when federal agencies were supposed to comply with President Barack Obama’s open data executive order by providing a list of all the data they are holding, it has already become clear that the executive order is unearthing treasure troves of information that previously haven’t been available or accessible. This despite the fact that agencies are still making their lists and checking them twice, and so far results have been mixed — so mixed that it’s giving our lawyers itchy trigger fingers.

More about that below. ’Tis the season of giving! So, first, the good news.

This is not just for wonks! Listen up, advocates, activists, scientists, journalists and citizens: The data that’s being unveiled involves a mind-boggling range of policy areas that touch on citizens’ lives in countless ways. Wondering how your kid’s school is doing? Government data can help you with that! Worried that the factory down the street might be storing dangerous chemicals? You can find out with government data too. There’s a really good statement about why open data is important here. The data we’re talking about represents information dug up for, by (courtesy of your tax dollars) and about you, and it has long been Sunlight’s position that you should have access to it.

So, in the spirit of the season, we’ve picked a selection of shiny new data sets to share with you. Whether you’re digesting a big holiday meal, looking for something quiet to do after a rambunctious New Year’s celebration, or trying to figure out ways to get out of all that “quality time” with friends and family, we’ve got a reason for you to crack that laptop over Yuletide! Here’s to more in 2014!!

For our initial dig, we zeroed in on seven agencies, both cabinet-level and independent. We picked agencies that did a few things well right out of the gate, as well as a couple that came up short.

The best:

  • Listed more data sets in their public lists than they currently make available on data.gov (showing they made an effort to comply with the spirit of more openness);
  • Made some attempt to comply with guidance requiring them to make their data listings available in “human readable” formats; and
  • Shared information about their “restricted” data sets in their lists.
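For readers who want to run the same checks on an agency of their choosing, here is a minimal sketch of the three criteria above. The class name, field names and sample numbers are all hypothetical and chosen for illustration; they are not a format any agency actually publishes.

```python
from dataclasses import dataclass


@dataclass
class AgencyListing:
    """Hypothetical summary of one agency's public data listing."""
    name: str
    listed: int             # data sets in the agency's public data listing
    on_data_gov: int        # data sets the agency already offers on data.gov
    human_readable: bool    # listing is offered in a "human readable" format
    restricted_listed: int  # "restricted" data sets described in the listing


def check_listing(agency: AgencyListing) -> dict:
    """Apply the three informal criteria and report which ones are met."""
    return {
        "lists_more_than_data_gov": agency.listed > agency.on_data_gov,
        "human_readable": agency.human_readable,
        "shares_restricted": agency.restricted_listed > 0,
    }


# Illustrative numbers only, not taken from any real agency.
example = AgencyListing("Example Agency", listed=120, on_data_gov=80,
                        human_readable=True, restricted_listed=15)
results = check_listing(example)
print(results)
```

An agency that meets all three checks would land on the "best" side of the ledger; one whose listing is smaller than its data.gov presence fails the first check.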

Social Security Administration

The Social Security Administration collects data that can be useful to demographers, sociologists, and Buzzfeed. The federal government’s best-known safety net doesn’t make a massive amount of information available (only 69 datasets were on SSA’s public data listing at last count), but what’s there is interesting.

Lists of the most popular baby names always make fun headlines, for instance, and the SSA is happy to oblige with datasets looking at national and state trends. Not as cute, but potentially much more useful, the SSA’s public data listing reveals a variety of datasets tracking the language preferences of people with Asian and Pacific Island heritage who apply for disability, retirement, survivors, and supplemental security income benefits.

Environmental Protection Agency

Some agencies collect, and are now publicizing, far greater stores of information. The Environmental Protection Agency has always had a robust presence on data.gov and moved a step further with its public data listing, including information about several hundred previously unaggregated datasets. Many of these datasets, which have restricted access levels, are potentially useful to a number of interest groups.

The EPA data listing includes a group of datasets covering all American Indian Tribal lands in the continental United States and Alaska. Datasets of Risk Management Plan Facilities identify locations that possess “greater than certain threshold quantities of 140 chemicals.” Knowing that these facilities exist and have to produce risk management plans could be helpful to inspectors, safety advocates, and employees, not to mention school administrators and moms and dads.

Department of Education

The Department of Education has made open data into a party — a datapalooza to be specific. Prior to the White House’s open data executive order, the agency had already made the release of machine-readable data sets a priority. The department rolled out the first of its offerings in January to the praise of Education Secretary Arne Duncan and U.S. Chief Technology Officer Todd Park.

Now that their full public data listing has gone live, we know how the government has been tracking the state of education in the U.S. Want to learn how U.S. high school students are tackling the job market? The “nationally-representative” High School Longitudinal Study of 2009 tracks students’ trajectory from secondary school through the postsecondary years, recording academic assessments, career paths and information on how students choose careers in the STEM fields. The ongoing study is currently collecting follow-up data and will have another update on these high schoolers in 2016.

Department of Justice

The Department of Justice wants to monitor the future. Or, at least, the attitudes of future adults. Monitoring the Future: A Continuing Study of the Lifestyles and Values of Youth is the ambitious title given to a yearly survey tracking high school students’ attitudes about, and experiences with, drugs.

The study, sponsored by the National Institute on Drug Abuse, samples around 15,000 students from a variety of educational backgrounds. According to the DOJ, it has been conducted “every year since 1975 by researchers at the Institute for Social Research (ISR) University of Michigan.”

Because the Justice Department is one of the few agencies to make a web-based public listing of its data set file (some 689 projects in total), it’s relatively easy to paw through the whole list to find other useful data sets.

While death penalty data may not be your idea of a cheery Christmas gift, the Capital Punishment in the United States study from the Justice Department promises to be a treasure trove of information on the backgrounds of inmates sentenced to death. Encompassing 37 years of data, the study compiles information about race, socioeconomic background and previous criminal record, among other factors.

If that’s not enough to quench your thirst for research on the criminal justice system, the Federal Justice Statistics Program: Offenders Released From Prison, 2010 is another DOJ goldmine, containing data on age, race and citizenship as well as the terms for criminal offenders released from prison. Be sure to check out the other studies from the FJS Program as well.

Health and Human Services

Another agency with a mammoth stash of data, Health and Human Services, exposed some studies that were not previously listed on data.gov (they have since been added, though access to many is still restricted). For those hungry for more resources on health care in the United States, the inventory unveiled studies like the MEDPAR (Medicare Provider Analysis and Review) Limited Data Set, which tracks claims from “Medicare certified inpatient hospitals and skilled nursing facilities,” and the Medicare Current Beneficiary Survey, a “continuous, multipurpose survey of a nationally representative sample of the Medicare population.”

HHS’ full data catalog is available on the web, so don’t be afraid to dive in.

More disappointing were the performances by two important agencies, one of them a part of Obama’s executive branch. The public data listings provided by the Office of Personnel Management and the Treasury Department don’t even include as many data sets as those agencies already make available on data.gov. If these offices want to make it off the naughty list, they’re going to have to learn to share their data!

Office of Personnel Management

The Office of Personnel Management is one of the agencies that has struggled to comply with the Open Data order. As of publication, the agency’s data listing includes only 18 of the 68 data sets already available on data.gov. None of the data files are “restricted,” meaning they are already publicly available, just waiting to be sliced and diced.

OPM’s public data listing, marginal as it is, serves as a good reminder of the mass of data that is kept on the federal workforce. Of particular note are the FedScope Employment Cubes, quarterly reports that contain key data (who, what, where) on the civil service.

Department of the Treasury

The Treasury is in a similar position to OPM. Its White House-mandated data listing contains a fraction of what is already available on data.gov, though all of the data in this list is publicly accessible. The repository ranges from key data sets like the Treasury Securities Auction Results to the wonkier Federal Borrowings Program Reports Detail Principal and Accrued Balances and Summary General Ledger Balances.

IMAGE of response to Sunlight FOIA request with a red bow attached.
So far, the feds have responded to our request for datasets with an empty package. (Image by Lindsay Young/Sunlight Foundation.)

Of course, the data sets in this article represent just a tiny fraction of the government information we know is being collected (check out our handy chart to see how each agency stacks up in compliance). We encourage you to explore this data on your own and to stay tuned for more updates on the Open Data Executive Order.

Right now, agencies are only releasing lists of data that are, or can be made, publicly available. Behind closed doors, they are keeping much more comprehensive data inventories, featuring information about data sets that they want to stay private. We’re fighting to open those doors and won’t be satisfied until every piece of data that should be public is public. In the meantime we’ll get busy with what we do know about.

So you can bet that in 2014, we will be digging even deeper into this treasure trove, and keeping tabs on whether the government is making good on its promise to open our data. Count on us to take a critical look at those agencies that struggle or dawdle about complying with the open data executive order.

Happy holidays and happy data mining!