data

 

Filming OpenGov Champions: Sandra Moscoso, Washington, DC

I met Sandra Moscoso at TransparencyCamp last year and was immediately impressed by her work opening up D.C. public school data to improve area schools. She is an obvious choice to be this month’s OpenGov Champion.

By day, Sandra manages an open data portal at the World Bank for the bank’s financial sector, so she is familiar with the usefulness of open data. But it is her work in her local D.C. community that sets her apart. As a mom of two public school students, she is a member of the Capitol Hill Public School Parent Organization (CHPSPO), which looks to improve the local school system by organizing rallies and bake sales, restoring school buildings and talking to city officials. Sandra is often very hands-on in these activities, but her biggest personal mission with CHPSPO has been making open government data the basis for everything the group does.

As you can see in the video, she and other CHPSPO members collected data showing that schools with a full-time librarian had better test scores than those that had lost theirs to budget cuts. The group used that finding as the basis for its request that the city restore funding for librarians. She also recently sent an open letter to Mayor Vincent Gray, asking for public school data she wanted to use in an Open Data Day Hackathon in D.C. The city released the data and even sent a data analyst to the hackathon. Who knows whether Mayor Gray's administration would have released this data had Sandra not publicly asked for it?

Going to Sandra’s home to film the interview felt more like visiting family friends for brunch, which, in fact, they were preparing as we arrived. Sandra and her husband have a cozy Victorian townhouse in the Capitol Hill neighborhood. We saw some very nice Lego projects and other things created by her smart and sweet kids and heard about how much they love their school.

Sunlight's video team filming at Sandra's home

“I have the best community here in Capitol Hill,” Sandra says. She knows most of her neighbors, many of whom also actively participate in neighborhood projects and politics. “I want them to stay.” Many D.C. families end up moving to the suburbs in Virginia and Maryland when their kids reach middle school age, as public middle schools in the District have a bad reputation and it’s a vulnerable age for children. She hopes to improve the situation by advocating for better schools, armed with all the open data she can get her hands on and a lot of enthusiasm.

Her home was not the only place where we filmed. When I first approached Sandra about filming her for the OpenGov Champion series, she sent me a flurry of links to the many activities she was involved in around town. If you follow her on Twitter, barely a day goes by without her tweeting to D.C. government officials, trying to make them see the usefulness of opening their data and that there are people like her who really want to put that data to use.

A case in point: in 2010, she and a group of other engaged parents drafted a proposal for a new middle schools plan, using open D.C. Public School data as well as data they collected themselves. The then D.C. Public School Chancellor, Michelle Rhee, approved and implemented the plan (although not perfectly).

Sandra thinks that the most effective change happens at the local level, by people who truly care about what is happening in their own community. That is why having access to local government data is so important, and the more detailed and specific the better: it enables OpenGov Champions like Sandra and many others to be better advocates for their communities.

Our OpenGov Champions are remarkable ordinary people who have done extraordinary things to open up our government. Get inspired by their stories and nominate someone in your community to become an OpenGov Champion.

Senator Tester Champions Government Transparency; Reintroduces POIA

Today, Senator Jon Tester reintroduced the Public Online Information Act (POIA), a bill that would take already public government information out of file cabinets and put it online in user-friendly formats.

In the Internet era, something cannot be considered public unless it is online. Unfortunately, some of the most important information held by government agencies is hard to access. POIA brings the federal government into the 21st century by enabling the public to access information quickly with just a few keystrokes. It also brings together all three branches of government to determine the best ways to make information public.

POIA is common-sense legislation that should be embraced by Members of Congress who support an efficient, transparent government.

OpenGov Champions: Shea Frederick, Baltimore, MD

Meet Shea Frederick, our latest OpenGov Champion. Last September, Sunlight’s video team -- myself and Associate Video Producer Solay Howell -- spent two days in Baltimore, MD, with Shea to see how he uses city open data to build useful tools for Charm City residents.

One of those tools is baltimorevacants.org, a dynamic map that lets you search and see more than 30,000 vacant houses and vacant lots in Baltimore. To capture on video the source of that data, we drove around Baltimore filming abandoned houses, streets and even entire blocks that are just left to decay, attracting crime and rats.

 

 

As Shea says in the video, it’s impactful to see 30,000 vacant houses or lots mapped out over the city. But it is even more powerful to see the actual places. I’m still haunted by the sight of all those vacant, rotting houses with boarded-up windows and doors we saw all over Baltimore. As a visual storyteller, I could imagine how each one of these houses has a story to tell. Maybe a factory closed, people lost their jobs, packed up and moved, and after enough of their neighbors had left, the ones left behind could not bear to live on an empty street and finally they all went.

Looking at Shea’s work, I realized that data can be used to tell a story too, one from real life that literally “connects the dots” and paints with broader strokes to get the full picture. That’s why Shea loves hacking on the open data the City of Baltimore started releasing in 2011: there is always a real-life connection to the work he is doing and he can see it all around him.

Another of Shea's tools is an app called Spot Agent that uses parking citation data to warn you if a meter maid might be close by. Then there’s one that uses the city’s 311 data to show the most common problems occurring in any Baltimore neighborhood, based on the words that appear most often in the service requests, such as “trash,” “rat,” “illegal” or “light.”
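The idea behind the 311 tool, counting the words that show up most often in each neighborhood's service requests, can be sketched in a few lines of Python. This is only an illustration and not Shea's code: the sample records, the field names (`neighborhood`, `description`) and the stopword list are all made up for the example, and real 311 data would have its own layout.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records, invented for this sketch.
SAMPLE_REQUESTS = [
    {"neighborhood": "Hampden", "description": "Trash not collected in alley"},
    {"neighborhood": "Hampden", "description": "Rat seen near trash cans"},
    {"neighborhood": "Canton", "description": "Street light out"},
    {"neighborhood": "Canton", "description": "Illegal dumping; light broken"},
]

# A tiny, assumed stopword list so filler words do not dominate the counts.
STOPWORDS = {"a", "in", "near", "not", "of", "out", "seen", "the"}


def top_words_by_neighborhood(requests, n=3):
    """Return the n most frequent non-stopwords for each neighborhood."""
    counts = defaultdict(Counter)
    for request in requests:
        # Lowercase the text and keep alphabetic runs only.
        words = re.findall(r"[a-z]+", request["description"].lower())
        counts[request["neighborhood"]].update(
            word for word in words if word not in STOPWORDS
        )
    return {
        hood: [word for word, _ in counter.most_common(n)]
        for hood, counter in counts.items()
    }


if __name__ == "__main__":
    for hood, words in top_words_by_neighborhood(SAMPLE_REQUESTS).items():
        print(hood, words)
```

A real version would likely need a longer stopword list and some handling of plurals and misspellings, but the core of it is just counting.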

He does a lot of this work with the help of other developers and interested citizens, connected through hackathons and other events. Baltimore has such a vibrant community for this sort of work that when the city started releasing its data sets through the Open Baltimore portal, there was already an active bunch of people ready to put them to use. The city has been pleased with that, as these civic hackers can build something for fun and for free in a weekend that would take the city weeks, maybe even months, to complete and cost tens of thousands of dollars. Shea has been tag-teaming with the city directly, using the data it released and giving the city advice on how the data could be improved, mainly that it should be updated in real time instead of released as a one-time dump.

Why does Shea Frederick spend so much of his own time sorting out this data into meaningful, usable formats when he might as well be competing in a cyclocross race somewhere? Well, for one, he loves what he does. And second, he has grown to love Baltimore and wants to give back by giving others tools that can help them connect with what’s happening around the city. This is OpenGov Championship at work: taking data that’s available and putting it to use, and working together with the local government to make it even better.


 

Two principles to avoid common data mistakes

If David Brooks is correct, the “rising philosophy of the day” is “data-ism.” But you don’t have to believe David Brooks. Just look at the big data (e.g. Google Trends) on “big data.”

For political junkies, data became sexy in 2012. First, the meta-analyses of polling data by Nate Silver of the New York Times triumphed over the pundits’ “gut feelings.” Second, the Obama campaign successfully used data analytics to increase voter turnout. This caused people to pay attention (witness, for example, David Brooks’ new devotion to the subject as prime column-fodder).

Of course, for those of us in the transparency and accountability advocacy community, data has long been a prized commodity. And as governments around the world increasingly commit to open data promises, more and more data is becoming available.

At its best, data allows us to transcend our personal anecdotal experiences, giving us the big picture. It allows us to detect relationships and patterns that we wouldn’t otherwise see. Using data smartly can help us to make better decisions about both our own lives and our society.

But it’s important to understand that data and data analysis are merely tools. They can be used well, or they can be used poorly. It is remarkably easy both to mislead and to be misled by data. Hence the old adage: “There are three kinds of lies: lies, damned lies, and statistics.”

For many people, data can quickly overwhelm and confuse. It’s easy to misinterpret data, or to use it irresponsibly. We as humans are not particularly good at intuitively grasping large numbers, and our educational system generally does a poor job of helping us to counter this problem.

For that reason, I want to offer two basic principles that I think could prevent a majority of the data mistakes that I observe:

  1. Cherry-picking works better with fruit than data
  2. Correlation provokes questions better than it answers them

Let’s take these one at a time.


Keeping GPO's Data Free

The Government Printing Office's data portal, FDSys, is a major pillar of US government transparency and access to information. Information from all three branches of government is distributed freely and on a tremendous scale, often in machine-readable form. GPO provides the official text of all bills in Congress. It provides the official source of data on all regulatory activity, the Federal Register. It is trying to grow into an official source for federal court opinions. It is adding new things all the time.

Anyone who gets their hands dirty with FDSys can come up with a list of recommendations for improvement, but its existence and overall output are hugely important and increasingly vital. GPO also sets a strong precedent for how to balance the need for authentication with the need to make data easily consumable by third parties, by following the simple approach of providing digests that can be checked when needed.
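To make the digest approach concrete, here is a minimal Python sketch of how a third party might check a downloaded file against a published digest. It assumes the digest is a SHA-256 hash written as a hex string, which is our assumption for the example and not a statement about which algorithm or format FDSys actually uses. The function names are ours as well.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, published_hex: str) -> bool:
    """Check downloaded bytes against a published hex digest.

    Surrounding whitespace and letter case in the published value are
    ignored, since digests are often copied out of text files.
    """
    return sha256_hex(data) == published_hex.strip().lower()


if __name__ == "__main__":
    # Stand-in for a downloaded document; not real FDSys content.
    document = b"Example document text"
    published = sha256_hex(document)  # pretend this came from the publisher
    print(matches_digest(document, published))         # unchanged copy
    print(matches_digest(document + b"!", published))  # altered copy
```

The appeal of this design is that the data itself stays easy to consume: nobody has to run the check, but anyone who needs assurance can.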

The public needs the information in FDSys, and we need it to be free.

So you can forgive me for spitting out my drink upon reading that among the recommendations in the National Academy of Public Administration's audit of GPO was that GPO should start charging citizens for the right to download this information.

Given the unique role of FDsys in providing permanent public access to authentic government information, it is imperative for GPO to secure long-term, consistent funding for FDsys through cost recovery and/or appropriation to ensure current and future access to government information. ...

Rather than charge a publication price, GPO could explore charging a small user fee to recoup the cost of providing access to government information on FDsys, or allowing users to view documents for free, and charging for document downloads.

NAPA cites the failed attempt to charge for an earlier GPO Access program, but says "the problem" was payment processing fees, and mentions public outcry as an afterthought:

When GPO Access was launched, GPO charged users for access to digital content. The problem was that the administrative costs of collecting payments were higher than what GPO could charge. Also, there was resistance from public interest groups and other stakeholders.

NAPA goes on to make a very poor argument that because people don't mind paying to enter national parks anymore, they won't mind suddenly having to pay to download government information. This is an argument they are advancing in 2013.

Still, this report is worth taking seriously. The report was requested by Congress, it covers a wide range of issues, and GPO has already held it up as a validation of their mission.

What's clear is that on the specific issue of whether it's acceptable to charge for access to fundamental government information, the answer that's obvious to citizens and advocates on the outside — No! — is much murkier among the various pieces of the US government.

The concern NAPA expresses is that the information in FDSys is too important to be tied to the whims of Congressional appropriators. This is definitely a concern; James Jacobs has already written eloquently about the repeated historical attempts to defund or privatize the distribution of the information GPO is responsible for. NAPA describes user fees as a way to guarantee that this information will stay available.

However, the solution can't be to ask citizens to pay access fees. There's no such thing as a nominal fee for government information this fundamental. Public services like GovTrack.us, OpenCongress, Scout, and even other government initiatives like FederalRegister.gov, can only exist by first obtaining entire datasets — millions of pages — from FDSys. Imposing access fees for FDSys seriously reduces transparency, crushes innovation and experimentation, and hampers research and analysis.

Instead, the data in FDSys must come to be viewed by everyone — from NAPA to Congress — for what it is: part of the lifeblood of information in the United States. It must remain free.

FDLP Allergic to Curl

Waldo Jaquith discovered that the FDLP (Federal Depository Library Program) appears to have an allergic reaction to people downloading their data with basic command line tools.


House Begins Publishing Committee Data

The House of Representatives' document portal, docs.house.gov, launched in January 2012 with a surprisingly rich and relevant set of data: all bills and amendments (including drafts) that would come to the floor over the next week, and extensive XML metadata about each document and when it was updated. It's pretty difficult to overstate the value of this data. After all, information on what the House is about to do is vital -- to participate effectively in our democracy, you need to have some lead time.

The House has doubled down on its pledge to keep innovating, and has begun to release what promises to be an expansive set of committee information. Docs.house.gov’s expansion in breadth from floor proceedings to include committee activities provides significant new opportunities for the public to understand how the House functions as well as a much earlier entry point for citizens to become substantively involved in the legislative process.

Docs.house.gov is organized around a new calendar of committee activities that extends what's available on House.gov. The calendar identifies committee activities further in advance than the current system and provides a landing page with extensive information and documents related to committee activities such as the names of witnesses, written testimony, draft legislation, and so on. In addition, each committee activity has associated XML with structured information on both the activity and all related documents, so that developers can easily access and reuse the information. This more than satisfies our recommendation that the House improve how it gives notice about upcoming committee activities.

All documents contained in this portal can be searched and filtered by committee and subcommittee (here are the documents from House Rules, for example), and every committee and subcommittee has its own RSS feed (like this one). It’s still not perfect: for example, it's not obvious how one could automatically discover the available XML on the site without scraping HTML to find the associated IDs and URLs. But this could be addressed by offering a full XML feed of activity, like the House’s Floor page already does.
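For developers wondering what consuming one of these per-committee feeds might look like, here is a small Python sketch that pulls item titles and links out of an RSS 2.0 document using only the standard library. The sample feed is invented for the example (the titles and example.org URLs are placeholders), and we are assuming the feeds follow the usual RSS 2.0 `channel`/`item` layout. We have not reproduced the real feeds' contents here.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example committee feed</title>
    <item>
      <title>Hearing: example topic</title>
      <link>https://example.org/activity/1</link>
    </item>
    <item>
      <title>Markup: example bill</title>
      <link>https://example.org/activity/2</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml: str) -> list:
    """Return a list of {'title', 'link'} dicts, one per RSS item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in list_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

A feed like this covers new activity per committee, but it does not replace the site-wide XML feed suggested above.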

Taken together, these additions to docs.house.gov provide both a useful set of data and a promising new scope for this important legislative information portal. Our experience has taught us that gathering information in any automated way about House and Senate committees is an extremely frustrating experience, because every committee has its own website and its own way of doing things. Because of this, even while House and Senate floor votes are posted quickly and centrally, we've ended up in a situation where the votes members of Congress take in committee are not published in any timely, central location. It's easy to imagine docs.house.gov evolving to become an incredibly useful guide that connects citizens to the information they seek. While it will never replace committee webpages – nor should it – docs.house.gov will help ensure that committee information is made more prominent among the activities of the House.

One additional noteworthy aspect of docs.house.gov is that it is built and maintained by the Clerk of the House. This means that the information it contains is non-partisan and should persist over time. While committee websites are often wiped clean when a new chairperson takes power, docs.house.gov should provide a measure of institutional memory independent of leadership and party. This is a smart move. In addition, we’ve noticed that some of the legislative support agencies have been unwilling or unable to play the role of a central legislative document clearinghouse. Having the Clerk’s office serve as a clearinghouse has managed to sweep aside all the bureaucracy and allow tangible progress to be made. Let’s hope that both the Senate and the legislative support agencies follow this example, now that the House has demonstrated what’s possible.

written by Eric Mill and Daniel Schuman

Making our Data and APIs Bigger, Better and More Accessible

It's no secret that we're dataphiles here at Sunlight, or that we want everyone to have access to the underlying data that powers many of our applications. It's why we've always released downloadable data and APIs (application programming interfaces) that support our data products. You can find most (but not all!) of our offerings at http://services.sunlightlabs.com. To use our APIs, you just need to sign up for an API key. You can watch a video of Tom explaining the general idea.

But that page is several years old, and as Sunlight has grown up over the years, our data offerings have gotten more expansive but also more far-flung and difficult to keep track of. That's why we're re-investing in our data. Coming next year, we'll be releasing a brand new site with comprehensive and cohesive documentation on all our APIs, as well as an interactive query builder. But this is where we need your help.

What can we be doing better?

We use our own APIs to build products, but we need a better sense of how all of you are using (or potentially aren't using) them. What are we lacking? How can we make them more accessible? What do you wish we had? How can we make the API key signup process better? How can we improve the feedback loop from developer to maintainer? What languages are you using to access our APIs? Do our language-specific API wrappers need to be better maintained?

Please let us know your thoughts, and sound off in the comments!

Happy Holidays!

Scout, in Open Beta

We're opening a new tool to the public today for beta testing, called Scout.

Scout is an alert system for the things you care about in state and national government. It covers Congress, regulations across the whole executive branch, and legislation in all 50 states.

You can set up notifications for new things that match keyword searches. Or, if you find a particular bill you want to keep up with, we can notify you whenever anything interesting happens to it -- or is about to.

Just to emphasize, this is a beta - it functions well and looks good, but we're really hoping to hear from the community on how we can make it stronger. You can give us feedback by using the Feedback link at the top of the site, or by writing directly to scout@sunlightfoundation.com.
