Sunlight Foundation

Tell The WH How To Improve the Gov't's Web Presence

Today at 4pm, the White House will host an online chat on how to improve the online experience with Federal websites. Although the event is being framed as a way to cut wasteful websites, it presents an opportunity to give useful feedback on what the government should be doing better online. The right people will be on the other end of the line: White House Director of Digital Strategy Macon Phillips, Federal Chief Information Officer Vivek Kundra, and Director of the GSA’s Center for Excellence in Digital Government Sheila Campbell.

There are some questions that we'd like answered:

  • Why isn't more government data being published online automatically?
  • Why aren't there more APIs?
  • Why isn't more data available in bulk?
  • What's being done to improve the quality of information published online, particularly with datasets?
  • Why do so many government websites look like they were designed in the mid-90s, and what's being done to improve the user experience?
  • What's next for the Open Government Directive?
  • Where's all that lobbying and ethics information the president promised would be online?
  • If this effort is about cutting websites, what's being done to make sure that public information isn't taken offline?
You can ask your questions by filling out a form on WhiteHouse.gov, tweeting with the #dotgov hashtag, or going to the WH Facebook page.

You can watch the chat at WhiteHouse.gov or using the WH Facebook app at 4pm.

Update: Video from the chat is now available here.

Tools for Transparency: Opening Up Data with Socrata

I've been a longtime fan of Google Fusion Tables and have written about the service previously. The platform is great for using and sharing data, but I'm realizing that it pales in comparison to services like Socrata.

Socrata offers a similar service with the same goals of making data easy to find, easy to use and easy to explore.  The most obvious difference between the two services is the level of thought and detail that has gone into making Socrata accessible and user-friendly.  Exploring data sets, embedding and sharing information, creating charts, graphs and maps, measuring engagement of your data and communicating with others on the platform is quite simple. Fusion Tables is missing most of these features and completely lacks a social layer.

At Sunlight, we use Socrata to add context to our blog posts and offer further detail to our projects. To show you a simple example of how you can quickly embed data on your site, check out this list of 23 independent expenditure-only committees I've embedded below -

Powered by Socrata

You can sign up for an account at opendata.socrata.com. Once you're logged in, you can make changes to your profile, begin to explore data, import your own data and test out the various features of the service right away. Because the site is social, you can also search Socrata to find interesting data sets and use them how you see fit. If you're looking to connect more directly with other users, check out the chat function, something that definitely puts this service over the solo experience of Fusion Tables.

For more information on Socrata, check out this video from ABC.com. (Sorry about the ad at the beginning!)

Financial public information slow in coming

Six months old today, the Dodd-Frank Wall Street Reform and Consumer Protection Act gave federal regulators vast new powers to collect information on financial institutions. That information could be crucial in staving off future crises, and some vital elements are supposed to be released publicly. Yet the machinery of government grinds slowly, and the public has yet to see much of it.

  • President Barack Obama has yet to appoint a director for the Office of Financial Research, a new Treasury office that has been called the “CIA of financial regulators” because of its far-reaching power to gather data and information on financial institutions. The office, which is supposed to supply data and analysis for the also newly created Financial Stability Oversight Council, is also required to produce several public databases to help track financial instruments. So far the office’s main action was in November, when it published a proposed policy statement in the Federal Register on creating universal identifiers for each legal entity that takes part in a financial transaction.
  • Bizarrely, the federal government has no way of tracking foreclosure rates. Even the Government Accountability Office relied on private data sources to do analysis of the foreclosure crisis. The new law mandates that the Department of Housing and Urban Development create a new public database tracking foreclosures. However, an agency spokesman, Brian Sullivan, says the department faces legal and resource issues and that the database may be as many as three years in the making, even as the national scandal over banks employing robo-signers on loans continues to unfold.
  • One of the major transparency provisions of the new law brings trading of derivatives (or swaps) into the open by requiring that such trades be reported publicly in “real-time”. The Commodity Futures Trading Commission (CFTC) published proposed rules for “real-time” reporting in December. While the law requires that the final rules be published by July, the agency notes that “participants will need a reasonable amount of time in which to acquire or configure the necessary systems,” and says that some of the reporting is not expected to be made public until January 2012.
These are just a few examples from the vast world of financial regulation where vital public information is slow in coming. In the coming months, the Sunlight Foundation will be tracking closely the tender transparency points in the way our government regulates financial industries, as well as the lobbying forces that help sway the government’s actions. Stay tuned.

Similac recall: new FDA data on beetle-infested formula, other recalls too

A few days ago I received a plain white envelope in the mail from Abbott, the infant formula manufacturer. Inside was a form letter, telling me that the company was voluntarily recalling some of its cans of Similac powdered formula because they may contain beetle parts. The news was jarring; while I had just started my six-month-old son on solids, I wasn’t planning on including beetle bits on the menu.

The letter was dated September 27, five days after the company had announced the recall; while there were earlier news reports, somehow I’d missed them.

Coincidentally, I’d also just seen that the U.S. Food and Drug Administration (FDA) had launched a new data set on recalls, which we had called for earlier this year and which the agency later confirmed it planned to release. Sure enough, this new data set, which is available in programmer-friendly XML format, contains information about the Similac recall.

I also found the April report from when McNeil Consumer Healthcare voluntarily recalled some batches of infant Tylenol and children’s Benadryl, among other drugs, which at the time set me on a medicine-cabinet-emptying frenzy to find all the half-empty bottles we used with my three kids.

The FDA’s release of these data in this format ultimately should be a boon to parents who would like to get this information in real time, once some enterprising programmers (hello Sunlight Labs!) start mashing it into some useful interfaces that make it easier to search. It should also help public health, consumer advocates, journalists and others more easily analyze recall data to find out if there are patterns with particular companies or outbreaks. This can only help in a time that has seen recent massive recalls because of salmonella infestation in peanut butter and eggs.
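
As a rough sketch of the kind of mash-up described above, the snippet below parses a small recall feed and searches it by keyword. The XML layout here (a `recalls` root with `recall` elements carrying `company`, `product` and `reason` children) is invented for illustration and is an assumption, not the FDA's actual schema; a real tool would need to follow the published format. The sample records simply reuse the two recalls mentioned in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed. Element names are assumptions for illustration,
# not the FDA's actual recall schema.
SAMPLE_FEED = """
<recalls>
  <recall>
    <company>Abbott</company>
    <product>Similac powdered infant formula</product>
    <reason>Possible presence of beetle parts</reason>
  </recall>
  <recall>
    <company>McNeil Consumer Healthcare</company>
    <product>Infant Tylenol</product>
    <reason></reason>
  </recall>
</recalls>
"""


def parse_recalls(xml_text):
    """Turn the XML feed into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall("recall"):
        records.append({
            "company": (node.findtext("company") or "").strip(),
            "product": (node.findtext("product") or "").strip(),
            "reason": (node.findtext("reason") or "").strip(),
        })
    return records


def search_recalls(records, keyword):
    """Return records where any field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if any(needle in v.lower() for v in r.values())]


recalls = parse_recalls(SAMPLE_FEED)
for hit in search_recalls(recalls, "similac"):
    print(hit["company"], "-", hit["product"])
```

A parent-facing tool would presumably run a search like this against the live feed on a schedule and send an alert on new matches, but how the feed is fetched and updated is outside what this sketch assumes.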

But the FDA is the first to admit that the data are still lacking. The agency does not have the power to require companies doing recalls of drugs, medical devices, or food to provide a standard list of specific information. This includes such basics as an estimate of how many items are affected, the reason for the recall, and the geographic distribution of the product. (While some companies provide this information voluntarily, they’re not required to do so.) So last May, the agency issued a draft proposal--one of 21 transparency enhancing changes suggested by the agency--that it should seek such authority. Getting standardized information from companies would enhance the usefulness of this tool tremendously.

Last week the FDA also released another new data set, this one a database where the public can explore medical device inspections. However, the data only include 25 percent of inspections, and leave out clinical trial inspections “because information about new medical device development is confidential.”

The trade journal Medical Device Daily (subscription required) calls the data “all bones and no meat,” because a user would still need to file a Freedom of Information Act (FOIA) request to get copies of actual inspection forms that give more detail.

For example, a search of St. Jude Medical, the company that we investigated in our award-winning “Heart of the Matter,” shows a list of five records; clicking on a record, however, shows us that the company was asked to take voluntary action to fix something, without saying why or what. These are data most useful to experts or investigative reporters with plenty of time and resources to dig in, not for ordinary consumers.

Incidentally, the new FDA recall databases appear to be available at Data.gov, the government’s data clearing house, but the medical device inspection data are not. All are now listed on the Sunlight Foundation’s National Data Catalog, here and here.

Next time there’s a recall of food or drugs affecting my kids I don’t want to wait to get an envelope in the mail. I want to get the scoop as soon as the government has it, with all the relevant details. I can’t act on what I don’t know.

Tools for Transparency: Google Fusion Tables

Just look at any one of Sunlight's projects and you'll realize that it takes a mountain of data to help keep government open and transparent. From district information to campaign expenditures to lobbying dollars, making sense of large data sets is an intensive, concerted effort.

Many of your own projects use dozens of spreadsheets, take up thousands of rows of data and live somewhere on your laptop, accessible only to you. This works to a point, but in an era of sharing, collaborating and web-based storage, it isn't an optimal solution.

Google Fusion Tables is an experimental project from Google Labs with the goal of making sharing and collaborating on large sets of data much simpler.  Fusion Tables isn't focused on the traditional database system that requires "complicated SQL queries and transaction processing," but is rather focused on "fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and web publishing."

Google Fusion Tables allows you to handle large amounts of data: you can upload files of up to 100 MB in formats like Excel, CSV and KML. You can also programmatically update, delete, query and visualize data using their API. Plus, you can merge your own data with existing public sets, allowing you to add further value and context to your own information.
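
The merge idea can be illustrated outside of Fusion Tables. The sketch below uses only Python's standard `csv` module, not the Fusion Tables API, and joins two small tables on a shared key column; the column names and values are placeholders made up for illustration.

```python
import csv
import io

# Placeholder data: both tables and their column names are made up.
OWN_CSV = """committee,amount_spent
Committee A,1200
Committee B,450
"""

PUBLIC_CSV = """committee,state
Committee A,OH
Committee B,VA
Committee C,TX
"""


def read_rows(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(key, left_rows, right_rows):
    """Left join: keep every left row, add matching right-hand columns."""
    lookup = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        combined = dict(row)
        combined.update(lookup.get(row[key], {}))
        merged.append(combined)
    return merged


merged = merge_on("committee", read_rows(OWN_CSV), read_rows(PUBLIC_CSV))
for row in merged:
    print(row)
```

Rows that exist only in the public table (here, "Committee C") are dropped, which matches the case of adding context to your own data; a different join would be needed to keep them.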

While Fusion Tables is an experimental Google project, it shows great potential in allowing the less technically savvy to easily leverage large data sets while communicating and collaborating much more effortlessly.

For more information, check out the video and related links below -

Who uses government data?

“Forty percent of internet users went online for government data or information in the preceding twelve months,” according to the Government Online report released in April by the Pew Internet & American Life Project. In other words, 92 million Americans (with three-quarters of Americans online) accessed data on the business of government.

Among the findings:

  • 23% went online to see how federal stimulus money has been spent
  • 22% downloaded or read the text of any legislation
  • 16% visited a site that provides access to government data (such as data.gov)
  • 14% looked for information on who contributed to elected officials
According to Pew, going online for data or information about the government “is not associated with greater or lesser levels of trust in government.” And although whites generally are more likely to visit government data sites, that distinction disappears for sites such as data.gov.

These findings may provide additional impetus for governmental efforts to improve the data offerings at recovery.gov, THOMAS.gov, data.gov, FEC.gov, and from the House Clerk and Senate Office of Public Records.
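
The 92 million figure can be roughly reproduced from the two percentages quoted above. The short calculation below is only a back-of-the-envelope check: it works backward from the post's numbers to the population base they imply, and that base is an inference, not a figure taken from the Pew report.

```python
# Figures quoted in the post.
share_online = 0.75          # three-quarters of Americans are online
share_seeking_data = 0.40    # forty percent of internet users
people_seeking_data = 92e6   # 92 million Americans

# Work backward: people = base * share_online * share_seeking_data
implied_internet_users = people_seeking_data / share_seeking_data
implied_population_base = implied_internet_users / share_online

print(f"Implied internet users: {implied_internet_users / 1e6:.0f} million")
print(f"Implied population base: {implied_population_base / 1e6:.0f} million")
```

If the implied base looks plausible for the population Pew surveyed, the three quoted numbers are at least consistent with one another.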

Tories and Open Gov

There's a new government across the pond with Tory leader David Cameron as Prime Minister. Worth noting is that our friend Tom Steinberg (mySociety) signed on as an adviser to the Tories last year. Tom is a brilliant open government innovator and some of his ideas can be seen in the "Technology Manifesto" presented by the Tories a few months ago. One of the items in the manifesto is very similar to both the Open Government Initiative begun by President Barack Obama and the yet-to-be-enacted Public Online Information Act (POIA):

Legislating to enshrine the freedom of government data and create a powerful new ‘Right to Government Data’, enabling the public to request – and receive – government datasets. This will radically increase the amount of government data released – and will provide a multi-billion pound boost to the UK economy. President Obama’s administration has already implemented a ‘Right to Data’ policy.

Legislating this "Right to Data" is vital for those who support an open government. That's why we support the passage into law of the POIA bill that has been introduced in both chambers of Congress here in the United States. Another proposal offered in the manifesto is also excellent:

Publishing online every item of central government and Quango[1] spending over £25,000 – including every contract in full. This will create new jobs by opening up government procurement to more SMEs. We will also publish online every item of local government spending over £500 – including every contract in full. In addition, detailed information on the salaries of senior civil servants and local council officials will be published online.

The Tories have also promised to use open source software "as much as possible." Another proposal is to allow the public to comment on all legislation before it is debated. This includes the ability to rewrite and reject parts of the legislation.

Hague will say: "A public reading stage for new legislation will throw open the doors of parliament and enable the public to play a role in the legislative process." The party leadership believes its plan is an example of the "post-bureaucratic age" – a phrase first used by supporters of Bill Clinton, suggesting that in the age of the internet voters can exercise a greater influence on figures in authority.

I'm not sure how much the public input will be taken into consideration once a bill reaches the debating stage in Parliament or whether there is any binding nature to the revisions made by the public. While I'm supportive of providing time and space for people to give their input on legislation, there are numerous problems with requiring that input to be adopted in the legislation. In general, there ought to be more input from the broader public in the legislative process. Depending on how this policy is structured, it could be a very useful tool or an obstacle in the legislative process.

All of the other proposals are outstanding just as they are. Hopefully the new government follows through on their promises.


1 Quango is an acronym for a quasi-autonomous non-governmental organization. For more information, click here.

Federal agencies drop their data IOU notes

Cowritten by Laurenellen McCann

When federal agencies released their open government plans earlier this month the thing I was most excited for was new data. While the Open Government Directive didn't specifically require agencies to submit previously unavailable data, 75 new data sets have been promised for public release.

Some of these new data sets have never before been seen by the public. For others you needed to purchase expensive proprietary software. A few only covered a few years and are now being expanded - some to the turn of the 20th century!

There's no single good methodology for determining "what's new," just like there isn't a single good methodology for defining a "single continuous data set". To be clear, this is a list of new data sets that are to be released on or after April 7th, 2010. Many government agencies released new data earlier in the year as well. This isn't an exhaustive 2010 list - it's a look into the future.

We've saved you the trouble of going into each plan individually by publishing the spreadsheet below. You can also download the information in a variety of formats by clicking on "Menu" in the upper left corner and then "Download this data." View the data full screen by opening up the "Views" menu:

Open Government Plans: New Data after April 7th, 2010

This spreadsheet is meant to serve as a resource for citizens, journalists and government officials to get a heads up on what data the entire federal government has committed to publish. It might come as a surprise that of the 31 agencies that published their Open Government plans on April 7th, only 16 are responsible for proposing the 75 new sets of data.

The key word is new. Most government agencies promised to release data tools that were actually aggregators, dashboards, or other web services that run on information you can already find on Data.gov or other agency web sites. Others counted their recent releases of information that is already public and published every year. These data sets did not meet our "newness" criteria. To count as genuinely new, a data set had to be newly released: that is, the data must never before have been made available to the public online in a freely readable format. It also had to be named, as evidence that the agency releasing the data is actually initiating the process of opening up this information. Where possible, we included hyperlinks to additional information about the data set; in some cases it's even the link to the page where the data will be published.

Our Reporting team has already been writing about new data coming out of the Open Government plans and pointing out places where work still needs to be done. We're also releasing a single file download for all the federal Open Government plans. Instead of having to round up all 31 files individually, you can just download this single ZIP and get them all in one go. If you're looking for additional information about the Open Government plan of a particular agency, the White House has published a list of hyperlinks.

Beth Noveck, Deputy CTO and Director of the Open Government Directive, has published a look into the horizon from her perspective. She also submits her highlights from other sections of the Open Government plan. Our focus has been the data section of the transparency plank; agencies were also asked to develop plans on participation and collaboration. Any ideas on other ways we can make the Open Government Directive more useful to you are welcome in the comments. Be sure to sign our transparency pledge to keep government accountable to their promises!

Stay tuned for the week before May 1st, when the White House will be releasing their own assessment of the Open Government plans.

Photo credit: “Federal IOU” by Laurenellen McCann. Photo model: Nicko Margolies

Local Spotlight

In honor of Tax Day, Ohio's Jason Hart took a look at salary information for Franklin County's administrators, specifically looking at raises over the past three years. He got some interesting results: in most departments the highest-level people didn't get raises, with one exception.

Friday, April 9th is Tax Freedom Day, when the average American has earned enough to pay Uncle Sam and Uncle Sam’s various relatives what they demand. Ohio is somehow a day ahead of the average, so in honor of the big day tomorrow I thought I’d dig through some salary info for public administrators here in Franklin County. As boring as I am, I ought to make an effort to avoid any talk of numbers or statistics. As stubborn as I am, I won’t!

With employment and the economy in general down for the past year and a half, I wanted to see how the smallest of government big-shots were rewarding themselves relative to 2007 and 2008. Despite widespread populist railing against private industry salaries and bonuses, I expected to see pay increases for the insulated local bureaucrats our tax dollars keep employed. Given some of the things I’ve read recently, I was pleasantly surprised by the data.

A helpful CPA in the Franklin County Auditor’s office responded to my public records request promptly, with salary data on all Franklin County employees from 2007-2010. Download the Excel file if you’d like to check my numbers or do some analysis of your own. I’ll list hourly rates instead of annual salaries, as 2009 contained 27 pay periods instead of the usual 26. Let’s start with the highest branch on the Franklin County tree, shall we?

Read the rest here.

This is a really fascinating use of data and another reason why we can all benefit from continuing to demand that all kinds of information be made available online: information that can empower people to find the right questions to ask their government.
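
The choice of hourly rates in the excerpt above matters because a year with 27 pay periods inflates annual totals even when nobody gets a raise. A minimal sketch, assuming biweekly pay with 80 paid hours per period and a made-up hourly rate (neither figure comes from the Franklin County data):

```python
HOURS_PER_PAY_PERIOD = 80  # assumption: biweekly pay, 40-hour weeks


def annual_pay(hourly_rate, pay_periods):
    """Total pay for a year with the given number of pay periods."""
    return hourly_rate * HOURS_PER_PAY_PERIOD * pay_periods


def hourly_rate(annual_total, pay_periods):
    """Recover the hourly rate from an annual total."""
    return annual_total / (HOURS_PER_PAY_PERIOD * pay_periods)


rate = 50.0  # hypothetical rate, unchanged between years
pay_2008 = annual_pay(rate, 26)  # usual year
pay_2009 = annual_pay(rate, 27)  # year with an extra pay period

apparent_raise = (pay_2009 - pay_2008) / pay_2008
print(f"Apparent raise from annual totals: {apparent_raise:.1%}")
print(f"Hourly rate 2008: {hourly_rate(pay_2008, 26):.2f}")
print(f"Hourly rate 2009: {hourly_rate(pay_2009, 27):.2f}")
```

Under these assumptions the annual totals suggest a raise of roughly one twenty-sixth even though the hourly rate never moved, which is why comparing hourly rates is the safer basis.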

What Open Government plans could learn from retail management

After working several depressing retail jobs in my teenage years, I used to think that it was a kind of job I would never wish upon anyone. After reviewing the open government plans of 29 federal agencies, I'm starting to take a second look at the lessons I learned at those jobs.

For example, it gave me a deep appreciation for the need to conduct occasional inventories of the store: a listing of every single piece of merchandise under the store's roof. In my assessment, the majority of the open government plans failed to provide clear inventories of the "high-value" (a problematic term, as we've discussed before) data.

Department of Commerce - Data Inventory

Most plans gave a general narrative of the type of data that was out there without actually creating an invoice of said data, hyperlinks, citations or even a spreadsheet - in other words, no inventory!

Given the importance of inventories in retail, it shouldn't be a surprise that the Department of Commerce (DOC) provided one of the best data inventories. A screenshot of the inventory including a link back to their open government plan can be found at right.

To give credit where it's due, the General Services Administration (GSA) also had a pretty solid inventory [PDF] (page 55). It's not surprising since the GSA is responsible for acquisition solutions of supplies for many government organizations.

Last week, we devoted a fair amount of digital ink to highlighting the shortcomings of the data in open government plans so I wanted to make sure we continue showcasing the awesomeness of certain aspects of particular agencies' plans. The kudos to the DOC doesn't stop with their data inventory. Clear organization and concise writing typified the DOC’s “What Commerce Will Do” section. It also helps that the plan is written in plain English.

I urge you to read that section in its entirety [PDF] - it starts on page 4. The real star of this section is the National Oceanic and Atmospheric Administration (NOAA). Factoring out my automatic positive association with the name, the new NOAA data being released is absolutely great.

Whether it's digitizing weather station data from the 18th and 19th centuries or making public for the first time soil moisture observation data, the new data from NOAA will improve climate studies and help business make better economic decisions. NOAA was already putting huge amounts of data online, even before the Open Government Directive. Recognizing that the data is sometimes hard to find, NOAA is also expanding the scope and functionality of its Climate Services Portal to help citizens and scientists find the data they need.

The Sunlight Foundation has been focusing its eye on the transparency plank of the open government plans, specifically on data transparency. We'll continue to do so this week, but it's important to note that transparency is only one of three Open Government Directive planks: the others are participation and collaboration. Agencies were also asked to come up with an open government flagship initiative. Heather West of the Center for Democracy and Technology has a great post on Govfresh highlighting certain flagship initiatives.

We'll continue to dig deeper into the transparency portion of the open government plans and link to other evaluations going up round the net. If you see a perspective on the plans we've missed, drop it in the comments below!

Photo credit: "Discoveryland Retail Packaging" by Flickr user Design Packaging.
