As stated in the note from the Sunlight Foundation's Board Chair, as of September 2020 the Sunlight Foundation is no longer active. This site is maintained as a static archive only.

Tag Archive: Technology

Data Visualeggzation

Months ago, Josh and Tim bought an Egg-Bot kit from Evil Mad Scientist Laboratories. Despite the obvious utility of this piece of office equipment, it fell into disuse not too long after assembly. But with the year's premiere egg-decorating holiday fast approaching, we decided to dust off the Egg-Bot and see if we couldn't put it to good use during our team's weekly lunch meeting. Things kind of spiraled out of control from there. We blame the sugar high from eating all that candy.

New On Influence Explorer: Fundraisers, Bills and More

One of the primary goals of our site Influence Explorer is to show users a wide variety of influence data on one page. Today we're expanding the scope of that data by integrating with three important Sunlight projects: Party Time, Lobbying Registration Tracker and OpenCongress. Along with this integration, we're also including new data from our partner organization, the Center for Responsive Politics, that gives a far more accurate picture of the campaign donations of registered lobbyists.

The Worst Government Website We’ve Ever Seen?

Yesterday the government's Federal Awardee Performance and Integrity Information System (FAPIIS) came online. This is something we've been looking forward to for a while. It's easy to find horror stories about the mismanagement of contracts; this isn't surprising when you consider the disorganized constellation of contractor oversight databases that exists, many of which aren't open to the public. Getting FAPIIS online should be a step toward fixing that problem. Yesterday government took that step.

POGO has some thoughts about it that are certainly worth your time. But we can't help chiming in as well. In short: this site is terrible. As one colleague said, "This might be the worst website I've ever seen."

This is at least debatable. Contracting databases are part of the world of procurement, procurement is heavily influenced by the Defense Department, and DoD has a proud heritage of producing websites so ugly that they make you want to claw out your eyes. So FAPIIS has company. But if this was just a question of aesthetics, we wouldn't be complaining.

Assuming you're using one of the few web browsers in which the site works at all (Chrome and Safari users are out of luck), the experience is off-putting from the start, as users are warned that their use of the site may be monitored, surveilled, or otherwise spied upon (you don't necessarily surrender your right to speak privately to your priest by using the website, though--thanks for clearing that up, guys!). Perhaps this is why their (arguably superfluous) SSL certificate is utterly broken. But let's click past the security warnings and proceed.

Here's the next screen. It contains a captcha.

Let's be clear: the use of a captcha to gate government data is outrageous. Government should be making its data more accessible and more machine-readable. Captchas are designed to interfere with automated tools that facilitate malicious acts. But downloading government data is decidedly not a malicious act. Why are we trying to limit machines' ability to use this data?

But our irritation with the captcha is softened a bit by how laughably inept its implementation is. It's made of black and white text, unrotated, unskewed, superimposed on the same black and white grid every time. Here's a stab at how you'd beat it:

  1. Subtract grid
  2. Flip every white pixel that's bordered by 2 or more black pixels to black
  3. Identify columns of all-white pixels and slice the image by them
  4. Crop the resulting slices, then recombine
  5. OCR
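The first four steps can be sketched in Python. This is a toy illustration under stated assumptions, not tested captcha-breaking code: it assumes the image has already been binarized into rows of 1 (black) and 0 (white) pixels, that the fixed background grid is available as a mask of the same size, and that "bordered by" means the four horizontal and vertical neighbours. All function names are hypothetical, and step 5 (OCR) is left to an external engine.

```python
from typing import List

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel (assumed binarized input)


def subtract_grid(img: Image, grid: Image) -> Image:
    """Step 1: clear (set to white) every pixel covered by the fixed grid mask."""
    return [[0 if g else p for p, g in zip(row, grow)] for row, grow in zip(img, grid)]


def fill_gaps(img: Image) -> Image:
    """Step 2: flip a white pixel to black if 2+ of its 4 neighbours are black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                continue
            neighbours = [
                img[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            if sum(neighbours) >= 2:
                out[y][x] = 1
    return out


def slice_columns(img: Image) -> List[Image]:
    """Step 3: cut the image at every all-white column, keeping the inked runs."""
    slices, start = [], None
    for x in range(len(img[0])):
        blank = all(row[x] == 0 for row in img)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            slices.append([row[start:x] for row in img])
            start = None
    if start is not None:
        slices.append([row[start:] for row in img])
    return slices


def crop(img: Image) -> Image:
    """Step 4a: trim all-white rows above and below a character slice."""
    inked = [y for y, row in enumerate(img) if any(row)]
    return img[inked[0]:inked[-1] + 1] if inked else []


def recombine(slices: List[Image], gap: int = 1) -> Image:
    """Step 4b: lay cropped slices side by side, bottom-padded to equal height."""
    if not slices:
        return []
    height = max(len(s) for s in slices)
    out: Image = [[] for _ in range(height)]
    for i, s in enumerate(slices):
        width = len(s[0])
        padded = s + [[0] * width for _ in range(height - len(s))]
        for y in range(height):
            if i:
                out[y].extend([0] * gap)
            out[y].extend(padded[y])
    return out


def preprocess(img: Image, grid: Image) -> Image:
    """Steps 1-4; the result would then be handed to an OCR engine (step 5)."""
    cleaned = fill_gaps(subtract_grid(img, grid))
    return recombine([crop(s) for s in slice_columns(cleaned)])
```

On a toy 7x5 image with two vertical strokes crossed by one grid line, `preprocess` restores the strokes where the grid cut them and returns them as two columns separated by a one-pixel gap.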

You could probably get this done using a stock PHP distribution in about an afternoon. But you don't need to, because even this pathetic level of security isn't properly implemented! Instead, the human-readable text is sent to the client as a SHA1 hash in a hidden field. That hash is compared to the hash of what the user enters for the captcha code. So a scraper can just ignore the captcha and resend a solved hash for every request -- it'll work just fine.[1] They didn't even salt the hash. Whoever wrote this has absolutely no idea how to implement a secure system.
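Why the hidden-hash scheme fails is easy to demonstrate. In the sketch below (hypothetical names, not FAPIIS's actual code), the server's check is stateless, so any answer paired with its own SHA1 hash passes without anyone looking at the image. For contrast, it includes one conventional alternative, offered as an assumption rather than a prescription: keep the expected answer on the server, give the client only an opaque random token, and accept each token once.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def flawed_check(hidden_hash: str, user_answer: str) -> bool:
    """The scheme described above: the expected answer travels with the form."""
    return sha1_hex(user_answer) == hidden_hash


class CaptchaStore:
    """Alternative sketch: answers stay server-side and each token works once."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def issue(self, answer: str) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = answer
        return token  # only this opaque token is sent to the client

    def verify(self, token: str, user_answer: str) -> bool:
        expected = self._pending.pop(token, None)  # consumed on first use
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode("utf-8"), user_answer.encode("utf-8"))


# A scraper never reads the image: any text plus its own hash passes.
forged = "anything"
assert flawed_check(sha1_hex(forged), forged)
```

Salting alone would not close the hole if the salt is sent along with the form; the point of the alternative is that the client never holds anything it can use to mint or replay a valid answer.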

After the captcha, things start to get really weird, with radio buttons with onclick handlers being used as hyperlinks. It's unclear to me whether the programmers responsible for this interface had ever actually used the web or simply had it described to them. Either way, whoever built this should be embarrassed. Whoever managed the project should be embarrassed. Whoever signed off on delivery should be embarrassed! But we haven't even gotten to the worst part yet.

That's because while all of the above will be embarrassing to any developer who takes pride in his or her craft, the quality of a government website is ultimately less important than the data it exposes. And there is no FAPIIS data in FAPIIS. Not yet, anyway. Such data exists, mind you. But the decision was made not to include any historical data when FAPIIS went public. Presumably the contractors who did a bad job, and who were reported for doing so, are concerned that people might look at those reports and get the impression that, uh, they did a bad job. Others may be concerned that the database could cast them in a bad light and raise uncomfortable questions. That government caved in to the demands of these vendors -- vendors who are supposed to be serving government! -- can only be described as an act of craven capitulation. We've FOIAed for this data, and if we're lucky, perhaps we'll even get it. But it ought to be online right now.

As a matter of principle, it's good to see government opening closed databases, and Congress deserves credit for deciding to open this one. But what has followed that decision deserves only the smallest quantity of plaudits still distinguishable from zero. I hope that the site removes the captcha, offers bulk downloads, and fills up with useful, unsanitized data. But whoever built this travesty deserves an entry in FAPIIS of their own.

1: You do need to update the JSESSIONID cookie and get a fresh value for the org.apache.struts.taglib.html.TOKEN hidden variable, but this is easy enough to do.
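The footnote's two chores, reading the new JSESSIONID cookie and the fresh Struts token from each response, need only the standard library. A sketch, assuming the token sits in an ordinary `<input type="hidden">` element; the two captcha field names in `build_form` are hypothetical placeholders, since the post doesn't name them.

```python
import hashlib
from html.parser import HTMLParser
from http.cookies import SimpleCookie
from typing import Dict
from urllib.parse import urlencode

TOKEN_FIELD = "org.apache.struts.taglib.html.TOKEN"  # name given in the footnote


class HiddenInputs(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def fresh_token(page_html: str) -> str:
    parser = HiddenInputs()
    parser.feed(page_html)
    return parser.fields[TOKEN_FIELD]


def session_id(set_cookie_header: str) -> str:
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return jar["JSESSIONID"].value


def build_form(page_html: str, answer: str,
               answer_field: str = "captchaText",       # hypothetical field name
               hash_field: str = "captchaHash") -> str:  # hypothetical field name
    """URL-encoded body for the next POST, reusing one self-made answer/hash pair."""
    return urlencode({
        TOKEN_FIELD: fresh_token(page_html),
        answer_field: answer,
        hash_field: hashlib.sha1(answer.encode("utf-8")).hexdigest(),
    })
```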

Data.gov and the Developer

The response to our Save the Data campaign has been phenomenal. Although the Electronic Government Fund has been cut from $34 million to $8 million in the compromise budget, we can take solace in the fact that members of Congress are indeed listening to us.

A particular question has been popping up again and again: Is Data.gov worth saving? Sunlight's answer, not surprisingly, is a resounding yes. The impact of Data.gov is broad. For those of us who write software, Data.gov acts as a strong foundation that we can build upon. In fact, many of us have done just that.

FCC Finds Web Modernity

Image of new FCC beta site

In August of 2009, we at Sunlight took on the challenging task of rethinking FCC.gov as part of our Redesigning the Government series. We made mock-ups and gave suggestions, and since then we've been fortunate enough to have a number of back-and-forth conversations with the people at the FCC about the problems with their current site, the challenges, and possible solutions. In the meantime, the FCC team put in charge of rebuilding the site has made great leaps. They started by launching reboot, which served as a good brochure site while the team tackled the more difficult content. This past November they pushed out some decent wireframes, and, excitingly, on Tuesday they released beta.fcc.gov. Throughout the whole process they've been incredibly transparent and have consistently asked the public for feedback, which should be applauded.

Influence Data APIs

Followers of this blog are probably already aware of two of the main sites developed by our Data Commons team: TransparencyData.com and InfluenceExplorer.com. Both sites present a variety of influence-related data sets, such as campaign finance, federal lobbying, earmarks and federal spending. Influence Explorer provides easy-to-use overview information about politicians, companies, industries and prominent individuals, while Transparency Data allows users to search and download detailed records from various influence data sets.

In this blog post I want to show how easy it can be to use the public APIs for both sites to integrate influence data into your own projects. I'll walk through a couple of examples and show how to use both the RESTful API and the new Python wrapper.
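For a flavor of the RESTful side, here is a minimal sketch using only the Python standard library. The base URL, the `.json` endpoint naming, the `apikey` parameter and the `amount` field are assumptions for illustration, not a documented contract, and the sketch does not reproduce the Python wrapper's interface. Since Sunlight is no longer active, don't expect these endpoints to answer today.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL and endpoint layout; verify against the API documentation.
BASE_URL = "http://transparencydata.com/api/1.0/"


def build_url(endpoint: str, api_key: str, **filters: str) -> str:
    """Compose <BASE_URL><endpoint>.json?apikey=...&<filters> (layout assumed)."""
    params = {"apikey": api_key, **filters}
    return f"{BASE_URL}{endpoint}.json?{urlencode(params)}"


def fetch(endpoint: str, api_key: str, **filters: str) -> List[Dict[str, Any]]:
    """GET the URL and decode the JSON body (needs network access)."""
    with urlopen(build_url(endpoint, api_key, **filters), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def total_amount(records: List[Dict[str, Any]], field: str = "amount") -> float:
    """Sum a numeric field over the returned records; the field name is assumed."""
    return sum(float(r.get(field) or 0) for r in records)
```

Under the assumed layout, `fetch("contributions", "YOUR_KEY", cycle="2010")` (with `cycle` as a hypothetical filter) would return a list of records, and `total_amount` reduces them to a single dollar figure.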

What’s Going On In The Labs

Tom has been working on finding some new team members, organizing an event about open corporate identifiers, writing grant proposals, and -- fingers crossed! -- arranging a grant from Sunlight to get a very cool project's very cool code open-sourced. More on that soon, he hopes.

Eric has been chugging along on building Sunlight Health, an upcoming iOS/Android app that aims to use open data about hospitals, pharmaceuticals, and more to help people make better local health care decisions.

Luigi conducted a webinar on HTML5 for the online News University. Naturally, an interactive HTML5 version of the presentation is available online. He also prepared an article on WebSockets and EventSource which should be published soon, and has continued work on Datajam.

Upon returning from SXSW, Jeremy released django-mediasync 2.1 which added support for Django 1.3. He has also been working with David on an analytics dashboard project funded by the Knight Foundation. As always, the month has been filled with a slew of improvements to various Sunlight properties including a relaunch of Read the Bill.

James and Michael continued their work on Open States by writing more scrapers for state legislative data. At PyCon we hosted another Open Government Hackathon, and the Open State Project saw quite a few new contributions. Additionally, James expanded our bulk data download offerings, and Michael changed the Open States geo lookups to use boundaryservice, a project from our friends at the Chicago Tribune. New states are being brought online at an increasing pace, and we expect to start turning on two or three new states per month.

Andrew has been working on a new scraper for pulling and parsing public comment data from Regulations.gov, as well as implementing the front-facing portions of some new functionality for Influence Explorer.

Ethan has been working on a clustering tool for detecting duplicate comments in federal rulemaking. The Reporting Group will use the tool to find corporate influence in public comments.
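The post doesn't say how Ethan's tool works. One common way to find near-duplicate text, offered here as a stand-in technique rather than the tool's actual method, is to break each comment into overlapping word triples (shingles), score each pair by Jaccard similarity, and merge pairs above a threshold into clusters with union-find. The three-word shingles and the 0.6 threshold are arbitrary illustrative choices.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Overlapping k-word sequences from a lowercased comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Size of the intersection over size of the union; two empty sets match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(comments: List[str], threshold: float = 0.6) -> List[List[int]]:
    """Group comment indices whose pairwise similarity meets the threshold."""
    sets = [shingles(c) for c in comments]
    parent = list(range(len(comments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Comparing every pair is quadratic in the number of comments; for large dockets, MinHash with locality-sensitive hashing is the usual way to avoid that.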

Alison has been working on two name-matching tasks, matching politicians with officers of non-profit organizations and White House visitors with lobbyists. She has also been working on streamlining our data update process for Transparency Data and Influence Explorer.
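The post doesn't describe the matching method either. A minimal sketch of one generic approach, assumed rather than taken from Sunlight's code: normalize both names (handle "Last, First" order, lowercase, drop punctuation and a small hypothetical list of titles and suffixes), then score them with `difflib.SequenceMatcher` and accept the best candidate above a threshold.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional

# Hypothetical, deliberately tiny list of titles/suffixes to ignore.
DROP = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii", "hon", "rep", "sen"}


def normalize(name: str) -> str:
    """'Smith, John A. Jr.' -> 'john a smith'."""
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(t for t in tokens if t not in DROP)


def similarity(a: str, b: str) -> float:
    """Character-level similarity (0.0 to 1.0) of the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name: str, candidates: Iterable[str], threshold: float = 0.85) -> Optional[str]:
    """Highest-scoring candidate, or None if nothing clears the threshold."""
    score, match = max(((similarity(name, c), c) for c in candidates), default=(0.0, None))
    return match if score >= threshold else None
```

A real matcher would also have to cope with nicknames, missing middle names and very common surnames, where fuzzy scores alone produce false positives.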

Last week Kaitlin released the housing sector on Subsidyscope and interviewed some candidates for the open position on the Subsidyscope team. She also continues to plug away on new features for the tax expenditure database on Subsidyscope and some internal tools for grants and contracts analysis.

Caitlin has completed the design work for Sunlight Health and is working with Eric to build it out. She continues to work with Kaitlin on the redesign of Subsidyscope.

For the month of March, Chris continued working on wireframes and comps for the House Staff Directory; designed promotional materials for the Advisory Committee on Transparency; created some graphics for an upcoming partnership; made a bunch of presentations for Ellen; and created assorted design elements for other projects, like an icon for the Foreign Lobbying Influence Tracker.

The internet is a scary place, and timball had to deal with its badness in March: first with an ISP switchover, and then with a small, contained "security" issue on a dev instance after a former consultant's keyring got hax0red. Otherwise, March has been about learning and cooking with Chef (learning the finer points of Ruby has been interesting). He also bought a comically large amount of styrofoam peanuts for an April Fools' prank.

Aaron added yet another data set to the Reporting Group's lobbying tracker. We've already got a database of foreign lobbying filings, but it's updated infrequently. This new feature, scheduled to launch this week, will allow users to see Foreign Agents Registration Act filings as soon as they're posted on the government's fara.gov. But instead of having to use the government's search interface, users will be able to see the filings as a stream as they come in. He also continues to work on other Reporting Group projects and on Capitol Words.

Ali has been working on promoting Transparency Camp, creating some design elements for a cool email tool that the Data Commons team has been working on, building small organizing campaign pages, starting to build out the new design for Sunlight Live and teaching CSS classes to the organization.

The Market For Government Data Heats Up

Those interested in the business potential of government data will definitely want to check out Washingtonian's story about Bloomberg Government. It's a good introduction to what really does seem to be the D.C. media landscape's newest 800 lb. gorilla (albeit a very quiet and well-behaved one so far).

Readers of this site will probably be most intrigued by these two paragraphs:

[...] BGov subscribers, of whom there are currently fewer than 2,000 individuals, get something potentially more valuable than news. BGov’s “killer app”—the feature that sets it so far apart from its competition that prospective customers will feel compelled to buy it—is a database that lets users track how much money US government agencies spend on contracts, something no other media organization in Washington offers. Users can break down the spending by agency, company, amount, or congressional district; they can track the money over time; and with a single mouse click, they can call up news associated with the companies and the type of work they do. They can also see which contractors are giving money to elected officials.

All that information is extraordinarily hard to gather, largely because the government doesn’t store it in one place. But when it’s collected, and explained by journalists, the data has the potential to give businesses an inside track on winning government deals. It shows where spending trends are heading and thus where the next business opportunity lies.

Data quality problems aside, this is true as far as it goes -- I've seen a demo of the BGov interface, and it really is quite impressive. But in fact the data isn't that spread out. Between Sunlight's APIs, bulk data from USASpending.gov, GIS data from Census and the admittedly hard-to-scrape Regulations.gov, any startup with enough time and technical talent could replicate the majority of the site's functionality (the business intelligence data provided by Bloomberg Financial is a tougher nut to crack). That's the great thing about public sector information: it's there for the taking. Anyone can use it.

I've written about this before, and generally argued that government data is a tough thing to create a business around because there's no way to prevent competitors from undercutting you. But there's money to be made in the undercutting. Mike Bloomberg thinks it's worthwhile to bet $100 million on reselling government data. He's made some pretty good business decisions in the past. A smart startup might want to take the hint.

(Of course, nobody will be building businesses on this data if it goes offline -- please don't forget to support our work to save the data.)

Cutting The e-Gov Fund Would Be A Disaster

Yesterday evening I posted a message to the Sunlight Labs mailing list that discussed the looming cuts to the e-government fund -- drastic cuts that could mean that sites like data.gov, USASpending.gov, apps.gov, paymentaccuracy.gov and the Federal IT Dashboard go offline altogether.

Before I go any further, let me catch the tl;dr crowd and send them here. These cuts would be a very, very bad thing. We need your help to stop them.

But it's probably worth talking about this in more depth. A few folks have responded to the news by asking: what's the big deal? Won't the data on data.gov still be available on agency sites? Won't the FAADS PLUS spending data on USASpending.gov still be obtainable through a FOIA? Won't we still be able to grab contracting data from fpds.gov?

Well, yes and no. Although agencies have been encouraged to rely heavily on data.gov for hosting, it does seem unlikely that defunding will result in data being outright deleted. Agencies will still collect information; departments will still track their spending; and I've been assured that the nuclear batteries that power Todd Park are good for at least another ten years.

Still, while nobody's going to be setting fire to filing cabinets, it would be a terrible mistake to simply shrug these cuts off. Yes, you might still be able to FOIA for a lot of this data. Is that what we want? It often takes months to have a FOIA request fulfilled. How are you going to update a project on an ongoing basis if it relies on government data and FOIA is your only tool? There's no system for distributing FAADS PLUS data other than USASpending.gov -- even that site's bulk downloads are only a few months old (before that, Sunlight was shipping hard drives back and forth to Maryland to get the data). There's no bulk download capability at all on fpds.gov. Moving back to FOIA would be hard enough for organizations like Sunlight. For many other citizens and watchdog groups, it would mean the data wouldn't be used at all.

And let's not forget the effect that these projects have had within government -- arguably, this has been even more important than the sites themselves. Are data.gov and usaspending.gov everything that we want them to be? If you follow Sunlight's blogging at all, you know that the answer is "not yet." There's still plenty of work to be done before these sites live up to their potential. But there's no question that it's been useful to tell agencies that they need to get their data in order and make it available to the public. There's no question that code written on the public dime ought to be shared within government and with the public. There's no question that citizens should be able to see how their tax dollars are being spent.

The projects made possible by the e-gov fund have helped to formalize these responsibilities. I'll be the first to admit that the work isn't yet complete: that's why public servants, organizations like Sunlight, and concerned citizens have been pushing for better data quality in USASpending and more data availability on data.gov. But the progress we've made is real. To have the clock turned back now would be tremendously disappointing -- and, given the money-saving and economic potential of some of these projects, an act of tremendous irresponsibility by Congress.

To the cloud!

It's time that Sunlight Labs got its act together and joined the 21st century. Today we are proud to announce that we are partnering with cloud hosting provider Angelfire to move all of our sites to the cloud.

Of course, cloud computing offers concrete, well-defined benefits like agility, focus and flexibility. And cloud solutions can achieve things that traditional servers never could, like allowing users to leave their desks. Developers love the cloud, too: by now most are familiar with how simple it is to deploy to the cloud; many also appreciate how cloud datastores create more demand for developers. Plus, cloud solutions offer enhanced security both through obscurity, and technical monoculture.

This is a bold, forward-looking move that we feel will help us accomplish our goals of transparency long into the future.

Visit our new homepage.