As stated in the note from the Sunlight Foundation's Board Chair, as of September 2020 the Sunlight Foundation is no longer active. This site is maintained as a static archive only.

What’s Going On In The Labs

Tom has been working on finding some new team members, organizing an event about open corporate identifiers, writing grant proposals, and -- fingers crossed! -- arranging a grant from Sunlight to get a very cool project's very cool code open-sourced. More on that soon, he hopes.

Eric has been chugging along on building Sunlight Health, an upcoming iOS/Android app that aims to use open data about hospitals, pharmaceuticals, and more to help people make better local health care decisions.

Luigi conducted a webinar on HTML5 for the online News University. Naturally, an interactive HTML5 version of the presentation is available online. He also prepared an article on WebSockets and EventSource which should be published soon, and has continued work on Datajam.

Upon returning from SXSW, Jeremy released django-mediasync 2.1, which added support for Django 1.3. He has also been working with David on an analytics dashboard project funded by the Knight Foundation. As always, the month has been filled with a slew of improvements to various Sunlight properties, including a relaunch of Read the Bill.

James and Michael continued their work on Open States by writing more scrapers for state legislative data. At PyCon we hosted another Open Government Hackathon, and the Open States Project saw quite a few new contributions. Additionally, James expanded our bulk data download offerings and Michael changed the Open States Geo lookups to use boundaryservice, a project from our friends at the Chicago Tribune. The speed at which new states are being brought online is increasing, and we expect we'll start turning on 2-3 new states per month.

Andrew has been working on a new scraper for pulling and parsing public comment data from Regulations.gov, as well as implementing the front-facing portions of some new functionality for Influence Explorer.

Ethan has been working on a clustering tool for detecting duplicate comments in federal rule making. The tool will be used by Reporting to find corporate influence in public comments.
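
To make the idea of duplicate-comment clustering concrete, here is a minimal sketch of one common approach: break each comment into overlapping word "shingles", compare the shingle sets with Jaccard similarity, and group anything above a threshold. This is not Ethan's tool, whose internals aren't described here; the function names, the 3-word shingle size, the 0.5 threshold, and the sample comments are all assumptions made for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle containing all their words.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(comments, threshold=0.5, k=3):
    """Group comment indices whose shingle sets are similar (hypothetical helper).

    Compares every pair, so it is only suitable for small batches.
    """
    parent = list(range(len(comments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(c, k) for c in comments]
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i in range(len(comments)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up sample comments, not real regulatory filings.
sample = [
    "I oppose this rule because it hurts small businesses in my state",
    "I oppose this rule because it hurts small businesses in my town",
    "Please extend the comment period by thirty days",
]
groups = cluster_near_duplicates(sample)
```

The all-pairs comparison is quadratic in the number of comments, so a real pipeline working on a large docket would likely need something faster, such as MinHash with locality-sensitive hashing.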

Alison has been working on two name-matching tasks, matching politicians with officers of non-profit organizations and White House visitors with lobbyists. She has also been working on streamlining our data update process for Transparency Data and Influence Explorer.
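
For readers curious what a name-matching task involves, here is a rough sketch under heavy assumptions. It reduces each name to a (first, last) key and joins two lists on that key. The helper names, the suffix list, and the sample names are invented for illustration; this is not the method Alison is using, and exact first/last matching will produce false positives for common names.

```python
import re

# Generational suffixes to ignore when comparing names (an assumed, partial list).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(raw):
    """Reduce a name to a (first, last) key, or None if nothing usable remains."""
    raw = raw.strip()
    if "," in raw:
        # Turn "Last, First Middle" into "First Middle Last".
        last, _, rest = raw.partition(",")
        raw = f"{rest} {last}"
    # Drop punctuation but keep hyphens so "Smith-Jones" stays one token.
    cleaned = re.sub(r"[^a-z\s-]", "", raw.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    if not tokens:
        return None
    return (tokens[0], tokens[-1])


def match_names(left, right):
    """Return (left_name, right_name) pairs that share a normalized key."""
    index = {}
    for name in right:
        key = normalize_name(name)
        if key is not None:
            index.setdefault(key, []).append(name)
    matches = []
    for name in left:
        key = normalize_name(name)
        for candidate in index.get(key, []):
            matches.append((name, candidate))
    return matches


# Made-up names for illustration only.
visitors = ["Smith, John Q.", "Jane Doe"]
lobbyists = ["John Smith Jr.", "Janet Doe"]
pairs = match_names(visitors, lobbyists)
```

In practice a candidate pair like this would only be a lead, to be confirmed against other fields such as employer or city before anyone treats the two records as the same person.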

Last week Kaitlin released the housing sector on Subsidyscope and interviewed some candidates for the open position on the Subsidyscope team. She also continues to plug away on new features for the tax expenditure database on Subsidyscope and some internal tools for grants and contracts analysis.

Caitlin has completed the design work for Sunlight Health and is working with Eric to build it out. She continues to work with Kaitlin on the redesign of Subsidyscope.

For the month of March, Chris continued working on wireframes and comps for the House Staff Directory; designed promotional materials for the Advisory Committee on Transparency; created some graphics for an upcoming partnership; made a bunch of presentations for Ellen; and created assorted design elements for other projects, like an icon for the Foreign Lobbying Influence Tracker.

The internet is a scary place and timball had to deal with that badness for the month of March. First via an ISP switchover and then with a small contained "security" issue on a dev instance when a former consultant's keyring got hax0red. Otherwise the month of March has been learning and cooking with Chef (learning the finer points of Ruby has been interesting). He also bought a comically large amount of styrofoam peanuts for an April Fools' prank.

Aaron added yet another data set to the Reporting Group's lobbying tracker. We've already got a database of foreign lobbying filings, but it's updated infrequently. This new feature, scheduled to launch this week, will allow users to see Foreign Agent Registration Act filings as soon as they're posted on the government's fara.gov. But instead of having to use the government's search interface, users will be able to see the filings as a stream as they come in. He also continues to work on other Reporting Group projects and on Capitol Words.

Ali has been working on promoting Transparency Camp, creating some design elements for a cool email tool that the Data Commons team has been working on, building small organizing campaign pages, starting to build out the new design for Sunlight Live and teaching CSS classes to the organization.

The Market For Government Data Heats Up

Those interested in the business potential of government data will definitely want to check out Washingtonian's story about Bloomberg Government. It's a good introduction to what really does seem to be the D.C. media landscape's newest 800 lb. gorilla (albeit a very quiet and well-behaved one so far).

Readers of this site will probably be most intrigued by these two paragraphs:

[...] BGov subscribers, of whom there are currently fewer than 2,000 individuals, get something potentially more valuable than news. BGov’s “killer app”—the feature that sets it so far apart from its competition that prospective customers will feel compelled to buy it—is a database that lets users track how much money US government agencies spend on contracts, something no other media organization in Washington offers. Users can break down the spending by agency, company, amount, or congressional district; they can track the money over time; and with a single mouse click, they can call up news associated with the companies and the type of work they do. They can also see which contractors are giving money to elected officials.

All that information is extraordinarily hard to gather, largely because the government doesn’t store it in one place. But when it’s collected, and explained by journalists, the data has the potential to give businesses an inside track on winning government deals. It shows where spending trends are heading and thus where the next business opportunity lies.

Data quality problems aside, this is true as far as it goes -- I've seen a demo of the BGov interface, and it really is quite impressive. But in fact the data isn't that spread out. Between Sunlight's APIs, bulk data from USASpending.gov, GIS data from Census and the admittedly hard-to-scrape Regulations.gov, any startup with enough time and technical talent could replicate the majority of the site's functionality (the business intelligence data provided by Bloomberg Financial is an admittedly tougher nut to crack). That's the great thing about public sector information: it's there for the taking. Anyone can use it.

I've written about this before, and generally argued that government data is a tough thing to create a business around because there's no way to prevent competitors from undercutting you. But there's money to be made in the undercutting. Mike Bloomberg thinks it's worthwhile to bet $100 million on reselling government data. He's made some pretty good business decisions in the past. A smart startup might want to take the hint.

(Of course, nobody will be building businesses on this data if it goes offline -- please don't forget to support our work to save the data)

Cutting The e-Gov Fund Would Be A Disaster

Yesterday evening I posted a message to the Sunlight Labs mailing list that discussed the looming cuts to the e-government fund -- drastic cuts that could mean that sites like data.gov, USASpending.gov, apps.gov, paymentaccuracy.gov and the Federal IT Dashboard go offline altogether.

Before I go any further, let me catch the tl;dr crowd and send them here. These cuts would be a very, very bad thing. We need your help to stop them.

But it's probably worth talking about this in more depth. A few folks have responded to the news by asking: what's the big deal? Won't the data on data.gov still be available on agency sites? Won't the FAADS PLUS spending data on USASpending.gov still be obtainable through a FOIA? Won't we still be able to grab contracting data from fpds.gov?

Well, yes and no. Although agencies have been encouraged to rely heavily on data.gov for hosting, it does seem unlikely that defunding will result in data being outright deleted. Agencies will still collect information; departments will still track their spending; and I've been assured that the nuclear batteries that power Todd Park are good for at least another ten years.

Still, while nobody's going to be setting fire to filing cabinets, it would be a terrible mistake to simply shrug these cuts off. Yes, you might still be able to FOIA for a lot of this data. Is that what we want? It often takes months to have a FOIA request fulfilled. How are you going to update a project on an ongoing basis if it relies on government data and FOIA is your only tool? There's no system for distributing FAADS PLUS data other than USASpending.gov -- even that site's bulk downloads are only a few months old (before that, Sunlight was shipping hard drives back and forth to Maryland to get the data). There's no bulk download capability at all on fpds.gov. Moving back to FOIA would be hard enough for organizations like Sunlight. For many other citizens and watchdog groups, it would mean the data wouldn't be used at all.

And let's not forget the effect that these projects have had within government -- arguably, this has been even more important than the sites themselves. Are data.gov and usaspending.gov everything that we want them to be? If you follow Sunlight's blogging at all, you know that the answer is "not yet." There's still plenty of work to be done before these sites live up to their potential. But there's no question that it's been useful to tell agencies that they need to get their data in order and make it available to the public. There's no question that code written on the public dime ought to be shared within government and with the public. There's no question that citizens should be able to see how their tax dollars are being spent.

The projects made possible by the e-gov fund have helped to formalize these responsibilities. I'll be the first to admit that the work isn't yet complete: that's why public servants, organizations like Sunlight, and concerned citizens have been pushing for better data quality in USASpending and more data availability on data.gov. But the progress we've made is real. To have the clock turned back now would be tremendously disappointing -- and, given the money-saving and economic potential of some of these projects, an act of tremendous irresponsibility by Congress.

Defining “High Value Data” Is Hard. So Let’s Not Do It.

Yesterday I had the pleasure of sitting on a Sunshine Week panel moderated by Patrice McDermott, along with CRP's Sheila Krumholz, ProPublica's Jennifer LaFleur and Todd Park of HHS. We touched on a lot of different topics, including one that by now is probably familiar to anyone who's followed the progress of the Open Government Directive: frustration with the vagueness of the term "high value datasets." Various organizations--Sunlight included--have criticized the administration for releasing "high value" datasets that seem to actually be of questionable usefulness.

Jennifer coined a formulation of what she considers to be a high value dataset, and it attracted some support on the panel:

Information on anything that's inspected, spent, enforced, or licensed. That's what I want, and that's what the public wants.

I don't think this is a bad formulation. But while I'm not anxious to tie myself into knots of relativism, we should keep in mind the degree to which "high value" is in the eye of the beholder. It's clear how Jennifer's criteria map to the needs of journalists like those at ProPublica. But if you consider the needs of someone working with weather data, or someone constructing a GIS application--two uses of government data that have spawned thriving industries, and generated a lot of wealth--it's obvious that the definition isn't complete. To use a more melodramatic example, if World War III broke out tomorrow, a KML inventory of fallout shelters could quickly go from being an anachronism to a vital asset.

The point isn't that Jennifer's definition is bad, but rather that any definition is going to be incomplete. The problem isn't that agencies did a bad job of interpreting "high value" (though to be clear, some did do a bad job); rather, it's that formulating their task in this way was bound to produce unsatisfactory results.

We're going about this backward. Ideally, we'd be able to start by talking about what the available datasets are, not by trying to figure out what we hope they'll turn out to be. Government should audit its data holdings, publish the list, then ask the public to identify what we want and need. This won't be easy, but it's far from impossible. And any other approach will inevitably leave the public wondering what we're not being told.

What’s Going On In The Labs

... or what was going on in the labs. I'm horribly late in posting this -- it turns out that I'm much, much worse at this than Josh was. Just another piece of evidence that we need more talented folks around here! Remember, we still have open positions.

Luigi has been working on Datajam, a data-driven platform for reporting live events on the Web. You can follow its development on Github. Datajam will soon power our Sunlight Live events.

Jeremy has been working on various Sunlight sites including the relaunch of the Advisory Committee on Transparency. February also saw the launch of Capitol Defense, a JavaScript/SVG/HTML game developed with Andrew and Chris. Other various interesting tasks included: launching Sunlight Jobs, teaching a half-day HTML class to Sunlight employees, releasing django-cloudmailin which we use for blog post drafting via email, and preparing for TransparencyCamp 2011.

Ethan attended the Computer Assisted Reporting Conference, worked on an algorithm for fast entity matching in text, and researched new content for the Influence Explorer homepage. He's now planning for new corporate accountability datasets and new lobbying-related features.

Eric released the Real Time Congress API, and version 3.0 of the Congress app for Android. He also continued his work on an upcoming mobile app to help people make better local health care decisions.

Kaitlin had a lovely vacation and then spent several days updating the USASpending data on Subsidyscope and is now squashing bugs in the soon-to-be-expanded tax expenditure database on the site. She also interviewed many a candidate for Subsidyscope and pitched in a little bit on the Clearspending testimony.

timball has been crying a lot over ISPs and is starting to familiarize himself with Chef, a new Ruby-based scaling solution. Also he says he gained 5 lbs from eating in NOLA. We thought you should know.

Chris has been fabulously wireframing new layouts for the House Staff Directory, designing magically delicious HTML emails and newsletters, creating spectacular presentations promoting Sunlight's awesomeness, and providing Sugar-free-Red-Bull-fueled graphics support for a variety of little projects along the way (e.g. Capitol Defense, one Influence Explorer postcard, Sunlight's meetup page, new Twitter background, etc).

James and Michael have continued the process of expanding the reach of the Open States Project and migrating content to the new site. The most recent update brings the project to 20 states and the District of Columbia. New functionality in the API is in the works, including the ability to query for bills by sponsor or issue area. We are also working on adding more ways for people to access the data without having to access the API directly.

Aaron added an additional lobbying dataset to the Reporting Group's lobbying tracker. Users can now see a list of post-employment notifications for former congressional staffers and members, including when they'll be eligible to lobby their old colleagues. He's also continued work on Capitol Words.

David is working on an analytics dashboard. He uploaded some sample data to Google's Public Data Explorer. He also worked on pulling out structured data from GAO reports -- making some progress but also hitting some obstacles.

Caitlin has been working with Eric and the reporting team on nailing down wireframes for the healthcare app and has been translating them into pretty sexy comps. She is also working with the other Kaitlin to redesign and streamline the Subsidyscope site. ...and stuff. She also helped launch the new Openstates site since the last Labs update.

Ali has been making a lot of ads lately to remarket the Sunlight Foundation and the reporting group and for new and upcoming Sunlight Live events. She has also been working on building out a new page for the organizing section of the Foundation and Sunlight Live.

Andrew has been working on new tools for adding influence-related context to text, focusing on a plugin for enhancing Gmail. He has also been experimenting with new scraping technologies.

Alison has been updating our Wikipedia scraper to pull in corporate logos to display on the organization pages in Influence Explorer. She has also been working on adding information to Influence Explorer detailing which bills organizations hired lobbyists to work on.

...and I (Tom) have been working on a bunch of proposals, organizing meetings around the corporate ID issue, writing some testimony related to Clearspending, and trying to find staff to fill the spots left by Josh and Kevin's departures. Also, daydreaming about what we're going to do with these enormous 7-segment LEDs.

LexPop

I hope that readers will spare a second to check out LexPop. It's a contribution to a problem that a lot of you are interested in: how to allow citizens into the legislative process to a greater degree. There's no question that the old machinery that we use for transmitting public opinion to lawmakers and rulemakers suffers from some serious pathologies. So I've been very glad to see efforts like POPVOX and Expert Labs emerge.

LexPop is working in that same vein. I met Matt Baca, one of the people behind the project, at an event last month, and was struck by the ambition of his experiment. LexPop isn't working at the federal scale, but the scope of what they're doing is large: they're trying to write a state law from start to finish. What makes the effort really fascinating is that they've got a legislator interested, ready to engage with the process. It's going to be interesting to see how this unfolds.

Clearspending Heads To Capitol Hill

I'm thrilled to say that tomorrow morning Sunlight's Executive Director, Ellen Miller, will be testifying before Congress about our Clearspending project. You can read more about it here, or just check out the posts we wrote about Clearspending back when it launched.

We think that the data quality problems identified by the project are important, and we're glad to see that government is taking them seriously. Without a clear understanding of how our government spends money, it's difficult to make smart decisions about how to adjust that spending.

Having Congress pay attention to our results is a tremendous vindication for the work that Kaitlin and Kevin have done on Clearspending. I think it's also a great example of why Sunlight is such a cool place to work. Where else can your diligent SQL-wrangling turn into a chance to give sworn testimony before Congress?

And speaking of working here: as I've mentioned before, we have a couple of open positions. As you might imagine, preparing testimony has gotten in the way of reviewing resumes. But we'll be diving back into that process very soon. If you've been thinking about it, stop hesitating!

Visualizing the Budget vs. Visualizing Spending

If you haven't yet checked out The Data Viz Challenge, you should. A data visualization contest? Sponsored by Google and Eyebeam? Focusing on the federal budget?! There is basically nothing in the world more up our alley. Yes. A thousand times yes.

And yet I do feel obliged to offer one small criticism: I think the contest would be a little more exciting if entries weren't limited to using data from What We Pay For. Mind you, this is not because of anything wrong with WWPF. That site has done a very nice job of parsing budget data and, through the Challenge's website, exposing it via an API.

The problem is that the budget is only part of the story. As Kaitlin has already explained, tax expenditures--more commonly known as tax breaks--are vital to understanding our nation's finances. When the government declines to collect tax revenue from some particular individual or industry, it's not very different from simply sending them a check. The beneficiary has more money and, all else being equal, the rest of us have to pay more taxes (or take on more debt) to make up for it.

Unfortunately, this is the point at which politics enters the equation. The two major parties tend to pursue their spending priorities in different ways, and this has created political incentives for pretending that tax expenditures don't affect the budget. But this is silly--it's like pretending that if you worked two jobs and neglected to deposit your paycheck from one of them it would have no effect on your finances.

You can find people from both sides of the aisle saying unsupportable things about tax expenditures, but the truth is that every serious scholar who works on this issue regards tax expenditures as a type of spending. That's why the literature uses the word "expenditure."

I don't know if WWPF declined to wade into this space because it's politically charged or because the data's historically been so tough to access, but I wish they had decided differently. If you don't include tax expenditures, you wind up ignoring huge government subsidies to business (accelerated depreciation), housing (mortgage interest tax deduction) and every kind of nonprofit, from museums to soup kitchens to the NCAA (tax exemption). This isn't to say that we shouldn't subsidize those entities and uses. Maybe we should! But we should at least talk about it. We need to make sure these expenditures are considered if we're going to get a clear understanding of our nation's finances. Make no mistake, we're talking about a lot of money -- have a look at the chart from Kaitlin's post and you'll see what I mean.

At any rate, I'm sure that we'll see some stunning visualizations come out of this contest, and I don't hesitate to encourage anyone reading this to participate. But now that Sunlight and the Pew Charitable Trusts have worked together to expose data on tax expenditures--and we'll be adding more such data soon--I hope the visualization community will be inspired to tell that part of the tale as well. Without it, any story about our government's spending is incomplete.

Crossposted from the Sunlight Foundation blog

Come Work Here!

One of the few downsides to working with incredibly talented people is that other folks are constantly trying to hire them away. Worse, sometimes they even succeed! This has just happened, in fact. But while we're very sorry to have lost Josh and Kevin to (some admittedly amazing) new opportunities, there is a bright side: the chance to bring some new brilliant technologists into the Labs fold.

So! Please direct your attention to our jobs page. There you'll find two listings for developers in our Washington, D.C. offices. We're looking for someone to lend a hand on the Subsidyscope project, and another dev to serve as a jack/jill-of-all-trades working on a variety of technical projects. But while one of the two positions will have a specific project responsibility, prospective candidates should understand that all labs members have the opportunity and obligation to work on a variety of different things.

Who should apply? More than anything, we're looking for people who are passionate about improving our government, excited about technology's capacity for doing so, and who are interested in digging into some genuinely tough problems. You can find some of the specific technologies we use in the job descriptions, but if you're a smart, creative technologist, don't let, say, a background with Couch instead of Mongo dissuade you from applying.

It really is a pretty great place to work. The compensation's competitive, the work environment's relaxed, and the opportunities for doing exciting, important work are tremendous.
