Tag Archive: Technology

The data behind Capitol Words

by ddrinkard

technology

Dec 21, 2011 10:06 am

Last Monday we launched an update to our Capitol Words project, which indexes and tokenizes the Congressional Record daily. With the launch behind us and the dust starting to settle, I'd like to walk through how we get from raw text to attributed, searchable quotations, and provide some examples of how you can interact with the data directly.

Before delving into how it works, though, it's important to acknowledge the myriad developers whose work on this project has made it possible. I'm only the most recent steward of the site; the bulk of the data legwork for this iteration was handled by Aaron Bycoffe and Jessy Kate Schingler, and the web interface owes its beauty to Caitlin Weber and Ali Felski. Timball provided the hardware, and the list continues from contributions to the scrapers all the way back to the original conception and implementation of the idea by Josh Ruihley and Garrett Schure. It's the combined efforts of everyone involved that brought us the site that's available today.

Now, without further ado...

House Approves Sweeping Open Data Standards

by Eric Mill

technology

Dec 19, 2011 1:24 pm

At a Friday hearing, the House of Representatives significantly raised the bar on open data by passing a resolution requiring that a wide variety of crucial House legislative information be published online, in open formats, and at permanent predictable URLs. Daniel Schuman covered this on the Sunlight Foundation blog on Friday.

The new standards create a new central website, run by the Clerk of the House, that will host all House bills, resolutions, amendments, and conference reports. These documents will be online on January 1, 2012, and will be in XML.

Beyond that, the standards require committees to post their amendments, votes, hearing notices, which bills and resolutions they're considering, and lots of other documents. The Clerk is charged with building tools for committees to post this information to the new website; in the meantime, committees must post them to their own website, in PDF. Committees are also encouraged to post this information in XML, and "should expect XML formats to become mandatory in the future".

This is hugely valuable information that, to date, has been extremely difficult to discover in a reliable way. To get House legislation, one either needs to scrape THOMAS.gov (a Sisyphean ordeal), or to rely on the good work of people who've already done it. Committee information is terribly fragmented, and in some cases (such as committee votes and amendments) there is no way to get it at all, short of hiring people to go sit in committee rooms and record what goes on (a practice that forms the basis for a number of business models here in DC). This is the beginning of bringing much needed order to chaos, and sunlight to the legislative process.

These standards demonstrate excellent leadership on the part of the House, and offer a modern vision for how a legislative body should view its responsibilities to the public. The Senate should hear the sound of a gauntlet being thrown. The Committee's action is in keeping with Speaker Boehner's and Majority Leader Cantor's April call for the House Clerk to release legislative data in machine readable formats. It is very gratifying to see this call taken so seriously.

Name Standardization: Problems and a Solution

by Alison Rowland

technology

Dec 15, 2011 3:29 pm

Name standardization, on its surface, would appear to be a primarily aesthetic problem (no pun intended). People's names can be listed "last, first" or "first last". Simple, right? Not exactly. When you're naming different things (people vs. organizations, for instance) and dealing with different ordering, capitalization styles, honorifics, suffixes, metadata or other additional info embedded in names (e.g. political party signifiers, company departments or locations), or just general cruft and typos, name standardization is a thorny problem. Add to that the fact that there are no universal identifiers for people or companies in many datasets, names rarely (if ever) come split into their constituent parts, and we are often expected to link data via little more than a name string, and you can see how relevant the issue is to the world of open government data.
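To make the problem concrete, here is a minimal Python sketch that handles a few of the cases listed above for people's names: "last, first" ordering, honorifics, suffixes, inconsistent capitalization and a trailing party signifier. The function name, the honorific and suffix lists, and the output format are all assumptions made for illustration, not a description of any existing library, and real data will break it quickly (organizations, "O'BRIEN", multi-word surnames and so on).

```python
import re

# Hypothetical, deliberately tiny lookup lists; real data needs many more.
HONORIFICS = {"mr", "mrs", "ms", "dr", "hon", "rep", "sen"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _key(token):
    """Lowercase a token and drop periods so 'Jr.' matches 'jr'."""
    return token.lower().replace(".", "")


def _fix_case(token):
    """Recase only tokens that are entirely upper or lower case."""
    if token.isupper() or token.islower():
        return token.capitalize()
    return token


def standardize_person_name(raw):
    """Return the name as 'First Last' with an optional trailing suffix."""
    # Drop a trailing parenthetical such as a party signifier: "(D-MA)".
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", raw).strip()
    parts = [part.split() for part in cleaned.split(",")]

    # Pull suffixes and honorifics out of each comma-separated part.
    suffix = None
    name_parts = []
    for part in parts:
        kept = []
        for token in part:
            key = _key(token)
            if key in SUFFIXES:
                suffix = key
            elif key not in HONORIFICS:
                kept.append(token)
        if kept:
            name_parts.append(kept)

    # Two remaining parts are assumed to mean "last, first": swap them.
    if len(name_parts) == 2:
        name_parts.reverse()

    name = " ".join(_fix_case(t) for part in name_parts for t in part)
    if suffix:
        # Roman numerals go upper case; "jr" and "sr" become "Jr." and "Sr."
        suffix_text = suffix.upper() if suffix.startswith("i") else suffix.capitalize() + "."
        name = f"{name} {suffix_text}"
    return name
```

With these assumptions, "SMITH, JOHN Q. JR (R)" and "Mr. John Q. Smith, Jr." both come out as "John Q. Smith Jr.", which is the kind of agreement you need before linking records on a name string alone.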

Sunlight in ACM’s XRDS

by lmontanez

technology

Dec 15, 2011 10:35 am

Those of you who were computer science majors in college may have belonged to your school’s student chapter of the ACM (Association for Computing Machinery). If you were a dues paying member, you likely received their quarterly magazine XRDS (called Crossroads when I was a student).

The latest issue of XRDS is themed around “CS in Service of Democracy”, and I’ve contributed an article about Sunlight Labs to the issue. If you’re able to get a copy, you’ll also find articles by friends of Sunlight like Josh Tauberer of GovTrack and POPVOX, and Harlan Yu and Stephen Schultze, who built RECAP.

My article is reprinted after the jump.

FederalRegister.gov Wins Innovation Award

by lmontanez

technology

Dec 13, 2011 2:56 pm

Remember the inspiring story of FederalRegister.gov 2.0, and its humble beginnings as Apps For America finalist GovPulse.us? Well, the team behind the site has won another commendation, this time from ACUS:

According to its website, the Administrative Conference of the United States is an independent federal agency dedicated to improving the administrative process through consensus-driven applied research, providing nonpartisan expert advice and recommendations for improvement of federal agency procedures. In a writeup about FederalRegister.gov, ACUS describes some lessons learned that other agencies should take to heart:

Make your data available in bulk so others can use it.
Work with volunteers in the community and encourage them to develop new applications with your data.
If the volunteers come up with something great, work with them and use those components on the government web site.
Make the source code for the government web site open source so other agencies and other non-governmental organizations can make customized versions.

We at Sunlight Labs could not agree more. Congratulations to the team at FederalRegister.gov!

Labs Update: December 2011

by Jeremy Carbaugh

technology

Dec 12, 2011 2:59 pm

It’s the most wonderful time of year… Montgomery County property tax payment time! It’s also the holidays, which are quite nice as well. Things are wrapping up here in the Labs before we head off for winter break. We have a lot going on right now and even more big plans for next year.

In tangentially related news, Scott Weiland released a holiday album. I can sense your blank stare from here… please don’t let it distract you from reading the rest of this post.

Influence Explorer

The Data Commons team has launched a redesign of Influence Explorer that greatly improves navigation on long, complex profile pages. As you scroll, the navigation bars stay with you so that you know which data set you are currently viewing and can jump between them quickly. The year selector also follows you so that you can easily switch to different year views.

Ryan and Lee have been working closely with Ethan to dig through the data stored in Influence Explorer. Interested in reading up on lobbyist bundling for the Super Committee? How about the political ties behind Zuccotti Park? Want to find out how lobbying can reduce your tax rate?

In addition to all this lovely work, the team has been acquiring more timely campaign contribution data from the FEC, exploring the federal regulatory process and upgrading the server infrastructure.

Open States Project

James and contributors have been knocking out the states, bringing us ever closer to 50 + DC. Kentucky, Oregon, Idaho, Arkansas and Nevada have all graduated from experimental status based on several months of stability. North Dakota and South Carolina were also recently added to the API.

James has been prepping the Boston Sunlight office, new home of the Open States Project. He just hired a new developer and has secured office space. I patiently await an invitation to the opening party.

Congress for Android

Eric released a major update to the Congress App for Android that includes a visual redesign and information on what’s coming up in the next couple of days on the floor of Congress. This is a really great release and Eric did a lot of great work on the new redesign. He’s got many plans for new features that will be included over the next year, so stay tuned!

The section in which I post Chris' update verbatim

Chris wishes that there was a more eloquent and loquacious manner in which she could describe her continued work in the mobile game app and the 180 Project. Alas, these projects defy description as the day-to-day minutia of design eventually amounts to: move this there, rinse repeat. However, Chris is pleased to report that the completion of the 180 Project is in her sights, barring any timeline disrupting events. She is coding, thus all is well.

Team Sysadmin

Tim has been involved in the long and arduous process of upgrading our office network. As it currently stands, the new fiber connection is a frustrating 15 feet from the office. Tim can see it from the ceiling tiles above our server room, but it is caught up in insurance, contractor and building management turmoil. To ease his mind, he’s been configuring our new Juniper Junos EX-series switches. It’ll be like a cute little ISP here in Sunlight’s office!

Team C-Level Executive

Tom is freshly back from the TAI Bridging Session and News Foo. Aside from that he’s been working on filling our open positions and some end-of-year planning stuff.

Tidbits

Expanding on the Sunlight Labs Olympics, we’ll be participating in the Sunlight Foundation Olympics early next week. Results will be posted shortly thereafter!
We now have 40 instances running on Amazon EC2. I’m sure we know what’s on each of those boxes, right?
Drew has been lending a hand to reporting to keep their projects running while we search for someone to fill the open position.
Dan and Capitol Words. Soon. Promise.
Eric and Andrew have begun a project on gathering the data to connect bills and laws to the regulatory process. This effort should yield lots of bulk data over the next month or two for the legal and legislative communities to use.
Kaitlin has updated the video endpoint in the Real Time Congress API to support some upcoming changes to our Roku apps.
Upwardly Mobile is coming together nicely. There will be lots of great things to show early in January.
Renaissance man Luigi Montanez authored How can software engineers help make government better? in the latest issue of the ACM’s XRDS (Crossroads) magazine.
The hottest Labs holiday gift this season is Well Dressed’s El Gordo burrito.

When working with raw meat for your holiday meals, remember: though Sunlight is said to be the best of disinfectants, bleach is better.

Sunlight at the International Open Data Hackathon

by Eric Mill

technology

Dec 8, 2011 4:38 pm

This past Saturday was the second annual International Open Data Hackathon, a globally coordinated day for people to gather and hack on open public data from the world's governments. As part of this, POPVOX hosted an Open Data event here in DC at the MLK Memorial Public Library.

Several Sunlighters showed up, and we had a pretty great time. Andrew and I came expecting to work alone on our project, an ambitious attempt to bridge the data gap between legislation and the regulations it generates, which we're tentatively titling Crosslaws. Instead, after we (and everyone else) described our project to the room at the start of the day, we had 6 people come to our table and ask how they could help - 5 of whom weren't developers at all.

Despite Andrew and I not having any obvious tasks to hand out, after we explained the finer points of the work, everyone figured out their own valuable research and development to do for the entire course of the day, from scholarly articles to actual parsing code. You can find some of our group's notes on the Crosslaws wiki, as well as an overview of what's left to be done (there's a lot!).

Drew and Daniel went to the hackathon to work on their statistical analysis of USASpending data, using Benford's Law. They were hoping to find a stats wizard to help rigorously test the findings, and while they weren't able to find one, their search was still fruitful. The project did attract interest from a handful of very thoughtful people, and they had a long discussion that helped refine the goals of the project. Drew was very thankful for that, as he came away from the hackathon better focused on a concrete goal. At the end of the day, they had the parser and downloader written, but weren't able to download enough data to test it thoroughly. You can find Drew's team's code on Github.
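For readers who haven't run into Benford's Law: it predicts that in many naturally occurring sets of numbers, the leading digit d shows up with probability log10(1 + 1/d), so about 30% of values start with a 1 and under 5% start with a 9. The Python sketch below shows one common way to compare a list of amounts against that expectation, using a chi-square statistic. It is an illustration only and not Drew and Daniel's code; the function names are made up for this example.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """First nonzero digit of an amount (number or text), or None."""
    for ch in str(amount):
        if ch in "123456789":
            return int(ch)
    return None


def chi_square_vs_benford(amounts):
    """Chi-square statistic of observed leading digits against Benford."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no amounts with a nonzero digit")
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for digit, share in benford_expected().items():
        expected = share * total
        statistic += (counts[digit] - expected) ** 2 / expected
    return statistic
```

With nine digit categories there are 8 degrees of freedom, so a statistic above roughly 15.51 would reject a fit to Benford's Law at the 5% level. Whether that test is the right one for spending data is exactly the kind of question a stats wizard should weigh in on.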

In general, it was a fantastic crop of people who showed up on a Saturday morning at the MLK Library, from awesome self-directed policy people, to talented folks from the DC and federal governments. My project got real momentum from it, and we'll be capitalizing on that momentum with more work over the next couple months. Given all that, the hackathon felt like a real success to me, and I'm looking forward to next year's.

WhipCast – Promotion Isn’t Transparency

by Eric Mill

technology

Nov 18, 2011 1:41 pm

On Tuesday, the House Majority Whip's office released a "WhipCast" app through the iOS, Android, and Blackberry app stores.

It contains updates from the House floor, and various documents and publications from the Whip's office. It's being billed by the House Republican leadership team as "a step towards fulfilling the House Republican's commitment to transparency and accessibility". Unfortunately, there's nothing transparent or accessible about the app. Most of the information available through the app is extremely partisan, and serves to push House leadership's talking points.

Labs Update: November 2011

by Jeremy Carbaugh

technology

Nov 14, 2011 10:15 am

With a regularity typically seen only in cron jobs, the monthly Labs Update is coming at cha! As you read this post and plan your Thanksgiving dinner, listen to Thanksgiving Theme by holiday season favorite, Vince Guaraldi.

Halloween Open House

If you didn't make it out to our Halloween Open House, then you missed out, my friend! Between the dry ice, Chocolate City Beer, taquitos, Halloween costumes, photo booth and demonstrations of a colloidal suspension, we actually got a few chances to talk about the work we do here!

The best part, though, was Eric's Kinect and Processing powered rendition of Vigo the Carpathian. You can find the source code on GitHub.

Goodbye and Hello

We recently bid fond farewell to our reporting team embed, Aaron, who has taken a job with the Huffington Post. His position is still open, if you are interested in applying.

Joining us in the Labs is Lee Drutman. He will be working as our data visualization fellow, exploring interesting uses of the data with which we work.

Six Degrees of Corporations

Ever had to identify a unique corporation in a large government data set? How about corporate subsidiaries? You know that it can be almost impossible, but our new project, Six Degrees, shows you just how actually impossible it can be. The project looks at DUNS numbers used in federal spending data and finds that inaccuracies in reporting, among other issues, make the identifiers virtually useless.

Kaitlin and Drew created Six Degrees from their work on Subsidyscope. My description isn't doing the project any justice so go check it out for yourself!

Capitol Words

Soooooo close! A few weeks ago we gave an internal preview of Capitol Words to the rest of the Sunlight staff. The feedback was so good that we felt it had to be incorporated before the official launch. Caitlin and Dan have been working hard on the new features and should be finished within the next week or two with a launch soon after that.

Upwardly Mobile

Formerly named Moving Up, Upwardly Mobile is the second in our series of Knight Foundation funded mobile applications. Having a tough time making ends meet? Upwardly Mobile helps you find other areas of the country where someone in your occupation can make a better living. We're pulling in a number of government data sets that will be fully explained as we near launch. Caitlin is just wrapping up Sunlight's first responsive design for the project and it is looking amazing. I'll be templating and making it all work over the next few weeks. Look for a blog post later this month about the progress!

Influence Explorer

With election season upon us, the Influence Explorer team has been working on tools for tracking candidates and contributions. We've often pointed out the problems with less-than-timely release of data, especially campaign finance data. The issue is painfully evident during elections when clean data can be many months behind. You go into the voting booth without knowing what has happened in the previous weeks or months of the campaigns. We're limited by the FEC release schedule, but there is a plan in place to get some information to you as soon as we can. Ethan should have more details for you soon.

In addition to these new features, Alison has added a new data set, Lobbyist Bundled Contributions, and has refreshed EPA and federal campaign finance data.

SunlightFoundation.com

We're embarking on a complete refresh of the Sunlight Foundation brand! Along with a new logo and updated visual design, we are rethinking the organization of our content. The new SunlightFoundation.com will be organized around the work we do and not our internal team structures. We'll be pulling in some of our department specific sites like labs and reporting back under the Sunlight Foundation umbrella.

Ali recently blogged about the design process and gave a sneak preview of our new logo. Meanwhile, I've been whining about having to set up WordPress and have been trying to find ways to weasel out of it. We'll be blogging more about the progress and hope to launch late 2011 or early 2012.

Sunlight Live and Datajam

October saw the first Sunlight Live produced on the new Datajam platform. The card system has been completely rewritten in both the front-end and the admin. While we've so far reused the existing 3rd party widgets, Dan and Luigi are continuing to implement our own versions that will make Sunlight Live even better than it currently is.

Team Executive Leadership

Tom has been spending a lot of time on grant reporting, proposal documents and making sure things are in motion for staffing back up in DC and Boston. He also briefly embarrassed himself by attempting to close some tickets on Capitol Words (this was a mostly terrible idea). And he spent some time out in SF for the Code for America Summit, which was both fun and inspiring.

Tidbits

Chris has been busy working on design for the 180° Project.
Daniel's been plugging away at an analytics dashboard that will be used to provide an at-a-glance overview of various types of traffic and impact analytics.
Minor updates were released for our native mobile apps, Congress for Android and Real Time Congress for iOS.
Tim now has graphs of everything our servers are doing. He has also been shutting down old and unused servers, much to our checkbook's delight.
Kaitlin attended Open Government Data Camp in Warsaw, Poland.

Official Sunlight Labs Thanksgiving side dish: Oyster stuffing.

On content management systems and an unreasonable need to DIY

by Jeremy Carbaugh

technology

Nov 2, 2011 10:34 am

TL;DR I want to write a new CMS and force people to use Markdown. Should I?

As Ali mentioned in a previous post, we are embarking on an effort to redesign and reorganize SunlightFoundation.com. Part of the reorg involves consolidating our "brands" back under the Sunlight Foundation umbrella so that our content reflects what we do and not our organizational structure. The plan is to merge our existing labs and reporting blogs into technology and reporting channels on our main blog.

With this, though, come some major technical challenges. One such challenge is having to merge content from a myriad of existing CMSs. We have one blog running on our own django-blogdor application, another running a forked version of django-blogdor (don't ask), and two more blogs running WordPress on the backend but using django-wordpress for the public facing Django-based site. Once the blogs are dumped into an Atom-based format (with custom elements for additional metadata) they can be reimported into the new content management system. What that system will be is a decision that causes me constant angst.
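As a rough idea of what such a dump could look like, here is a Python sketch that builds a single Atom entry with one extra element in its own namespace. The metadata namespace URI, the "source-blog" element and the keys of the post dictionary are all hypothetical; the actual export format isn't spelled out in this post.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace for our extra metadata; not a real schema.
META_NS = "http://example.com/ns/blog-meta"

ET.register_namespace("", ATOM_NS)
ET.register_namespace("meta", META_NS)


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_entry(post):
    """Build an Atom <entry> element from a dict describing one post."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = post["url"]
    ET.SubElement(entry, atom("title")).text = post["title"]
    ET.SubElement(entry, atom("updated")).text = post["updated"]
    author = ET.SubElement(entry, atom("author"))
    ET.SubElement(author, atom("name")).text = post["author"]
    for tag in post["tags"]:
        ET.SubElement(entry, atom("category"), {"term": tag})
    content = ET.SubElement(entry, atom("content"), {"type": "html"})
    content.text = post["body_html"]
    # Custom element recording which of the old blogs the post came from.
    ET.SubElement(entry, f"{{{META_NS}}}source-blog").text = post["source_blog"]
    return entry


example_post = {
    "url": "http://example.com/blog/hello",
    "title": "Hello",
    "updated": "2011-11-02T10:34:00Z",
    "author": "Example Author",
    "tags": ["technology"],
    "body_html": "<p>Hello, world.</p>",
    "source_blog": "labs",
}
entry_xml = ET.tostring(build_entry(example_post), encoding="unicode")
```

Because the extra metadata lives in its own namespace, a standard Atom parser can ignore it while our importer picks it up.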

We could just use WordPress as we do now, where a private instance is used for authoring and Django pulls from the WP database for presentation on the public site. Some of our requirements would force us to develop WordPress plugins to keep track of additional metadata and to build an internal version of django-wordpress that is aware of the plugin tables. To make things even more complicated, there isn't a good way to map Django models to WP tag/taxonomy tables since Django models cannot handle compound keys. This results in quite a bit of database overhead when tags on a large number of posts are accessed. Another option would be to take this a step further and create a generic RESTful API around the WP database that our public site can use. This approach, which would allow us to swap WordPress out for other blog engines at will, is being used by Talking Points Memo.
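To illustrate the compound key problem, here is a Python sketch using sqlite3 and a stripped-down copy of the three WordPress taxonomy tables. The real tables have more columns and the production database is not SQLite; the point is only that wp_term_relationships is keyed on (object_id, term_taxonomy_id), and that one way around per-post lookups is to bypass the ORM and fetch tags for a whole batch of posts in a single join. This is not how django-wordpress does it, and the function name is made up for the example.

```python
import sqlite3
from collections import defaultdict

# Simplified WordPress taxonomy schema; note the compound primary key.
SCHEMA = """
CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE wp_term_taxonomy (
    term_taxonomy_id INTEGER PRIMARY KEY,
    term_id INTEGER,
    taxonomy TEXT
);
CREATE TABLE wp_term_relationships (
    object_id INTEGER,
    term_taxonomy_id INTEGER,
    PRIMARY KEY (object_id, term_taxonomy_id)
);
"""


def tags_for_posts(conn, post_ids):
    """Return {post_id: [tag names]} for many posts with one query."""
    post_ids = list(post_ids)
    if not post_ids:
        return {}
    placeholders = ",".join("?" for _ in post_ids)
    query = f"""
        SELECT r.object_id, t.name
        FROM wp_term_relationships AS r
        JOIN wp_term_taxonomy AS tt
          ON tt.term_taxonomy_id = r.term_taxonomy_id
        JOIN wp_terms AS t ON t.term_id = tt.term_id
        WHERE tt.taxonomy = 'post_tag'
          AND r.object_id IN ({placeholders})
        ORDER BY r.object_id, t.name
    """
    tags = defaultdict(list)
    for object_id, name in conn.execute(query, post_ids):
        tags[object_id].append(name)
    return dict(tags)
```

A list page showing 50 posts would then cost one tag query instead of 50, at the price of keeping some hand-written SQL alongside the Django models.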

But if we have to do all of this development just to use WordPress as a backend content store, why use it at all? Why shouldn't we just reinvent the wheel and write our own CMS? The actual management of content isn't hard; django-blogdor can do this just fine and would require only minimal improvement. The hard part, which WordPress does well, is the authoring interface.

Ah, the authoring interface. That brings us to the biggest source of contention. Whenever I discuss this project with the rest of the organization, I'm always told that a WYSIWYG editor is crucial. While I understand the need to insert media and add headings, I'm less sympathetic to other forms of visual styling such as the changing of font colors. I find myself increasingly convinced that content creators should be using Markdown to author their posts in a tool that has WYSIWYG-like helpers for inserting chunks of markup or HTML. Not only does this produce cleaner HTML that is more fault tolerant to future changes, but it creates a clear separation between the creation of content and its visual display. Plus Markdown is just easy; you can pick it up in a few minutes. More complicated visual needs can and should be handled by our design team or built into the CMS. I want to free our authors from having to worry about presentation and let them focus on what they do best, writing good content. How considerate of me!

So I come to you, dear reader, for advice on what we should do.

Am I being unreasonable here with my urge to write yet-another-CMS? Feel free to tell me to shut up, suck it up, and make it work with WordPress.
Is it snobbish and elitist to expect content creators to use either Markdown or learn HTML rather than use a WYSIWYG editor?
Any better suggestions?
