Tag Archive: github

Sunlight APIs: One Billion Served!

by Tom Lee

technology

Oct 7, 2013 12:46 pm

Last week was dominated by news of the first government shutdown in seventeen years. But at Sunlight it also marked a different, more cheerful milestone. Last week, Sunlight's APIs served their billionth request!

A Modern Approach to Open Data

by Eric Mill

Aug 20, 2013 3:33 pm

Last year, a group of us who work daily with open government data -- Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself -- decided to stop each building the same basic tools over and over, and start building a foundation we could share. We set up a small home at github.com/unitedstates, and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government -- THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO's FDSys, and others -- that everyone needs to report, analyze, or build nearly anything to do with Congress.

Once we centralized this work and started maintaining it publicly, we began getting contributions nearly immediately. People educated us on identifiers, fixed typos, and gathered new data. Chris Wilson built an impressive interactive visualization of the Senate's budget amendments by extending our collector to find and link the text of amendments.

This is an unusual, and occasionally chaotic, model for an open data project. github.com/unitedstates is a neutral space; GitHub's permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other's work, no one is dependent or "downstream" from anyone else. It's a shared commons in the public domain.

There are a few principles that have helped make the unitedstates project something that's worth our time, which we've listed below.

OpenGov Voices: PySEC, bringing corporate financial data to the masses

by Luke Rosiak Aug 2, 2013 2:58 pm

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Luke Rosiak is a former Sunlight Foundation reporter and database analyst who now writes for the Washington Examiner. This post addresses the tooling around Extensible Business Reporting Language and provides recommendations on what needs to be done. You can reach Luke on Twitter at @lukerosiak.

In the early 1990s, long before most federal agencies had embraced the digital era, the Securities and Exchange Commission (SEC) undertook a truly “big data” initiative that showcased some of the best that open data had to offer: Its quarterly reports were uploaded in real time, in text, rather than PDF, format, to a public FTP server called EDGAR. (File Transfer Protocol or FTP is a standard network protocol used to copy a file from one host to another over a network.)

As with the Federal Election Commission, the companies submitted their own reports, but they immediately entered the public record, and it was the government that required the submissions, dictated the forms and made them available.

EDGAR, which was implemented in part by Sunlight Foundation supporter Carl Malamud, revolutionized a massive industry of financial watchers who used the reports to decide what companies to invest in--and which to dump. Firms like Bloomberg and Reuters processed the text files into structured data, and analysts pored over them.

And before long, the SEC was pushing the ball even further, with talk of XBRL or Extensible Business Reporting Language. After all, financial information was almost entirely numbers-based, lending itself to computer analysis, and was fairly structured, with accountants all using a core of the same carefully-defined terms--though Wall Street accounting is too complex to fit in simple columns and rows, necessitating nested structures and the ability to dynamically define new terms.

The X in XBRL was a double-edged sword there. XBRL’s power, advocates said, was derived from its flexibility. It was a unified language that could express financial ideas whether the company was trading goats in Ethiopia or derivatives in Manhattan. To provide that kind of flexibility, it allowed accountants to define their own terms in financial documents, extending from a base of agreed-upon terms, in America called the US-GAAP.

But the US-GAAP itself had thousands of terms, and accountants who were accustomed to filing paper reports never bothered to learn its structure. Lazy filers created their own custom terms even when, buried somewhere in the GAAP, there was already a universal term that meant the same thing. That defeated the purpose of structured reporting, because it made comparing across companies impossible.
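To make the problem concrete, here is a minimal sketch of how one might separate standard terms from filer-defined ones in an XBRL document. The sample filing, the `acme` extension namespace and the `split_terms` helper are all hypothetical, and the us-gaap namespace URI is assumed for illustration; real filings carry far more structure than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XBRL instance. The "acme" namespace stands in
# for a filer-defined extension; the us-gaap URI is assumed for illustration.
SAMPLE_FILING = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31"
      xmlns:acme="http://example.com/acme/2013">
  <us-gaap:Revenues contextRef="FY2012" unitRef="USD" decimals="0">1000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2012" unitRef="USD" decimals="0">150</us-gaap:NetIncomeLoss>
  <acme:TotalSales contextRef="FY2012" unitRef="USD" decimals="0">1000</acme:TotalSales>
</xbrl>"""


def split_terms(xml_text, standard_marker="us-gaap"):
    """Return (standard, custom) lists of term names found in the filing."""
    root = ET.fromstring(xml_text)
    standard, custom = [], []
    for elem in root:
        # ElementTree spells namespaced tags as "{namespace-uri}LocalName".
        if not elem.tag.startswith("{"):
            continue
        namespace, _, local_name = elem.tag[1:].partition("}")
        # Skip XBRL plumbing (contexts, units, schema references).
        if namespace.startswith("http://www.xbrl.org/"):
            continue
        if standard_marker in namespace:
            standard.append(local_name)
        else:
            custom.append(local_name)
    return standard, custom


standard_terms, custom_terms = split_terms(SAMPLE_FILING)
print("standard:", standard_terms)
print("custom:", custom_terms)
```

In this made-up filing, `TotalSales` duplicates what `Revenues` already expresses, which is exactly the kind of custom term that makes cross-company comparison hard.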

The future of civic software reuse?

by Rebecca Williams

policy

Jun 10, 2013 12:00 pm

On Thursday June 6th at the Personal Democracy Forum (an annual conference exploring technology’s influence on politics and government), New York City’s Comptroller John Liu announced that the code behind Checkbook NYC 2.0, the city's transparency spending web portal, had been open-sourced and made available for forking on Checkbook NYC 2.0's GitHub page. This is significant because (1) Checkbook 2.0 is enormous: it makes over $70 billion in New York City spending available online in a timely, structured, and human-readable form, demonstrating that best practices in data disclosure can be followed even at scale; (2) it marks a shift to proactive civic application-sharing, by way of the municipality’s desire to share the resources they’ve developed with other local (and even state) governments and NYC’s partnership with common municipal software vendors in this endeavor; and (3) it raises questions about what’s next for government transparency tools, civic software partnerships, and reuse.

OpenGov Voices: Data.gov relaunches on open source platform CKAN

by Irina Bolychevsky May 24, 2013 11:56 am

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Irina Bolychevsky is the Product Owner of CKAN (@CKANproject), the leading open source data management platform, at the Open Knowledge Foundation (@OKFN). CKAN is a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. She led and managed the new release of data.gov from the CKAN team and previously managed the relaunch of data.gov.uk. Follow her on Twitter: @shevski.

A huge milestone was reached yesterday with the relaunch of the U.S. government data portal on a single, open source platform. A joint collaboration between a small UK team at the Open Knowledge Foundation and data.gov, this was an ambitious project to reduce the numerous previous catalogs and repositories into one central portal for serious re-use of government open data.

Catalog.data.gov brings together both geospatial and “raw” (tabular or text) data under a single roof in a consistent, standardised, beautiful interface that can be searched, faceted by format, publisher, community or keyword, and filtered by location.

Users can quickly and easily find relevant or related data (no longer a metadata XML file!), download it directly from the search results page or preview spatial map layers or CSV files in the browser.
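The same kind of search can also be scripted. Below is a minimal sketch that only builds a query URL for CKAN's `package_search` action; the base URL, the `res_format` filter field and the `build_search_url` helper are assumptions for illustration, so check the portal's own API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: CKAN's action API as it might be exposed by the catalog.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keyword, file_format=None, rows=5):
    """Build a package_search URL for a keyword, optionally filtered by format."""
    params = {"q": keyword, "rows": rows}
    if file_format:
        # "res_format" is assumed to be the resource-format filter field.
        params["fq"] = 'res_format:"{}"'.format(file_format)
    return BASE_URL + "?" + urlencode(params)


search_url = build_search_url("budget", file_format="CSV")
print(search_url)
```

Fetching that URL should return JSON describing matching datasets, which a script could then page through or filter further.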

Of course, there is still work to do, especially on improving the data quality, but nonetheless a vast amount of effort went into metadata cleanup, hiding records with no working links and adding a flexible distributed approval workflow to allow review of harvested datasets pre-publication.

OpenGov Voices: Hack Jersey hackathon — public data solving problems

by Tom Meagher Mar 29, 2013 10:12 am

Tom Meagher is the co-founder of Hack Jersey and the data editor at Digital First Media's Project Thunderdome in New York City. His team builds interactive news applications, supports computer-assisted reporting projects in local newsrooms and offers training. He served as the data editor for The Star-Ledger in Newark, and he lives with his family in suburban New Jersey. Reach him at @ultracasual or @hackjersey.

Wrapped by the hanging air quotes of New York City and Philadelphia, New Jersey's history of invention and investigative reporting tends to get overlooked. Even within the state, the two disciplines haven't acknowledged each other much. In recent years, there've been hackathons at local colleges or tech groups, but the Garden State's journalists never really mingled with programmers or dipped their toes into building news applications. Until now.

This winter, Hack Jersey held the state's first news hackathon and attracted dozens of journalists and developers to learn from and compete with one another. Sponsored by the NJ News Commons, Knight-Mozilla's OpenNews and many other organizations, the hackathon revolved around a simple (and maybe obvious) idea. By bringing coders and journalists together to use public data and solve problems, we could sow the seeds for an amazing new community here.

On Legislative Collaboration and Version Control

by John Wonderlich Sep 27, 2012 4:22 pm

We often are confronted with the idea of legislation being written and tracked online through new tools, whether it’s Clay...

Data for Better Bill Searching

by Eric Mill

technology

Apr 10, 2012 11:34 am

I've put up a dataset on GitHub that maps popular search terms to bills in Congress. It's a simple, 5-column CSV designed to help people create better search engines that take in user input to search for bills. The idea is that this will be useful to, and get contributions from, the community of people out there working with legislation and building tools around it.

It's humble - I started it out with a mere 7 rows, assigning the keywords "Obamacare", "SOPA", "PIPA", and "PPACA" to the appropriate bills. There are certainly more good candidates than that, so please contribute via pull request, or if you don't know how to do that, open an issue and talk about it with words.
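As a rough sketch of how a search tool might use a dataset like this: the column names (`term`, `bill_id`), the bill ID format and the sample rows here are hypothetical, not copied from the actual file, and only two of the five columns are modeled.

```python
import csv
import io

# Hypothetical sample rows; the real dataset has five columns whose names
# are not reproduced here.
SAMPLE_CSV = """term,bill_id
Obamacare,hr3590-111
PPACA,hr3590-111
SOPA,hr3261-112
PIPA,s968-112
"""


def load_term_index(csv_text):
    """Build a case-insensitive mapping from search term to a list of bill IDs."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["term"].strip().lower()
        index.setdefault(key, []).append(row["bill_id"].strip())
    return index


def bills_for_query(index, query):
    """Return the bill IDs mapped to a user's query, or an empty list."""
    return index.get(query.strip().lower(), [])


index = load_term_index(SAMPLE_CSV)
print(bills_for_query(index, "obamacare"))
```

A search engine could check user input against this index first and fall back to full-text search when no popular term matches.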

Tools for Transparency: 12 Resources You Might Have Missed

by Scott Stadum Oct 14, 2010 12:27 pm

Since I started the Tools for Transparency post back in July, I’ve written about quite a few social media resources...

Celebrating 100 GitHub Projects

by James Turk

technology

Oct 13, 2010 1:13 pm

Last week we hit a milestone that we're pretty proud of. The Sunlight Labs GitHub account now features more than 100 projects. We've been putting projects on the account for just under 2 years, which makes for a rate of about one new project a week.

We thought it'd be interesting to look at a breakdown of our projects on GitHub, to look at the work that has been done in the last two years.

Also, because we realize that there's no way we'd be where we are without help from the wide range of contributors who have submitted code, forked our projects, and submitted tickets, we've decided to offer a small prize in exchange for your help.