For the past few weeks, we’ve been hard at work wrapping up Sunlight Labs. As the senior technologist on staff, I’m focused on two priorities: getting our projects into a state where they can easily be used and adopted by others, and moving our core websites into solutions that don’t require a development team to maintain.
Our primary focus has been on our core, currently active projects: Open States, Email Congress, Political Ad Sleuth, Political Party Time and so on. Dozens of smaller tools make up these larger projects, and there are many, many other applications we’ve written over the years to solve various problems. In total, we have several hundred individual projects. About half of these projects lived in git repositories on our own internal GitLab server, and the rest were already on GitHub. Almost everything is currently running on AWS or Heroku.
To make these projects ready for re-use, we have to do several things:
- Move all of our projects to GitHub.
- Add licenses to the projects. Without a license, our code isn’t open source and can’t be reused. Following Sunlight’s ethos, we’ve chosen GPLv3 as our default license.
- Pull out any secure credentials, API keys, passwords and other private information that shouldn’t be shared with the world.
- Document our projects so that everyone can tell what they’re for. Lots of our projects have clever, but not necessarily helpful, names that do not give any indication of what they do.
- Export all of the publicly shareable data from the live running projects, and put this somewhere that people can use it.
I have created a collection of tools to automate as many of these steps as possible. I was able to move the many projects to GitHub without too much trouble thanks to the GitHub and GitLab APIs. From there, I created a quick script to take an inventory of what we did and didn’t have for documentation and licensing. Not all of the repositories are public yet, as we still need to finish the second and third steps on the list above: adding licenses and removing credentials.
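As a rough illustration of what such an inventory could look like, here is a minimal sketch that assumes every repository has already been cloned into one local folder. The function names and the filename prefixes it checks are hypothetical and are not taken from the actual Sunlight script.

```python
from pathlib import Path

# Hypothetical filename prefixes; the checks used by the real script are not described.
DOC_PREFIXES = ("readme",)
LICENSE_PREFIXES = ("license", "licence", "copying")


def has_file(repo: Path, prefixes: tuple) -> bool:
    """Return True if the repo's top level holds a file whose name starts with a prefix."""
    return any(
        entry.is_file() and entry.name.lower().startswith(prefixes)
        for entry in repo.iterdir()
    )


def inventory(root: Path) -> list:
    """Build one row per checkout under root, flagging documentation and licensing."""
    rows = []
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        rows.append({
            "name": repo.name,
            "readme": has_file(repo, DOC_PREFIXES),
            "license": has_file(repo, LICENSE_PREFIXES),
        })
    return rows
```

Calling `inventory(Path("checkouts"))`, where `checkouts` is a placeholder folder name, returns a list of dictionaries that can be printed or written to a spreadsheet to see which projects still need attention.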
Adding licenses was a bit more tricky, as some of our projects were primarily static assets and not suitable for GPLv3 — in which case we’d prefer Creative Commons (either CC0/Public Domain, or Attribution and Share Alike). For that, I needed an interactive tool for creating licenses wherever they were missing, based on user input.
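The interactive part might be sketched roughly as follows. This is an assumption-laden example rather than the tool we actually used: the menu, the function names and the idea of injecting the prompt function are all hypothetical, and the sketch only records a choice without writing any license text.

```python
from pathlib import Path
from typing import Callable, Dict

# Menu of the licenses mentioned above; GPLv3 is the default.
OPTIONS = {"1": "GPLv3", "2": "CC0", "3": "CC BY-SA"}


def has_license(repo: Path) -> bool:
    """Return True if the repo's top level already has a license-like file."""
    return any(
        p.is_file() and p.name.lower().startswith(("license", "licence", "copying"))
        for p in repo.iterdir()
    )


def choose_license(repo_name: str, ask: Callable[[str], str] = input) -> str:
    """Prompt until a valid option is given; an empty answer picks the default."""
    menu = ", ".join(f"{key}) {name}" for key, name in OPTIONS.items())
    while True:
        answer = ask(f"{repo_name}: which license? [{menu}] (Enter for 1) ").strip() or "1"
        if answer in OPTIONS:
            return OPTIONS[answer]


def pick_missing_licenses(root: Path, ask: Callable[[str], str] = input) -> Dict[str, str]:
    """Ask for a license only for checkouts under root that do not have one yet."""
    return {
        repo.name: choose_license(repo.name, ask)
        for repo in sorted(p for p in root.iterdir() if p.is_dir())
        if not has_license(repo)
    }
```

Passing the prompt function in as `ask` keeps the logic testable without a terminal; in normal use the default `input` asks the person running the script.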
Scrubbing out secure credentials was a similar problem, but complicated by the fact that we needed to remove these from the entire history of each git repo. As a first step, I looked for obvious configuration files and flagged the ones that contained secrets. Then there was quite a bit of manual double-checking, as API keys had frequently been left in random source files. After creating example configuration files and removing credentials, I used BFG Repo-Cleaner to cull these from the entire history of the projects.
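The first pass of flagging could look something like the sketch below. The patterns are illustrative assumptions, not the checks we actually ran, and they will produce both false positives and misses, which is why the manual double-checking mattered. The sketch only looks at the current working tree; rewriting history is a separate step.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions): a secret-looking name assigned a
# value of eight or more characters, and the general shape of an AWS access key ID.
PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|password|passwd|token)\w*\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def scan(root: Path) -> list:
    """Return (relative path, line number) pairs for lines that look like secrets."""
    hits = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and git's own internal files.
        if not path.is_file() or ".git" in relative.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in PATTERNS):
                hits.append((relative.as_posix(), lineno))
    return hits
```

Each hit is a candidate for a human to review, not a verdict: a line that reads a password from an environment variable may be flagged too, while a key pasted into an oddly named variable may slip through.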
With that mostly done, we’re now reaching out to our many Labs alumni to help us create documentation for the projects that need it. Since everything is now on GitHub, it’s as simple as filing a pull request! Please feel free to contribute if you have something you can add to this effort.
We’ll be holding off on exporting the data until after TransparencyCamp next week so that we won’t have to do this repeatedly as new data comes in. We hope to put everything we can on GitHub or in collections within the Internet Archive.
We also have a number of other content-driven sites, including the Sunlight Foundation main website, the TransparencyCamp site, Open Data Policies Decoded and others. Almost all of our sites and projects are custom in-house projects, built with Django, Flask or other tools. Currently, these all require a programmer to make changes to the sites and require a complex setup on AWS.
Wherever possible, we’re attempting to transfer these to platforms that staff can manage without technologists. In many cases we’ll be switching to WordPress or GitHub Pages for the hosting of our content-driven sites. There are certainly other great tools out there, but we’re deliberately choosing two of the most popular ones, so help will be easy to find.
We have a lot to accomplish in the weeks remaining before we wind down Labs. Having watched so many projects shut down recently, my goal here is to preserve as much of the legacy of Sunlight Labs as possible, so that future work can build on it. At The OpenGov Foundation, I brought an ethic of open by default to all of our work, and it’s in this same spirit that I’m approaching the closing of Labs: using solid, open source principles to make sure these tools are available to the community for years to come.
When you’re just starting out, no one can predict when a project or organization will end. Making sure that you have a plan for your work, with a logical beginning, middle and end, is critical. The tech world is increasingly unpredictable, so make the effort now to ensure your work is preserved for the future. Whether you’re a nonprofit, a for-profit or a government agency, open whatever you can, however you can!