Ladies Who Code, as the name suggests, is a gathering of ladies who code. The group, which already has chapters in Manchester, New York and London, recently opened its DC chapter, and the first Meetup will be hosted at Sunlight’s DC offices.
What: Ladies Who Code DC Meetup
Where: Sunlight Foundation, 1818 N St. NW Suite 300 Washington, DC, 20036
When: September 19, 6:30pm
Sign up: http://www.meetup.com/Ladies-Who-Code-Washington-DC/events/139227392/
Sunlight and Open Source
David Eaves has a thoughtful post over at TechPresident talking about open source and the transparency community's commitment to it -- a commitment that David sees as half-hearted. Sunlight's mentioned in the post, and the MySociety initiative that prompted the post is something that our team has been thinking about a lot. I think there's something to David's criticisms. But he's missing a few important things.
But let's get the baseline stuff out of the way first. Sunlight loves open source. Our whole stack is built on it, from the Varnish cache your browser connects to, to the Django/Rails/Flask/Sinatra/whatever app behind it, to the Postgres/Mongo/Redis/Solr/elasticsearch datastores that power it, to the OpenOffice suite that edits the grant application that paid for it all. All of our code is up on GitHub, and we welcome and celebrate contributions from the community.
But, Kindle contest aside, the above examples are mostly about us benefiting from open source. What have we done for the movement lately? This is the crux of David's critique:
So far, it appears that the spirit of re-use among the big players, like MySociety and the Sunlight Foundation, only goes so deep. Indeed often it seems they are limited to believing others should re-use their code. There are few examples where the bigger players dedicate resources to support other people's components. Again, it is fine if this is all about creating competing platforms and competing to get players in smaller jurisdictions who cannot finance creating whole websites on their own to adopt it. But if this is about reducing duplication then I'll expect to see some of the big players throw resources behind components they see built elsewhere. So far it isn't clear to me that we are truly moving to a world of "small pieces loosely joined" instead of a world of "our pieces, loosely joined."
I think David's missing a few important examples. For one thing, Sunlight's been adopting and investing in other organizations' code for a while now. PPF's OpenCongress has long been a Sunlight grantee, of course, and their code is entirely open source, including specific components like Formageddon that we commissioned. It's been more than a year since we began providing support for the Media Standards Trust to open-source and continue to develop SuperFastMatch; that's a partnership we think has tremendous potential to benefit both us and others, and you can expect to see some additional collaborations announced soon. Politwoops is a recent example of Sunlight adopting, extending and then launching a project started by another NGO -- the Open State Foundation, in this case (we're in the process of working with them to open-source the code).
But this is at the level of fairly specific partnerships with other transparency NGOs. The fact is that the more specific a project's use case, the harder it is to generalize its adoption. The more fundamental and abstract a tool is, the easier it is to adopt it and contribute back to it. It's no coincidence that we have people on our team who have patches in the Linux kernel but none who have patches in FixMyStreet. We see plenty of people use our Django apps and middlewares, but (so far) no successful redeployments of Influence Explorer. We've contributed a number of patches to the Boundary Service project that David mentions, but none to Ushahidi. Heck, back in my fixed-width font days, even I managed to get a minor patch into PySolr.
It simply gets harder to collaborate when you move to a less-abstract level of software. Requirements become more specific, and there cease to be good, general approaches to tackling problems. I saw this first-hand when I threw together the Elena's Inbox project. That effort generated a lot of excitement from other folks who had access to email archives, and we were glad to speak to all of them. I was eager to offer advice, answer questions and generally do some hand-holding, but I found myself wishing I had better news for the people who got in touch with me. Because unfortunately the reusable part of the site isn't all that valuable -- it's just some ugly templates and a basic Django app that provides endpoints for search and starring of emails (though we do have some much less ugly templates waiting for the next time we do a similar project). The real work and value-creation comes in the weekend following the government's Friday afternoon email document dump, when you need a programmer to lose sleep writing endless regular expressions that parse the idiosyncratic formatting of what's likely to be a badly-OCRed pile of text, then apply algorithmic approaches -- usually specific to the particular document set -- to stitch individual emails back together into threads. Come Monday morning, you'll be facing a huge, all-hands-on-deck manual review process as your staff tries to collapse duplicate entities down to single individuals (a process that can be aided by some string-similarity techniques, but which inevitably involves a lot of judgment calls and contextual knowledge).
Setting up an EI-style site is unfortunately never going to be a clean, easily repeatable process, at least not until government starts releasing MDBs or exposing IMAP endpoints (something we have yet to see, as far as I know). And this is fairly typical of work in our space: a lot of it needs to be purpose-built because of the quirks of government and the datasets it produces.
The good news is that although our movement is still quite young, we've already learned some lessons. I think MySociety's components strategy reflects this: they're moving down a layer of abstraction -- cautiously and after much consideration -- and tackling a slightly-more-specific task than a typical NoSQL or GIS project; a task that's still abstract enough to be reusable, but which is targeted enough to be particularly relevant to transparency organizations. It's something that we think is worth pursuing, and that we're eager to help make a success. It probably won't make sense to spend time replacing Sunlight's too-specific-to-be-reusable but perfectly-useful-for-us entity store with PopIt in the near term. But those organizations that come to this space after us should be able to benefit from the lessons learned by MySociety, Sunlight and others. It's the same reason why Open States has been refactored twice: it takes time and experience to figure out what parts of a problem can be abstracted and made reusable.
There's no question that we can do better. We're looking at which projects have the most potential for reuse, and -- where appropriate -- we're planning to clean up their docs, add easy Heroku deployment support, roll some AMIs, and support some up-and-coming general source data formats. We'll also be taking a hard look at how our APIs are organized: we can make our data more easily reusable, too.
But specificity is often the enemy of reusability, and we think some of the most interesting opportunities tend to involve very specific problems. It's a real tension, but one that we're committed to continuing to work to address.
UPDATE: MySociety's Tom Steinberg has also posted a response to David, in which he explains the rationale behind MySociety's components strategy in considerably more detail.
Blog Posts Via Email With CloudMailin.com
I recently learned (with horror) that a co-worker wrote her blog posts in Gmail, copied the rich text to WordPress, then copied and pasted the generated HTML into our Markdown-enabled blog backend. To be fair, our nerdy authoring tool is a bit much for non-technical users and doesn't really fit into most "normal" workflows. Additionally, she emails her posts to an internal list, so Gmail was a natural authoring tool.
We needed a compromise: posts would still be stored as Markdown, but she could keep writing them in Gmail. Our solution was to enable post-by-email on the blog. When she adds a special email address to the recipients, the message is parsed into Markdown, a draft post is created, and she receives an email reply a few seconds later with a link to edit the new post. From there she can review and publish it in a few clicks, a much improved workflow.
We wanted the draft posts created immediately and I didn't care to be polling a mail server every few seconds. Fortunately, we found a new service that made this project incredibly easy to implement.
CloudMailin.com
CloudMailin.com is a fantastic service that does the opposite of most other mail services. Rather than providing an API-based method of sending email like Postmark (another fantastic service), CloudMailin.com receives email at a provided address and POSTs the data to a URL of your choosing. In addition to the simple parsing of SMTP headers and MIME parts, the service can handle email attachments. Pay them a few bucks extra and they'll upload the attachments to one of your S3 buckets!
A competing service we evaluated started at a pricey $30 a month, a bit ridiculous if we are receiving 5 emails a week to start. CloudMailin.com's recently announced pricing is right on the mark, with a 200-message free plan and a 3000-message micro plan for $9 per month.
So how did we make it work? Let's look at some code...
django-cloudmailin
django-cloudmailin is a Django app we created to make working with CloudMailin.com as simple as possible. First we need a method that will receive the posted email message parameters and create a blog post.
In create_post we extract the parameters from the message to get the author, title, and content of the post. A post object is created and an email is sent back to the original sender of the email with a link to the Django admin for the new post. The author needs to check to make sure the post looks correct and hit publish. This is a greatly simplified example because we do some additional parsing of the content to transform the plain text into valid Markdown, but it should give you an idea of how it works.
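As a rough illustration of the shape such a function might take, here is a minimal sketch that avoids Django entirely. The `DraftPost` class, the `build_reply` helper, the admin URL, and the parameter keys (`from`, `subject`, `plain`) are all assumptions made for this example; they are not the actual blog code or a documented CloudMailin.com payload.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # stand-in for a database-assigned primary key


@dataclass
class DraftPost:
    """Hypothetical stand-in for the blog's Post model."""
    id: int
    author_email: str
    title: str
    content: str
    is_published: bool = False


def build_reply(post, admin_base="https://example.com/admin/blog/post"):
    """Build the reply that points the author at the new draft (URL is made up)."""
    return {
        "to": post.author_email,
        "subject": "Draft created: %s" % post.title,
        "body": "Review and publish your post: %s/%d/" % (admin_base, post.id),
    }


def create_post(params):
    """Turn posted email parameters into an unpublished draft plus a reply."""
    post = DraftPost(
        id=next(_next_id),
        author_email=params["from"].strip(),
        title=params.get("subject", "").strip() or "(untitled)",
        content=params.get("plain", "").strip(),
    )
    return post, build_reply(post)
```

In a real Django app the draft would be saved through the ORM and the reply sent with Django's mail utilities; the sketch returns both objects so the flow is easy to follow.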
Next we register that method with the mail handler.
MailHandler is a class-based view provided by django-cloudmailin that handles the registration and processing of mail messages. In this example we register our CloudMailin.com email address and secret key with the method that is to be invoked upon receipt of a new message. Multiple email addresses can be registered with the handler to allow for many different actions-by-mail in the same application. Finally the MailHandler instance is associated with a URL pattern in urls.py.
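The idea of routing several addresses to different callbacks can be sketched in a few lines of plain Python. The `AddressRegistry` class below is hypothetical and is not the MailHandler implementation; the `to` parameter key is likewise an assumption, and signature checking is deliberately left out.

```python
class AddressRegistry:
    """Hypothetical sketch of mapping recipient addresses to callbacks."""

    def __init__(self):
        self._routes = {}

    def register_address(self, address, secret, callback):
        # Addresses are compared case-insensitively in this sketch.
        # The secret is stored for signature checks, which are omitted here.
        self._routes[address.lower()] = (secret, callback)

    def dispatch(self, params):
        """Invoke the callback registered for the message's recipient."""
        recipient = params.get("to", "")
        route = self._routes.get(recipient.lower())
        if route is None:
            raise LookupError("no handler registered for %r" % recipient)
        _secret, callback = route
        return callback(params)
```

Keeping the mapping in one object is what lets a single URL endpoint serve many actions-by-mail: the view only has to look up the recipient and hand the parameters to whichever function was registered for it.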
All incoming messages are signed with your secret key to prevent any old person from spamming your mail endpoint. The MailHandler instance takes care of verifying the signature so you can concentrate on writing your application.
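To show what this kind of check involves, here is a generic shared-secret verification sketch. It uses HMAC-SHA256 over the sorted parameters purely as a stand-in; the scheme CloudMailin.com actually uses may well differ, and the `sign` and `is_authentic` names and the `signature` parameter key are assumptions for the example.

```python
import hashlib
import hmac


def sign(params, secret):
    """Compute an HMAC-SHA256 hex digest over the sorted parameters."""
    message = "&".join(
        "%s=%s" % (key, params[key]) for key in sorted(params) if key != "signature"
    )
    return hmac.new(
        secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_authentic(params, secret):
    """Return True only if the posted signature matches the one we compute."""
    expected = sign(params, secret).encode("utf-8")
    received = params.get("signature", "").encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received)
```

A request that fails the check would simply be rejected before any callback runs, which is why the application code never needs to see the secret.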
You can find the source for django-cloudmailin on GitHub.
django-mediasync 2.1 for Django 1.3
Earlier today we released django-mediasync 2.1 in anticipation of Django's upcoming 1.3 release. The Django 1.3 RC was released last night, so the final version should be coming any day now. Django 1.3 changes the way static files are handled, which breaks previous versions of mediasync. The old MEDIA_URL and MEDIA_ROOT settings are now meant to handle media uploaded by users while two new settings, STATIC_URL and STATIC_ROOT, handle static site content.
Mediasync will first try to use STATIC_ settings and fall back to MEDIA_ if not found. This ensures that mediasync will work regardless of the version of Django being used.
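The fallback can be pictured with a small sketch. The `resolve_static_settings` function is hypothetical and is not mediasync's actual code; it works on any settings-like object, so a `SimpleNamespace` stands in for Django's settings here.

```python
from types import SimpleNamespace


def resolve_static_settings(settings):
    """Return (url, root), preferring STATIC_* and falling back to MEDIA_*."""
    url = getattr(settings, "STATIC_URL", None) or getattr(settings, "MEDIA_URL", None)
    root = getattr(settings, "STATIC_ROOT", None) or getattr(settings, "MEDIA_ROOT", None)
    return url, root


# Simulated settings for a project on an older and a newer Django.
old_style = SimpleNamespace(MEDIA_URL="/media/", MEDIA_ROOT="/srv/media")
new_style = SimpleNamespace(
    STATIC_URL="/static/", STATIC_ROOT="/srv/static",
    MEDIA_URL="/media/", MEDIA_ROOT="/srv/media",
)

print(resolve_static_settings(old_style))
print(resolve_static_settings(new_style))
```

Treating an empty or missing STATIC_ value the same way means a project that has not yet adopted the new settings keeps working unchanged.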
Find the package on PyPI and the source on GitHub. And as always, if you use mediasync please indicate it on Django Packages.
django-mediasync 2.0: Havana Nights
It's been almost a year since the last release of mediasync, but the new features we've worked on are worth the wait! If you use mediasync, please indicate that you do so on our Django Packages profile.
Source on GitHub: https://github.com/sunlightlabs/django-mediasync
Package on PyPI: http://pypi.python.org/pypi/django-mediasync/
Install with pip or easy_install:
pip install django-mediasync
easy_install django-mediasync
What is this media syncing you speak of?
For those of you new to the project, mediasync is a Django app that manages static media in both development and production. Imagine a project where you have to make updates to existing media, but all references are hardcoded to some absolute path in production. Do you update the production media and risk breaking the site or do you temporarily point to local media and hope you don't forget to revert the change?
With mediasync you don't need to worry about any of that. Paths to media are automatically generated: local in debug, remote in production, and manually overridden when needed. Modify your media in your local development environment then use mediasync to push the change to the remote production server. Reduce stress and add years to your life!
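The "local in debug, remote in production, overridden when needed" rule can be sketched as a single function. Everything here, including the `media_url` name, its parameters, and the example prefixes, is an illustrative assumption and not mediasync's API.

```python
def media_url(path, debug, local_prefix="/media/",
              remote_prefix="https://assets.example.com/", override=None):
    """Build a media URL: local in debug, remote in production, or overridden."""
    if override is not None:
        prefix = override
    elif debug:
        prefix = local_prefix
    else:
        prefix = remote_prefix
    # Normalize slashes so prefixes and paths can be written either way.
    return prefix.rstrip("/") + "/" + path.lstrip("/")


print(media_url("css/site.css", debug=True))
print(media_url("css/site.css", debug=False))
```

Because templates only ever name the relative path, switching between environments never requires touching a hardcoded reference.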
On DjangoCon 2010 and conferences in general
Sunlight Labs is a huge fan of Django. We use it in a majority of the projects we produce here and have released the source of numerous applications. So a few weeks ago a bunch of us eagerly packed our bags and flew out to Portland, OR for DjangoCon 2010.
Recovery.gov Augmented Reality Mashup
As of today Android and iPhone users can see recovery.gov contract data on their phones via the Layar augmented reality application. Layar is an application that overlays your view of the real world with waypoints representing your favorite coffee place, the movie theatre you're trying to find, or in this case, where some of that $787 billion from the American Recovery and Reinvestment Act is going.
Django hits the big 1.1
Django 1.1 is out! We're big fans of Django.
X-UA-Compatible Django Middleware
Microsoft's Internet Explorer 8 may choose to display your site using an older, less compliant rendering engine. Take control and tell IE which engine to use with our Django middleware and decorator.
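For readers unfamiliar with the mechanism, the header can be added by any layer that sits between the application and the browser. The sketch below is a generic WSGI wrapper written for illustration, not the Django middleware or decorator from the post; the `x_ua_compatible` name is made up, and `IE=edge` is just one possible value, asking IE to use its newest available engine.

```python
def x_ua_compatible(app, value="IE=edge"):
    """Wrap a WSGI app so every response carries an X-UA-Compatible header."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Leave the header alone if the application already set one.
            if not any(name.lower() == "x-ua-compatible" for name, _ in headers):
                headers = list(headers) + [("X-UA-Compatible", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware
```

Respecting a header the application has already set keeps a per-view choice from being overwritten by the site-wide default.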
Simplifying web development with django-mediasync
One of the more frustrating aspects of programming for the web is managing the development and deployment of static assets. Everything is fine until your site goes live... then you have to deal with images, CSS, and JavaScript staying in sync and being called correctly from either the dev or production instance. We've developed django-mediasync to rid ourselves of the headaches.