The U.S. is trying to monitor the kinds of transactions that contributed to the 2008 financial crash and the subsequent recession, but the effort has shot itself in the foot, all for lack of a data standard. The Commodity Futures Trading Commission has been tasked with oversight of credit default swaps, but its attempts to define a standard for reporting in this previously unmonitored market have not worked out as planned.
Last year, we wrote about how to get access to our political influence data via the Influence Explorer API. That post is a great introduction, but here's an update on a small, but significant, improvement we've made to make accessing our data easier.
NameCleaver recently received a small version bump (to 0.3!) with a fairly big addition: safe mode. This was the most-requested feature amongst the other in-house users of NameCleaver, so I figured it might also merit a quick announcement.
PyGotham 2012 was a fairly typical tech conference. It was small and regional. This being New York, there were plenty of hipsters. There were a few more women than normal. The venue was quite unusual, but more about that later.
Name standardization, on its surface, would appear to be a primarily aesthetic problem (no pun intended). People's names can be listed "last, first" or "first last". Simple, right? Not exactly. When you're naming different things (people vs. organizations, for instance) and dealing with different ordering, capitalization styles, honorifics, suffixes, metadata or other additional info embedded in names (e.g. political party signifiers, company departments or locations), or just general cruft and typos, name standardization is a thorny problem. Add to that the fact that there are no universal identifiers for people or companies in many datasets, names rarely (if ever) come split into their constituent parts, and we are often expected to link data via little more than a name string, and you can see how relevant the issue is to the world of open government data.
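
To make the ordering problem a little more concrete, here is a minimal sketch in plain Python. It is not NameCleaver's implementation or API: the function name `standardize_person_name` and the tiny honorific and suffix lists are hypothetical, chosen only for illustration, and the sketch handles people only, ignoring organizations, embedded metadata and typos entirely.

```python
import re

# Hypothetical, deliberately tiny lookup lists; real data needs far more entries.
HONORIFICS = {"mr", "mrs", "ms", "dr", "hon", "sen", "rep"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _key(token):
    """Lowercase a token and drop surrounding periods and commas for lookups."""
    return token.lower().strip(".,")


def _fix_case(token):
    """Re-capitalize tokens that are all upper or all lower case; leave mixed case alone."""
    return token.capitalize() if token.isupper() or token.islower() else token


def _format_suffix(token):
    """Render generational suffixes consistently: 'Jr.'/'Sr.' or upper-case numerals."""
    key = _key(token)
    return key.capitalize() + "." if key in {"jr", "sr"} else key.upper()


def standardize_person_name(raw):
    """Return 'First Last [Suffix]' from either 'last, first' or 'first last' input."""
    # Collapse runs of whitespace before doing anything else.
    text = re.sub(r"\s+", " ", raw).strip()

    # "Last, First ..." becomes "First ... Last"; split on the first comma only.
    if "," in text:
        last, _, rest = text.partition(",")
        text = f"{rest.strip()} {last.strip()}"

    tokens = [t for t in text.split(" ") if t]
    # Honorifics are dropped; suffixes are pulled out and moved to the end.
    tokens = [t for t in tokens if _key(t) not in HONORIFICS]
    suffixes = [_format_suffix(t) for t in tokens if _key(t) in SUFFIXES]
    names = [_fix_case(t.strip(",")) for t in tokens if _key(t) not in SUFFIXES]

    return " ".join(names + suffixes)


if __name__ == "__main__":
    for raw in ["SMITH, JOHN", "Dr. Jane Doe", "doe, jane jr", "Smith, Jr., John"]:
        print(repr(raw), "->", repr(standardize_person_name(raw)))
```

Even this toy version has to make judgment calls (is "Hon" an honorific or a given name? should "MCDONALD" become "McDonald"?), which is a fair illustration of why the problem is thornier than it looks.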