Wikipedia is known throughout the world as a valuable source of information on almost any subject imaginable. Since its creation in 2001, Wikipedia has amassed over 30 million articles in 287 languages—over 4.4 million in English alone. This is made possible by the countless hours and efforts of volunteers, each contributing bits and pieces of his or her expertise. Unfortunately, despite the continued advance of technology, the act of writing paragraphs of prose has yet to be automated and still requires the efforts of humans.
But that does not mean that every part of Wikipedia is curated by hand. As we speak, automated software processes called “bots” are responsible for all manner of routine maintenance. These tasks include removing vandalism from articles, sorting pages in and out of categories, and checking newly created articles for copyright infringement. One of the earliest bots, Rambot, was launched in 2002 and produced almost 37,000 Wikipedia articles on places throughout the United States, using data gathered from the U.S. Census Bureau and other agencies. More details are available here.
The idea behind the open data movement is that the massive amounts of data collected and generated by our government should be available for the people to use—not just published in reports, but in computer-readable formats so that they can be used in research and analysis. There are many uses of open data—journalists rely on open data to break news and businesses use open data as a component of their business plans.
Since open data is a valuable source of public knowledge, why not use it to improve Wikipedia and keep it up to date? Granted, Wikipedia is not intended to be an indiscriminate dumping ground of data and it would be inconsistent with its editorial policies to use data to come to novel conclusions not already published somewhere else. However, there are still applications of data that would suit Wikipedia’s mission. Many articles have boxes alongside the introductory section, containing quick facts about the subject of the article. In wiki-parlance these are called “infoboxes,” and the new project Wikidata allows people to upload infobox data to one place and then use it on every language edition of Wikipedia.
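To make the infobox idea concrete, here is a minimal sketch in Python of how one structured record might be turned into infobox wikitext. The record itself is made up for illustration (the place name and population are placeholders, not real data), and `render_infobox` is a hypothetical helper, not part of any Wikipedia or Wikidata tool. The field names are likewise illustrative and should be checked against the documentation of whichever infobox template you target.

```python
def render_infobox(template_name, fields):
    """Render a dict of field names and values as MediaWiki template wikitext."""
    lines = ["{{" + template_name]
    for key, value in fields.items():
        if value is None:
            continue  # leave out fields the data source did not supply
        lines.append(f"| {key} = {value}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical record, standing in for one row of a government dataset.
record = {
    "name": "Exampleville",
    "population_total": 12345,
    "area_total_km2": None,  # missing in this made-up source
}

wikitext = render_infobox("Infobox settlement", record)
print(wikitext)
```

A real bot would also need to cite the data source and follow Wikipedia's bot approval process. This sketch only covers the formatting step.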
Wikimedia DC, as an affiliate of the organization that runs Wikipedia, is pleased to partner with the Sunlight Foundation to host the Open Government WikiHack, a hackathon dedicated to finding ways to use structured government data to improve Wikipedia. The event will be held all day on April 5–6 at the Sunlight Foundation’s offices in Washington, DC, and will feature a mix of coders and non-coders, Wikipedians and non-Wikipedians. We want you to bring your ideas on how government data can improve Wikipedia, whether they involve the Sunlight APIs or another public source of information, and we will give you the opportunity to make them happen.
What: Open Government WikiHack
Where: Sunlight Foundation, 1818 N St. NW, Suite 300
When: April 5-6, 2014
There will be a happy hour at the Sunlight Foundation on Friday, April 4, and the hack will begin the next day.
Feel free to share this with your networks using #wikihack on Twitter. If you have any questions, please email us at firstname.lastname@example.org.
James Hare is the president of Wikimedia DC, a nonprofit organization affiliated with Wikipedia and the Wikimedia Foundation. A Wikipedia editor since 2004, he is a free knowledge advocate based in Washington, DC. You can reach James at email@example.com.
Interested in writing a guest blog for Sunlight? Email us at firstname.lastname@example.org