Earlier this month, the Department of Defense launched a new open government data platform, data.mil. Above, you can see a visualization of some of the data on the site that tells one aspect of the history of the Vietnam War: the number of aerial bombardments conducted by military forces from 1965 to 1975. To learn more about the thinking behind the website and how this experiment in open data differs from previous efforts, Sunlight interviewed data.mil’s co-creators, Mary Lazzeri of the U.S. Digital Service and Maj. Aaron Capizzi, a program manager at the United States Air Force. Our discussion follows, including a postscript regarding what open data may tell us about a newly discovered blockbuster bomb in Germany.
How long has data.mil been online? Where did the idea come from and how has it evolved?
Mary: The site launched on December 15th. Major Aaron Capizzi, USAF, had the idea to use open data principles to solve Department of Defense (DoD) problems after attending a panel discussion at the Harvard Kennedy School sponsored by former Deputy CTO Nick Sinai. In addition, I had been looking to seed an open data effort at DoD. Aaron’s idea, coupled with the chance to present the Theater History of Operations (THOR) bombing data in a new and interesting way, provided a perfect opportunity to put energy behind the effort.
We’re looking to use this pilot to jumpstart a larger open data effort at DoD. The beta site is a working proof-of-concept. The next step is to show the larger DoD community that open data merits investment.
Aaron: Our approach is unique in two ways. First, Data.mil will test various ways of sharing defense-related information, gauging public interest and potential value, while protecting security and privacy. We will quickly iterate and improve the data offerings on data.mil, using public feedback and internal department discussions to best unlock the value of defense data. Our goal is to provide all data with enough context that users, both the public and defense employees, can understand the potential value and get started using data quickly.
Second, Data.mil will prioritize opening data using a demand-driven model, focusing on quality rather than standard quantity metrics. The Department of Defense regularly reports on the significant challenges we face in defending the nation, which range from attracting talented recruits to developing game-changing technology within constrained budgets. Most of these aspects of defense business generate large amounts of unclassified data which, if released, can encourage collaboration and innovation with public and private sector partners.
What tech is data.mil built on? How is it different from other sites?
Mary: The site is built using an open data storytelling platform, LiveStories. Rather than simply posting a list of datasets, the goal of Data.mil is to tell stories with data. The site provides narratives to complement the data so users can more quickly understand and begin using it. LiveStories was selected for its visualization and data analysis features, which allow us to present an engaging site to users. In addition, it’s easy to use. Non-technical staff can use the platform to share their data and tell their stories.
We want to compel collaboration from military components, industry partners and the public. The partnership with data.world enables that collaboration by providing the social media tools to support exploration and community discussion of the data.
How much data is on data.mil now? How much is new? How much should we expect to be there by the end of 2017?
Aaron: The site’s first offering, Theater History of Operations (THOR), is a painstakingly cultivated database of historic aerial bombings from World War I through Vietnam. THOR has already proven useful in finding unexploded ordnance in Southeast Asia and improving Air Force combat tactics. This is the first time that the THOR WWI, WWII, and Vietnam datasets have been released as flat tabular data files that can be easily analyzed and visualized with the accompanying data dictionary. Additionally, the site published the Korean War data for the first time ever on December 18th. We hope to feature data from the Gulf War in the near future.
The next featured dataset will be military casualty data, set to be released in February 2017. The working target is to release a compelling data story each month. The story may have one or multiple datasets.
What is the most important dataset on the site? What are the most important insights?
Aaron and Mary: The THOR data on the site can be used for a variety of purposes. The public can look up a relative’s call-sign and see what missions he flew. The data has been useful in uncovering unexploded ordnance in Southeast Asia. It’s also been used in air power history and strategy classes at Air Force professional education schools. Its value to historians is immense. The data provides bomb damage assessments in the pilots’ own words dating back as far as 1918.
Of course, we plan to expand the data offering in the coming weeks and months by targeting and releasing data that can help solve defense problems and increase the public’s understanding of their military.
Open government data about the military carries obvious security concerns. How did the DoD approach deciding what to disclose and how?
Aaron and Mary: All datasets are thoroughly vetted using the military’s well-established public release processes. While protecting national security and privacy remains the top priority of all defense employees, many aspects of the military’s daily business and operations are unclassified. By following best practices, information ranging from personnel diversity to contracting opportunities and open source software can be shared broadly without posing risk to national security objectives or operations security.
In this initial experimental stage, we are targeting public release of information with low sensitivity and risk for security implications. As the Defense Department gains more experience in providing open data sources to target specific problems, from logistics costs to scientific innovation, we will refine our understanding of where the balance between value and security lies.
What’s the most significant impact that could come out of this site? What could prevent it from happening?
Mary: Data.mil was launched for less than $10,000 and as a 20% project for Mary and Aaron in partnership with LiveStories and data.world. We want to collect feedback from the public and use that feedback to chart the site’s expansion. Data.mil is a working proof-of-concept and seeks to make the case that open data is a low-risk investment for DoD with immense potential value.
Eventually, we hope to make a significant impact on looming defense issues, some of which are summarized in the Department’s 2015 Performance Report. But to be successful, we need participation from both our public and private industry partners and internal defense data owners. To expand, we need partners throughout the military who are looking to share their data. Potential partners and anyone with feedback on the pilot should reach out to email@example.com.
Ian Greenleigh, a digital strategist currently working as the head of brand at data.world, emailed Sunlight to share a discovery in the data.
I came across this story the other day, about an unexploded Royal Air Force “blockbuster” bomb they just found at a worksite in Augsburg, Germany. I found a few things to cross-reference in other articles, then asked my colleague to query the dataset to find the mission responsible for the bomb.
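For readers who want to try a similar lookup against the flat THOR files, here is a minimal sketch in Python. It is not the query Greenleigh's colleague ran. The column names, the 4,000-pound threshold and the sample rows are all assumptions made for illustration; the rows are placeholders, not real THOR records, and the real field names should be taken from the data dictionary published with the dataset.

```python
import csv
import io

# Placeholder rows, NOT real THOR records. The column names are assumptions;
# consult the data dictionary that accompanies the dataset for the real ones.
SAMPLE_CSV = """mission_id,mission_date,air_force,target_city,bomb_weight_lbs
EXAMPLE-001,1944-01-01,RAF,AUGSBURG,4000
EXAMPLE-002,1944-01-01,RAF,AUGSBURG,500
EXAMPLE-003,1944-01-02,USAAF,AUGSBURG,4000
EXAMPLE-004,1944-01-03,RAF,MUNICH,4000
"""


def find_missions(csv_file, city, air_force, min_weight_lbs):
    """Return rows for the given city and air force with bombs of at least min_weight_lbs."""
    matches = []
    for row in csv.DictReader(csv_file):
        try:
            weight = float(row["bomb_weight_lbs"])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or non-numeric weight
        if (
            row["target_city"].strip().upper() == city.upper()
            and row["air_force"].strip().upper() == air_force.upper()
            and weight >= min_weight_lbs
        ):
            matches.append(row)
    return matches


# With a real download, replace io.StringIO(SAMPLE_CSV) with open("your_file.csv", newline="").
hits = find_missions(io.StringIO(SAMPLE_CSV), "Augsburg", "RAF", 4000)
for row in hits:
    print(row["mission_id"], row["mission_date"], row["bomb_weight_lbs"])
```

Filtering on city, air force and bomb weight narrows a large table to a handful of candidate missions, which can then be cross-referenced against dates and details reported in news coverage.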