A flowery title for a blog post, I’ll admit, but I hope that at least the Le Guin fans out there will forgive me. The problem of knowing something’s true name is in the news, most particularly in this story from Wired’s Spencer Ackerman:
Through a “joint venture,” the notorious private security firm Blackwater has won a piece of a five-year State Department contract worth up to $10 billion, Danger Room has learned.
Apparently, there is no misdeed so big that it can keep guns-for-hire from working for the government. And this is despite a campaign pledge from Secretary of State Hillary Clinton to ban the company from federal contracts.
Eight private security firms have won State’s giant Worldwide Protective Services contract, the big Foggy Bottom partnership to keep embassies and their inhabitants safe. Two of those firms are longtime State contract holders DynCorp and Triple Canopy. The others are newcomers to the big security contract: EOD Technology, SOC, Aegis Defense Services, Global Strategies Group, Torres International Services and International Development Solutions LLC.
Don’t see any of Blackwater’s myriad business names on there? That’s apparently by design. Blackwater and the State Department tried their best to obscure their renewed relationship. As Danger Room reported on Wednesday, Blackwater did not appear on the vendors’ list for Worldwide Protective Services. And the State Department confirms that the company, renamed Xe Services, didn’t actually submit its own independent bid. Instead, they used a blandly-named cut out, “International Development Solutions,” to retain a toehold into State’s lucrative security business. No one who looks at the official announcement of the contract award would have any idea that firm is connected to Blackwater.
This is a troubling story. But for those of us who work with government data, it’s an all-too-familiar one. Navigating the link between an entity’s name and its identity is very, very difficult. Sunlight Reporting Group wrote about a similar problem back in January: a blacklist of contractors called the Excluded Parties List System has been failing to do its job, partly because of difficulties in positively identifying the companies entered into it. People and even companies can have similar names, or the names entered into the system can contain typos. It’s not uncommon to wind up with a fuzzy sort of match, and then to have to use whatever additional data is on hand (an address, a date, whatever) to add confidence to the guess.
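To make the fuzzy-match-plus-corroboration idea concrete, here is a minimal sketch in Python. It is not how any particular government system works: the record fields (`name`, `zip`), the suffix list, and the 0.7/0.3 weights are all assumptions chosen for illustration, and the company names are made up.

```python
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"llc", "inc", "corp", "co", "ltd"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, drop common suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)

def name_similarity(a, b):
    """Similarity between two normalized names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_confidence(record, candidate, name_weight=0.7, zip_weight=0.3):
    """Blend the fuzzy name score with one corroborating field.

    The weights are arbitrary illustrative values, not tuned ones.
    """
    name_score = name_similarity(record["name"], candidate["name"])
    same_zip = (record.get("zip") is not None
                and record.get("zip") == candidate.get("zip"))
    return name_weight * name_score + zip_weight * (1.0 if same_zip else 0.0)

# Made-up records: a blacklist entry with a typo, and two candidates.
listed = {"name": "ACME Widgit Inc", "zip": "20001"}
nearby = {"name": "Acme Widgets, Inc.", "zip": "20001"}
elsewhere = {"name": "Acme Widgets, Inc.", "zip": "94103"}

print(match_confidence(listed, nearby) > match_confidence(listed, elsewhere))
```

The name score alone cannot tell the two candidates apart. The matching ZIP code is what raises confidence in one of them, which is exactly the kind of guesswork the extra data is being used for.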
But even a match may not be enough. As the Blackwater story makes clear, knowing the name of one part of a complicated corporate hierarchy often isn’t sufficient to reveal the structure of the organization. And it’s certainly not enough to provide an understanding of that larger organization’s interactions with the government.
After the Gulf oil spill, our Data Commons team started looking at the records in transparencydata.com and influenceexplorer.com associated with BP. Did you know that BP stations are independently owned? We didn’t until we did the necessary research. But there’s no good way to make those kinds of determinations systematically.
What we need are reliable, unique identifiers that are tied to corporations and to information about their related entities. There are sources of data on this stuff, but they all have problems. Dun & Bradstreet and other firms like them sell business intelligence services, employing armies of researchers to keep their databases up to date. Understandably, they consider the collected information to be proprietary, which makes it less than ideal from Sunlight’s perspective (and makes USASpending’s adoption of D&B’s DUNS identifier system a problem). The IRS collects this information from companies, but it’s tax information that’s considered private. Publicly traded companies disclose some of this information to the SEC, but that’s only part of the puzzle. The Central Contractor Registration database is a useful resource, but it’s also keyed by DUNS, it’s partially redacted for public use, and no public bulk data access is possible. It also doesn’t address the related-entity question, except through its ties to DUNS.
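As a rough illustration of what such identifiers would buy us, here is a small Python sketch. It assumes an open table that maps each entity identifier to its parent's identifier. The `ENT-…` identifiers, the `PARENT_OF` table, and the award amounts are entirely hypothetical; no existing public dataset is being described.

```python
from collections import defaultdict

# Hypothetical open registry: entity ID -> parent entity ID (None = top level).
PARENT_OF = {
    "ENT-001": None,       # a top-level holding company
    "ENT-002": "ENT-001",  # a subsidiary
    "ENT-003": "ENT-002",  # a blandly named venture owned by the subsidiary
    "ENT-900": None,       # an unrelated company
}

def ultimate_parent(entity_id, parent_of):
    """Walk up the ownership chain until an entity with no parent is reached."""
    seen = set()
    current = entity_id
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError("cycle in ownership data at " + current)
        seen.add(current)
        current = parent_of[current]
    return current

def totals_by_ultimate_parent(awards, parent_of):
    """Sum award amounts under each recipient's top-level owner."""
    totals = defaultdict(int)
    for entity_id, amount in awards:
        totals[ultimate_parent(entity_id, parent_of)] += amount
    return dict(totals)

# Made-up awards keyed by the recipient's identifier.
awards = [("ENT-003", 500), ("ENT-002", 200), ("ENT-900", 50)]
print(totals_by_ultimate_parent(awards, PARENT_OF))
```

With a table like this, an award to an unfamiliar name rolls up to its owner automatically. The hard part, of course, is that nobody publishes the table.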
This is a hard problem, but it’s not a new one. Certainly, it’ll sound familiar to anyone who’s tried to work with data at the intersection of government and industry. We’re interested in better solutions — so what am I missing in the above list?