The perils of personally identifiable pre-conviction data

(Photo credit: Joe in DC/Flickr)


Sunlight began examining criminal justice data almost a year and a half ago, as calls for nationwide officer-involved shooting statistics highlighted the fragmented nature of this data across the country. Sunlight unleashed a team of researchers, developers and policy analysts to scour the nooks and crannies of criminal justice data. After 18 months of investigating, researching, cross-referencing and tagging, we’ve collected over 9,000 databases of publicly available criminal justice data from all 50 states, the District of Columbia and the federal government. We call it Hall of Justice, and you can find it here.

The inventory was created to showcase the problems with criminal justice data and highlight the need for more uniform and accessible standards in their publication and collection. While developing it, we uncovered troves of personally identifiable datasets. These discoveries made us wonder why so little congruence appeared to exist in how microdata were made public. Furthermore, we questioned the reasons behind their release, and whether the ethics of privacy and the right to a second chance were ever considered beforehand.

Unfortunately, those questions don’t have simple answers. Over the next three days, we’ll reconstruct the landscape of what exists online and the background of how and why it’s made available to the public. Read our first entry below.

Unsurprisingly, standards for criminal justice data vary widely from state to state. California releases criminal justice microdata to academics conducting policy research, but scrubs personally identifiable information in online publications of data — even as the state requires police to make public a fairly comprehensive set of details about adult arrestees. Texas, worn down from processing FOIA requests, takes an alternative route and publishes inmates’ personally identifiable information online — but buries the link so deeply in its corrections site that few realize this database exists.  

Yet many questions remained unanswered, particularly around personal information and privacy. Where do we draw the line between citizens’ rights to freedom of information and individuals’ rights to privacy? Should governments differentiate between pre-conviction data — for example, mugshots or arrest records — and post-conviction data, such as records of inmates and their respective convictions? When personally identifiable information has been released in the name of the public good, how and when should governments protect against its exploitation for private gain?  


Let’s take a concrete example: mugshots. Virginia’s Freedom of Information Act requires local governments to release mugshots of adult arrestees. In Danville, Va., the local government complies with that requirement by posting two PDFs each week. The first lists the name, age, gender, race, address, date of birth and arrest location of each person arrested in the past week, as well as the charges filed. The second provides similar information, with accompanying mugshots.

This is problematic because of an increasingly common exploitative business model, one in which mugshots posted online by cities or counties are rehosted on private sites that charge users for their removal. Fortunately, many states, including Virginia, have passed legislation barring the practice of rehosting mugshots and soliciting payment for them to be taken down. Similar efforts have resulted in more complicated disagreements between lawmakers and the media, who, at least in South Carolina, believe that mugshots are a valuable part of reporting in the public interest. That dispute has bubbled up to the federal level, too. In 2012, the Reporters Committee for Freedom of the Press pressured then-Attorney General Eric Holder to release mugshots under the Freedom of Information Act.

Some in law enforcement have taken action unilaterally. A South Carolina jail decided to stop posting mugshots online in response to sites using extortion to profit from arrestee information, while a Salt Lake County sheriff lambasted the practice as dragging people “through the mud for rest of their lives.” The photographs cause lasting damage, and some people seemingly value them as a form of entertainment.

That statement raises the question of why mugshots are considered a part of the public domain in the first place. CityLab unearthed a 1999 court ruling against The New Orleans Times-Picayune, which sued the Department of Justice to get a mugshot of former 49ers owner Eddie DeBartolo, Jr., and lost. The court summed up its decision with an argument that sheds light on the lasting impact of these artifacts.

“A mug shot preserves, in its unique and visually powerful way, the subject individual’s brush with the law for posterity. It would be reasonable for a criminal defendant, even one who has already been convicted and sentenced, to object to the public disclosure of his or her mug shot.”

Furthermore, the court noted that mugshots “contain information that is intended for the use of a particular group or class of persons.” Principally, they serve as references for law enforcement to identify potential criminals in the case of repeat offenses, but even official uses of photographic data can have drawbacks when taken to an extreme.

The FBI’s Next Generation Identification face recognition database will rely primarily on booking photos, comparing them against existing data to produce computer-generated candidate lists of potential matches. The Electronic Frontier Foundation warns that the technology has a great potential for false positives given the massive size of the dataset, potentially triggering criminal investigations of certain individuals for no reason other than likeness. Misuses of data by law enforcement are not unheard of either: Dozens of reports of Florida officers conducting unauthorized searches on the state’s driving and vehicle information database have surfaced over the last several years, for example.

Arrest data

Daily arrest records are commonly posted online by local governments in a wide variety of formats. The Hartford (Conn.) Police Department, for instance, posts them via a data export from its records management system, where they include information on the suspect such as address, release status, charge and bond details, and notes on appearance. Similar examples exist at the county level: The corrections department of Orange County, Fla., uploads a daily jail booking report in PDF format that appears to be slightly more processed, removing more personal elements such as address and appearance. Finally, some agencies, like the Lafayette Parish Sheriff’s Office in Louisiana, publish these data in a rather unsophisticated paragraph-like style directly on their sites.

An alternative to PDFs and plain text comes in the form of more user-friendly interfaces that allow visitors to search for arrest records by name, ID number, date or offense. Some, like the Harris County (Texas) Sheriff’s Office, only display results if information specific to the individual is entered, while others, like the Mecklenburg County (N.C.) Sheriff’s site, require only a letter or a date to return data on alleged offenders. Finally, other records are provided in bulk. Fairfax County, Va., provides a delimiter-separated version of its weekly arrest database, while West Virginia publishes a statewide bulk display of arrests by county. Either way, in most cases arrest data are only archived for a certain period, and then become unavailable online. In Prince William County, Va., records stay on the site for 60 days before they are removed, a fairly common practice among sheriffs.

In a country with constitutional due process requirements establishing a presumption of innocence, problematic policy questions emerge from these pre-conviction datasets. In California, for example, half of arrests do not result in charges or convictions. Tomorrow, we will examine the post-conviction datasets we discovered and how different stakeholders weigh their merits.