DoD correspondence log converted from pix to spreadsheets
I’m posting, in an Excel spreadsheet, the congressional correspondence logs covering the first three months of 2007. We got them a while back, in a less-than-user-friendly format, from the Office of the Secretary of Defense. Here’s a sample of what we got in response to our FOIA request: a .tif (Tagged Image File Format) file. I picked one at random, but we have a CD-ROM with 189 files just like it.
Anu turned the files over to Scott Wells, our multi-talented office administrator, who used ocrad, an optical character recognition (OCR) program that runs on Linux, to convert them to a text file, which Anu posted here. Here’s a sample of what the converted .tif files looked like:
OSD CONTROL NUMBER: OSD 03257-07 DOCUMENTTYPE: INCOMING DOC: 2128/2007 DOR: 31212007
FROM: uss LEVIN, c TO: SECDEF
SUBJECT: REauEsT FOR NUMBER OF IRAal INDIVIDUALS WHO HAVE HELPED THE u.s. SUSTAIN AND MANAGE ITS PRESENCE IN IRAa
AGENCY: JCS TASK: PRS SUSPENSE: 3/1312007 ACD:
FILE NUMBER: IRAa
OSD CONTROL NUMBER: OSD 03288-07 DOCUMENT TYPE: INCOMING DOC: 212812007 DOR: 3/2/2007
FROM: uss VOINOVICH, G TO: SECDEF
___’_BJEIT_CLAIM AGAINSTTHE FEDERAL GOVERNMENT FOR COSTS INCURRED AS A RESULT OF A TERMINATION OF CONTRACT _
AGENCY: SA – TASK: RD SUSPENSE: 3113/2007 ACD: 3/13/2007 _
FILE NUMBER: 160
OSD CONTROL NUMBER: OSD 03443-07 DOCUMENT TYPE: INCOMING DOC: 212812007 DOR: 3/6/2007
FROM: uss CANTWELL, M TO: LA
SUBJECT: REauEsT YOUR SUPPORT IN EXPEDITING MY INVESTIGATION _
AGENCY:SA TASK:RD SUSPENSE: ACD:
FILE NUMBER: T-
Not perfect, but at least digitized and searchable. Still, it was somehow unsatisfying. I fooled around with the text file and was able to convert it to tab-delimited form (all those years coding agate at the Philadelphia Inquirer really came in handy).
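The conversion to tab-delimited form isn't spelled out here, but the idea can be sketched. Below is a minimal Python sketch, assuming each record is the five lines seen in the ocrad sample above (control number and dates, FROM/TO, subject, agency/task, file number). The column names, the regular expressions, and the `extra` column for lines that don't fit are hypothetical choices for illustration, not the layout DoD or the spreadsheet actually uses.

```python
import csv
import io
import re

# Hypothetical column names for the fields visible in the OCR output.
FIELDS = ["control_number", "doc_type", "doc", "dor", "from", "to",
          "subject", "agency", "task", "suspense", "acd", "file_number"]

# One pattern per line of a record, assuming the five-line layout in the
# sample. \s* tolerates spaces that OCR dropped (e.g. "DOCUMENTTYPE").
LINE_PATTERNS = [
    re.compile(r"OSD\s*CONTROL\s*NUMBER:\s*(.*?)\s*DOCUMENT\s*TYPE:\s*(.*?)"
               r"\s*DOC:\s*(.*?)\s*DOR:\s*(.*)"),
    re.compile(r"FROM:\s*(.*?)\s*TO:\s*(.*)"),
    # The SUBJECT label is often garbled: drop text up to the first colon,
    # if any, and keep the rest (a colon inside the subject would be cut too).
    re.compile(r"(?:[^:]*:)?\s*(.*)"),
    re.compile(r"AGENCY:\s*(.*?)\s*TASK:\s*(.*?)\s*SUSPENSE:\s*(.*?)\s*ACD:\s*(.*)"),
    re.compile(r"FILE\s*NUMBER:\s*(.*)"),
]


def split_records(text):
    """Group non-blank lines into records; each control-number line starts one."""
    records, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.search(r"CONTROL\s*NUMBER", line) and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


def record_to_row(lines):
    """Flatten one record; lines that don't match go to 'extra', not the trash."""
    row, extra = [], []
    for i, pattern in enumerate(LINE_PATTERNS):
        m = pattern.match(lines[i]) if i < len(lines) else None
        if m:
            row.extend(group.strip() for group in m.groups())
        else:
            row.extend([""] * pattern.groups)  # keep the columns aligned
            if i < len(lines):
                extra.append(lines[i])
    extra.extend(lines[len(LINE_PATTERNS):])  # surplus lines, if any
    return row + [" | ".join(extra)]


def to_tsv(text):
    """Convert OCR text to tab-delimited rows, with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS + ["extra"])
    for record in split_records(text):
        writer.writerow(record_to_row(record))
    return out.getvalue()


SAMPLE = """\
OSD CONTROL NUMBER: OSD 03257-07 DOCUMENTTYPE: INCOMING DOC: 2128/2007 DOR: 31212007
FROM: uss LEVIN, c TO: SECDEF
SUBJECT: REauEsT FOR NUMBER OF IRAal INDIVIDUALS WHO HAVE HELPED THE u.s. SUSTAIN AND MANAGE ITS PRESENCE IN IRAa
AGENCY: JCS TASK: PRS SUSPENSE: 3/1312007 ACD:
FILE NUMBER: IRAa
OSD CONTROL NUMBER: OSD 03288-07 DOCUMENT TYPE: INCOMING DOC: 212812007 DOR: 3/2/2007
FROM: uss VOINOVICH, G TO: SECDEF
___’_BJEIT_CLAIM AGAINSTTHE FEDERAL GOVERNMENT FOR COSTS INCURRED AS A RESULT OF A TERMINATION OF CONTRACT _
AGENCY: SA – TASK: RD SUSPENSE: 3113/2007 ACD: 3/13/2007 _
FILE NUMBER: 160
"""

if __name__ == "__main__":
    print(to_tsv(SAMPLE), end="")
```

Matching by line position rather than by label is a deliberate choice in this sketch: the labels themselves get garbled (the second subject line above has lost "SUBJECT:" entirely), while the five-line rhythm seems to survive. A record whose lines shift would still need hand repair.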
Now, a few explanations. The spreadsheet has three sheets. Sheet one is the cleanest version of the data, with some value-added fields; sheet two has every field, both the enhanced ones and the original ones; and sheet three has only the original fields from the text file. I ended up ignoring some fields, partly because DoD stopped sending them to us in response to subsequent requests, and partly because we’ve been unable to learn from DoD what those fields mean. The ones I didn’t really touch were Agency, Task, Suspense and File Number. There’s also a pair of columns called “Extra One” and “Extra Two”: some of the data got bumped further to the right, and it was hard to tell which column the extras belonged in.
There’s also some very messy data. For example, there’s a lot of garble like this: WOULD LIKE7a REIOhR_REPID_R POSITION IN DOD, and this: OSD “j2o5-07. The latter is from the OSD control number field, which one could cite in a Freedom of Information Act request to more easily get a copy of the actual letter it refers to. Those numbers are supposed to look like this: OSD 00136-07.
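Because the control numbers follow a fixed pattern, a script can at least flag the garbled ones. This Python sketch assumes the format is always "OSD", a space, five digits, a hyphen and a two-digit year, as in OSD 00136-07 and OSD 03257-07. The look-alike table (o or O read for 0, l or I read for 1) is a hypothetical guess at common OCR confusions, and anything it "repairs" should still be checked against the .tif.

```python
import re

# Assumed format, based on clean examples such as "OSD 00136-07":
# "OSD", one space, five digits, a hyphen, a two-digit year.
CONTROL_RE = re.compile(r"OSD [0-9]{5}-[0-9]{2}")

# Hypothetical table of OCR look-alikes; extend it as new confusions turn up.
LOOKALIKES = str.maketrans({"o": "0", "O": "0", "l": "1", "I": "1"})


def check_control_number(raw):
    """Return (value, status), where status is 'ok', 'repaired' or 'check'."""
    value = " ".join(raw.split())  # collapse doubled or stray whitespace
    if CONTROL_RE.fullmatch(value):
        return value, "ok"
    # Apply look-alike fixes only after the "OSD" prefix, where digits belong.
    prefix, _, rest = value.partition(" ")
    candidate = f"{prefix} {rest.translate(LOOKALIKES)}"
    if CONTROL_RE.fullmatch(candidate):
        return candidate, "repaired"  # plausible, but worth a spot check
    return value, "check"  # needs a look at the original .tif


if __name__ == "__main__":
    for raw in ["OSD 00136-07", "OSD 0O136-07", "OSD “j2o5-07"]:
        print(raw, "->", check_control_number(raw))
```

A value like OSD “j2o5-07 can't be rescued by substitution alone (it seems to be missing a digit), so it lands in the "check" pile, which is the point: the script narrows the list of pages to pull, it doesn't replace the comparison.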
To get this data into better shape, I need to go back to those .tif files, print them all out (all 189 of them) and painstakingly go through them, comparing each page to the corresponding line in the spreadsheet, fixing garble and double-checking numbers, names and dates.
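Dates are one place where a script might shrink that pile. In the OCR sample above, several dates look as if their slashes were read as 1s (for example, DOC: 212812007 and SUSPENSE: 3/1312007). The Python sketch below is a hypothetical heuristic, not anything ocrad or DoD provides: it treats every 1 as a possibly misread slash, keeps only readings that parse as real M/D/YYYY dates, and suggests a fix only when exactly one reading survives.

```python
import itertools
from datetime import datetime


def parse_mdy(s):
    """Return a date if s is a valid M/D/YYYY string, else None."""
    try:
        return datetime.strptime(s, "%m/%d/%Y").date()
    except ValueError:
        return None


def date_candidates(s):
    """All valid M/D/YYYY readings of s, treating any '1' as a possibly misread '/'."""
    needed = 2 - s.count("/")  # slashes still missing
    if needed < 0:
        return []
    ones = [i for i, ch in enumerate(s) if ch == "1"]
    found = set()
    for positions in itertools.combinations(ones, needed):
        chars = list(s)
        for i in positions:
            chars[i] = "/"
        parsed = parse_mdy("".join(chars))
        if parsed is not None:
            found.add(parsed)
    return sorted(found)


def check_date(raw):
    """Return (value, status), where status is 'blank', 'ok', 'repaired' or 'check'."""
    s = raw.strip(" _")  # drop stray spaces and OCR underscores at the ends
    if not s:
        return "", "blank"
    if parse_mdy(s) is not None:
        return s, "ok"
    candidates = date_candidates(s)
    if len(candidates) == 1:
        d = candidates[0]
        return f"{d.month}/{d.day}/{d.year}", "repaired"
    return s, "check"  # zero or several readings: compare with the .tif


if __name__ == "__main__":
    for raw in ["3/2/2007", "212812007", "3/1312007", "1111/2007", ""]:
        print(repr(raw), check_date(raw))
```

An ambiguous string such as 1111/2007 (January 11 or November 1?) yields two valid readings and is left for a human. Even the "repaired" values rest on the slash-for-1 guess, so they are candidates to confirm against the printouts, not final answers.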
Now, the really ridiculous thing about all this is that the Office of the Secretary of Defense keeps its records in a form not dissimilar to the one that we’ve managed to put together here. To respond to our FOIA request, someone at DoD (probably a contractor) printed pages from a database, which they then turned into .tif files, which they then copied onto a CD-ROM, which they then sent to us. I’ll have more to say about that aspect of this later.