"Unfortunately I am declining your request to obtain the text file. The SOD document that we now provide as a PDF is certified by Adobe as authentic. Text files would not be authenticated and the validity of the data they contain would be lost and subject to outside manipulation."
That's the email a friend of ours received after asking for a text version of the House Office Disbursement Data, which was released as a 3,000-page, 9.4 MB PDF. The sender apparently missed the irony of delivering this officious message in an uncertified, plain-text email.
So we've modified that email for your perusal:
So now that we've gotten the silly off our chest, let's talk about this logically.
"Certified by Adobe as authentic"
Wow. Next thing you know, Congress will only provide data to you dipped in wax with the Congressional seal embossed on it. This model doesn't work. It never has. It never will.
Of course, according to the GPO, the only way you can view the certificate is with Adobe's products. Not with any of the alternative readers that implement the ISO PDF standard; only with Adobe's own. So what happens if Adobe decides to charge for that service? Will citizens have to pay just to find out whether their government is providing authentic information?
This is why government should stick with free, open-standards-based technology to publish data. It costs less, is more widely distributed, is less subject to foundational change, and is more open for everyone to participate in.
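There are easier, more open, and less expensive ways to certify documents, and the web has been using them for years. One is called md5sum: publish a file's checksum alongside the file, and anyone can confirm that the copy they downloaded matches what was published. Software projects have verified their downloads this way for years. The tools exist, they're free, and they cost the taxpayer $0.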
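As a rough illustration of how little is involved, here is a minimal Python sketch of checksum verification. It assumes the publisher posts a hex digest next to each file; the file name in the usage line is hypothetical. Passing `"md5"` reproduces what `md5sum` prints, though MD5 is no longer considered collision-resistant, so the sketch defaults to SHA-256 (the same digest `sha256sum` prints).

```python
import hashlib
import hmac
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so even a large PDF never has to fit in memory."""
    h = hashlib.new(algorithm)
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Return True if the file's digest equals the digest the publisher posted."""
    actual = file_digest(path, algorithm)
    # Normalize the posted digest (stray whitespace, upper-case hex) before comparing.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

Usage might look like `matches_published("disbursements.txt", "<digest copied from the publisher's site>")`. The check is only as trustworthy as the place the digest is posted, which is exactly why it pairs well with an official, authenticated source.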
Now, you could argue that teaching government employees how to create a hash is too costly and too difficult. Fine. It sends shivers up my spine to think that our Congress relies on a corporate entity to verify that its information is authentic, but I get it: it's easy, and It Just Works. Even so, the policy still doesn't make sense. If you're worried about publishing data in a way that is authentic and verifiable, then do that. An authenticated version should give you the cover, foundation, and verification you need to also publish the same data in more machine-readable, mashable, usable formats.
Publishing an authentic, signed source document lets you also publish mashable data that can be better trusted, because anyone can explicitly cross-check it against the signed original. Adobe even supports this: you can attach the source data to your digitally signed PDF and embed it there.
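One way the cross-checking could work, sketched in Python under loudly stated assumptions: the agency builds a small manifest listing the SHA-256 digest of each machine-readable companion file (CSV, plain text, and so on) and places that manifest inside, or attaches it to, the signed PDF. The signature then vouches for the manifest, and the manifest vouches for each file. The JSON manifest format and the helper names below are hypothetical illustrations, not anything the GPO or the House actually publishes.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(paths):
    """Map each companion file's name to its digest, serialized as JSON.

    Hypothetical: this JSON is what would be embedded in or attached to the
    signed PDF, so the PDF's signature covers every digest listed here.
    """
    entries = {Path(p).name: sha256_of(p) for p in paths}
    return json.dumps(entries, indent=2, sort_keys=True)


def verify_against_manifest(manifest_json, directory):
    """Return the sorted names of files that are missing or were altered."""
    expected = json.loads(manifest_json)
    problems = []
    for name, digest in expected.items():
        candidate = Path(directory) / name
        if not candidate.is_file() or sha256_of(candidate) != digest:
            problems.append(name)
    return sorted(problems)
```

An empty list means every downloaded file matches what the signed document vouches for; any name returned points to a file that was changed or is missing. Developers can then mash up the CSV freely while anyone can still check it against the authenticated original.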
We actively want our government to change its mindset that its data is not for outside manipulation. Of course it is for outside manipulation, and it will be manipulated whether the government likes it or not. People can still modify and republish the data, just as we did. We're more than happy to go to any agency and show them how to change: how to make data more available and more mashable for citizens and developers.
GPO: 866-512-1800 firstname.lastname@example.org
CAO: 202-226-5680 Jeff.Ventura@mail.house.gov