Durable data

Alex Reinhart – Updated March 22, 2017

As evidenced by these notebooks, I am a packrat. I hoard information and try to organize it: I keep email going back to 2007. I have an (infrequently updated) journal going back to 2006. I have something like 1,100 PDFs of scientific papers squirreled away, the detritus of past research projects and idle curiosity.

But I am also aware that digital data tends to rot. Not only do hard drives fail and files get corrupted (or just lost in the shuffle between devices), but formats and services disappear. I have old Pages documents in a proprietary file format that basically no other software supports; my extensive Mathematica notebooks will become useless if I lose my student license; many social media, photo sharing, and “cloud” websites will likely fold sooner or later when the venture capital runs out; and half my old programming projects are probably incompatible with the latest versions of their languages or libraries.

I’m also worried by the iOS ecosystem, where the “file” metaphor is nearly gone and apps don’t store their data in any readily accessible place. There are a zillion notebook and journal apps – how many of those will still be functioning ten years and ten iOS upgrades from now? If they disappear, how do we get our data out?

Plain text tools

Plain text formats are the ultimate in durability: even if the fancy tools no longer exist, ordinary ASCII and UTF-8 text files should be readable for decades, and you can always extract the structured data one way or another if it’s sufficiently important.
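
As a rough sketch of what "extracting the structured data" might look like, suppose a journal is kept as one text file in which each entry begins with a line holding only an ISO date. That layout, the `parse_journal` function, and the sample text are all assumptions invented for illustration, not a description of any particular tool. The point is only that a few lines of a general-purpose language are enough to recover the structure.

```python
import re
from datetime import date

# Assumed (hypothetical) layout: an entry starts with a line that
# contains nothing but a date written as YYYY-MM-DD.
DATE_LINE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s*$")


def parse_journal(text):
    """Split journal text into a list of (date, body) pairs."""
    entries = []
    current_date, body = None, []
    for line in text.splitlines():
        match = DATE_LINE.match(line)
        if match:
            # A new date line closes the previous entry, if there was one.
            if current_date is not None:
                entries.append((current_date, "\n".join(body).strip()))
            current_date = date(*map(int, match.groups()))
            body = []
        elif current_date is not None:
            # Lines before the first date line are ignored.
            body.append(line)
    if current_date is not None:
        entries.append((current_date, "\n".join(body).strip()))
    return entries


sample = """2006-03-01
Started keeping a journal.

2007-11-15
Archived my old email.
Still need to sort the PDFs.
"""

entries = parse_journal(sample)
for when, body in entries:
    print(when.isoformat(), "|", body.splitlines()[0])
```

Nothing here depends on the program that originally wrote the file. If the layout were different, only the regular expression would need to change.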

References on durability

Open questions