The advent of digital information technology has produced some fantastical results. Digital publishing has made it easier than ever to disseminate information. I can press Publish on a blog post and instantly make it accessible to billions of people (not that any of them will read it). I can buy a book online and have it instantly available on my phone and eBook reader and web browser, doing both the purchasing and the reading anywhere in the world. The introduction of digital information technology has made dissemination of information fast, easy, and cheap.
But this convenience is not without cost. While the dissemination of information has never been easier, going digital has significant drawbacks for the other side of the coin: archiving. Digital was billed as the end of information degradation. A CD will produce exactly the same audio no matter how many times you play it, and an eBook will never have a torn page or coffee stain. A digital photograph of a painting will never need an expert restoration after years of accumulating dust in a museum. Digitization was going to revolutionize archiving, and it has. But not always in the way we expected.
A CD may produce exactly the same audio every time you play it, but finding a CD player is becoming more difficult as new technology is released. Why carry around hundreds of CDs when you can store your entire music collection on a memory card the size of your thumbnail? Rather than the media wearing out, the hardware used to access it is phased out and no longer available. Like LPs and 8-tracks and cassettes, CDs are an obsolete media format being phased out in the name of progress. Digital photographs will also display identically, until a few flipped bits corrupt the file format’s data header, rendering the file unusable. And unlike a painting, a digital file is so ephemeral that deleting it can be done by accident, or intentionally with far less psychological weight than destroying the painting would carry.
eBooks are a significant change in the way we package and disseminate deep thoughts and collections of information compared to their print counterparts. While it’s true that I can store thousands of them on my phone, keeping my entire library in my pocket, those files face the same danger of corruption and deletion as the digital images discussed above. And just as CD players are being phased out in favor of other types of media and devices to play them, eBook formats continue to evolve, with support for older formats being dropped. Sony introduced the BBeB format, but has abandoned it in favor of EPUB. Barnes and Noble adopted the eReader format, only to drop it in favor of EPUB and PDF three months later. Apple’s iBook format has similarly been abandoned. Mobipocket was adopted by Amazon, only to be extended and extinguished in favor of its modified Kindle format. EPUB version 1 has been abandoned in favor of version 2, which is not backwards compatible, and with version 3 making inroads into the market, version 2 files may soon become unreadable.
Many of these eBook formats are entangled with specific publishers. Kindle books can only be read on Amazon-approved devices, and the same is true of Google Books. When publishers move to new file formats, or go out of business, customers who have “bought” their books through these vendors often find themselves with a pile of bytes they can no longer access. These access restrictions are enforced by Digital Rights Management technologies, locks placed on the digital books by the vendors that allow the vendors to continue to exert control over the digital copies even after purchase. Unlike a physical book, which the user truly owns and which is subject to the first sale doctrine, an eBook purchase is treated by publishers as a license to the book, subject to much more stringent limitations. Format changes and business closures aren’t the only threats under this model: in 2009, Amazon famously ran into licensing issues after its Kindle store sold copies of George Orwell’s book 1984 that it did not have the rights to. Customers who bought the book woke up one morning to find that Amazon had silently and unilaterally deleted it from their devices, including all of the notes users had entered as marginalia, a move that many noted paralleled the draconian future that Orwell had described.
Digital Rights Management technology enforces more limited rights for book owners than would be possible with traditional print media. Users are often unable to copy portions of the work for use in critique and parody as allowed under copyright, nor can they claim their right of first sale to sell or lend their own copy. And the technology used to create these barriers is illegal to circumvent due to laws like the Digital Millennium Copyright Act of 1998. Netflix is expected to remove a variety of DRM-protected TV shows and movies from its collection at the end of 2023, and while some like Jaws and Saving Private Ryan will remain accessible through other means, Netflix-exclusive titles will simply cease to be accessible by any means. Because of the DRM protections and the legal prohibitions on bypassing them, archivists are unable to exercise their right under copyright law to create a backup, and that cultural art will be lost to time.
Media and format obsolescence, accidental deletion, file corruption, and DRM restrictions are significant drawbacks compared to print media, not only for the user but also for the archivist. It’s insufficient to simply store a book on a server somewhere to create an archive. If an archive adopts a file format that becomes unsupported, that book is now lost. A server failure can wipe out an entire archive. And vendors can go out of business, losing the keys to unlock DRM-restricted files. A digital archivist has a much harder job than a print archivist does, because there are so many technological and legal hurdles to overcome to maintain access to the collection. Corporations have a vested financial interest in locking users into their platforms and continually changing supported formats, because that forces users to stay with them to keep access to their information and to pay for new licenses for each format. And the impact of that for-profit behavior carries over to digital archivists.
So what is an archivist to do? I propose that we focus our attention on three areas. First and foremost, we should seek to add information to our archives using open formats like EPUB without any DRM restrictions. This will ensure that we can create tools to migrate them forward as new formats emerge and are adopted. Second, we should support organizations like Wikipedia and the Internet Archive, which are fighting for the rights of digital archivists and are examples of successful large-scale digital archiving. And third, we should lobby for laws and regulations that favor information users over publisher oligarchies, fighting for repeals of laws like the DMCA that restrict legal use under copyright and supporting organizations in lawsuits over topics like controlled digital lending.
Digital dissemination technology has revolutionized information access, but the pendulum has swung far in favor of publishers over the rights of users and future generations to access information. We must take action to secure our continued access to the vast stores of information and cultural heritage contained in these digital forms before corporations restrict and destroy it in the name of profit.