For God’s Sake, Stop Digitizing Paper
We are all aware of the public expectation that EverythingEver is already digitized and available online. This is frustrating for two reasons. First, when the public is presented with the reality, the reaction is often negative, chiefly that archives are lazy and behind the times. Second, the perception is often trotted out as a reason why preservation efforts are frivolous or don’t need to be funded, especially when the discussion concerns commercial assets that are widely torrented. “There are a million copies online,” the argument goes. “Why should we care about archives or pay attention to their uptight rules?” Those making it are little aware that the bulk of cultural material in existence does not consist of Hollywood films, studio albums, and published works (or student websites from 15 years ago).
These types of reactions immediately put archivists on the defensive, backpedaling from the blows while throwing up nuanced blocks about the costs, the sheer amount of material, infrastructure, rights, and so on. The problem is that, besides failing to mount a clear offensive and head off misconceptions, these defensive arguments fall back on what is essentially inability. There isn’t enough money. It’s too hard. We can’t do it. Even as a defense, those responses (those excuses) do not inspire confidence or trust that archives will be able to digitize and provide access to more material.
Why do we digitize collections? Looking at RFPs that are posted and grants that are awarded, I would say it primarily appears to be for access and research, whether that’s for some open data initiative, a digital humanities project, creating an online portal, or “it’s a pain to store and manage all those newspapers, journals, and microfilm”.
Frankly, I think we need to consider a moratorium on this. We should agree to stop digitizing paper and other stable formats for a set period because, in a way, it is bad for preservation.
Now pardon me while I throw up some just-derided, nuanced (or maybe just vague) defenses. Access is a key part of what archives do, and online access has greatly expanded the capabilities in that area. My first archive-ish job involved transcribing manuscripts for the Walt Whitman [Electronic] Archive, one of many digital humanities projects associated with the University of Virginia, most of which I felt were amazing new resources that were critical to the ability of students and scholars to do research.
I also understand the argument that access is preservation. The patron desire for access often drives preservation, processing, or other activities. And lots of copies keep stuff safe and all that jive.
However, access and preservation are parallel, not one and the same, and, when dealing with audiovisual materials, preservation creates access, but not always the other way around. For almost all formats in almost all cases, reformatting is required to preserve the signal or image, and in a growing number of cases it is absolutely required to provide any degree of access beyond looking at the physical object. What we consider preservation-level reformatting generally means capturing the qualities and the authenticity of the original signal to the best of our abilities. At this point in time, reformatting means digitization (okay, for most formats, I will graciously concede). In some scenarios it is reasonable to digitize audio, video, and film primarily for access purposes, but doing so wastes effort and resources when the work is not simultaneously done to preservation standards.
Additionally, there is a general consensus that we have a 10-15 year window (or 9-14 if you’re being precise and don’t mind numbers unpleasant to our base-10 minds) to reformat magnetic media before it becomes too degraded or too expensive to save at any scale. This does not mean that all tape will suddenly turn into big piles of goop, but that the effort and cost will become unaffordable. Especially with video, or with audio formats like DAT, degradation and technological obsolescence will be the overwhelming drivers of that rise in cost and difficulty.
We cannot save everything in that time. We shouldn’t try: not everything is worth it, and in many cases it would be irresponsible to just push everything through the pipeline. However, if we do not reallocate resources in the near future to digitizing the audiovisual materials in archives, we will relegate a much larger portion of them to the circular file of history than is necessary or reasonable given what we can actually accomplish.
This also means that organizations will require some sort of digital preservation infrastructure or policies (or plans for them) to support digital efforts. I’ve seen many examples where digitization done purely for access lacks this support, because there is no business case for maintaining the new version while the original is still viable. As a result, a lot of digitization work is effectively wasted if it has to be redone for access or future preservation work because files, access portals, metadata, and digital humanities projects have been lost. And I don’t just mean lost in the sense of the usual fretting about the unreliability of digital files, but lost through human failure in managing servers, migrating data, or keeping websites alive. In a way, one could see this situation as akin to overproduction or overconsumption of resources…or even akin to the over-acquisition of collections without the plan, resources, or ability to properly process and provide access to them.
As the NDSA Levels of Digital Preservation and other efforts have helped show us, we can’t wait for perfection. The ultimate system for digital preservation does not have to be in place before we move ahead, but there does need to be a baseline effort and a plan for improving on it. Likewise, there should be a set of levels for physical preservation, though the difficulty here lies in applying them across formats with such variable physical characteristics. In some cases, proper storage, housing, and periodic conservation are enough, as a minimal level, to buy many more decades of persistence. In other cases, those efforts only ensure immediate, very short-term persistence, and a baseline effort must include a near-term plan for starting selective digitization: digitization that is scoped and scaled to meet the needs of both the formats at hand and the organization.
Preservation is an active process, but it is also playing the long game. The activities involved can have gaps in application, can occur over extended periods, or can appear deceptively passive, as with the monitoring of storage environments. We need that long-game mentality to remember that custodianship extends beyond ourselves, our technologies, and our institutions. But if we ignore the short game of preservation (and the shorter windows in which we have to act), we allow known risks to fester and grow to a degree that undermines the long-term plan.