For centuries, the verbs to handle, to use, and to wield have been used in reference to physical objects. Today, the handholds created by metadata allow us to employ these same verbs in reference to digital assets. We handle files, use data, and wield information. Text-based digital files, such as Word documents or OCR’d PDF files, provide metadata that make search and discovery as easy as a game of catch with a Velcro ball and mitt. Every word or character within the document serves as another Velcro loop, greatly increasing the chances of connecting with its partner in crime, the Velcro hook, when performing a search. All of the loops and hooks that exist in text-based documents make discovery relatively effortless…almost child’s play.
Exposing all of this text to search mechanisms (e.g., indexing, textual analysis) has made finding and using information in text-based documents on a computer or server, or on the internet for that matter, relatively easy and straightforward. By comparison, non-text-based digital assets, such as audiovisual files in their raw form, only have a few lonely Velcro loops for the Velcro hook of an attempted search to grab on to. Revisiting the game-of-catch analogy, this takes us from child’s play to something that is best demonstrated by the following gif.
No hooks. No loops. No opposable thumbs. Just a knobby ball that might help with grip a bit. That is perhaps the equivalent of a cryptic filename, which may give you some hope of finding the file, provided you remember the abbreviations you used when you named it T4S6-1-03_Final_Final_Master.mov and put it in a directory called USETHISONE-ALT.
The saying “Possession is nine tenths of the law” essentially means that it is much easier to prove ownership of something if you have it in your possession. In a legal context, ownership is tied to rights and the ability to derive value from an asset. In everyday terms, this might mean the use of a given asset, or the ability to sell or license it. Outside of a legal context there is also a strong relationship between possession and the ability to derive value from an asset. In simple terms, if you can’t find something you effectively don’t possess it: you can neither use it, sell it, nor license it. You can’t spend the $5 folded up in the pocket of your pants lying at the bottom of your dirty clothes hamper. It is questionable whether you ever will. Maybe it makes it through the wash and you find it the next time you wear those pants. Maybe it gets mangled and torn to bits in the laundry. Maybe it falls out on the way to the laundromat and you lose it forever.
Chances are your organization owns lots of digital assets that are spread across some combination of personal computers, removable hard drives, and one or more servers. In some ways this scenario resembles the era of physical media (e.g., videotapes on shelves). In the same way that physical assets take up space, digital assets live on drives and servers that also take up physical space. However, channeling Bernie Sanders here, a ’uuuuuuuuge difference between the two is that the arrangement of physical assets provides us with some inherent basic metadata, even when they are only loosely organized. Most of the time something can be said about the origin of a group of physical items based on their arrangement in both absolute and relative terms (e.g., donor, production, project, series, element type, relative age, size).
The digital asset equivalent of this, of course, is the arrangement of files in directory structures and the naming of those files and directories. However, digital storage space does not have the same presence as physical space. Digital files and directories are often named cryptically, while physical items often have notes scrawled on them, labels affixed to them, or paperwork attached to them. In short, the physicality of a server is not synonymous with the physicality of a videotape, audiotape, or film. While digital files may be stored on physical items that are in our possession, it is hard to argue possession of the digital files if one cannot find them or use them.
In the past, two problems proved to be major obstacles: the innate difficulty of playing back specialized, obsolete, and degrading formats, AND the lack of metadata for discovery. While possession was easy to argue because you had that darn thing on the shelf, without the ability to discover OR play it back, it was of little-to-no value to you. Joshua Ranger put this well in his blog post titled “Your Inaccessible, Undocumented Collection Is Not Used & Therefore Has No Value.” In many ways, in the digital file-based domain we have overcome the more fundamental technical reproduction issues. It is relatively simple to play back most digital audiovisual file formats these days. However, we no longer have that darn thing on the shelf. Files now populate disparate, siloed storage systems, and their organization ranges from none at all to systematic directory structures and filenames, with collection management and digital asset management systems supporting inventory and description. The problem that persists across both the physical and file-based domains is discovery. However, the remedies available to us in the file-based domain are extraordinary compared to the opportunities available to us in the physical domain.
While these ideas have been swimming around in my mind for some time now (originally prompted by watching that gif of the kangaroo dropping the ball), they have really solidified in new ways through a project on which we are currently working. The project, with a major university that is digitizing hundreds of thousands of hours of content, strikes at the core of the opportunities that exist in the file-based domain. Our focus in the project has been to envision, design, document, and budget for systems and workflows that efficiently generate as much meaningful metadata as possible in order to aid discovery, access, and rights determination. Without a project like this to complement the digitization efforts, the firehose of files coming out of this organization’s digitization process would be downright scary. It has been extremely fun to have the opportunity to dream up the most sophisticated systems we can, leveraging a combination of innovative automated metadata generation technologies and good ol’ human expertise and elbow grease. Anyhow, the point is that this project, along with other related projects over the past year, has been eye-opening in many ways and indicates that a maturation is taking place. We are seeing the focus broaden from large-scale digitization alone to include large-scale description.
In a sea of digital information, it is no longer about who owns the hardware. The physical location of the server is not what defines possession. Possession is restricted to those with the appropriate metadata (e.g., credentials, permissions) that grants them access, and the ability to derive value is limited to those who have metadata that will help them find and use the content.
One could argue that files sitting on hard drives or servers that are physically operating in one’s office, even with poor or no metadata, are in the possession of that organization. It becomes harder to argue possession when files are stored in the cloud (also known as other people’s servers), where your digital assets sit in a physical space you have no right to access, on a server you do not own. If you are lucky, you might know the city in which your assets are stored. I’m not bringing this up because I think it’s good or bad. It just is. It also happens to help bring my point home.
With the move to the cloud there is a very real shift in the relationship between possession and the role of metadata. THE thing that defines boundaries, permissions, discovery, and access is metadata. If the metadata is irrevocably lost, effectively so are the files. Of course, cloud storage providers have infrastructure and safeguards in place to protect against such an event (Or do they? Have you read your service agreement to be sure?!). But you get the point. Metadata is the equivalent of presence. The role that physicality played in the past is played today, in the digital file-based domain, by metadata.
So go forth and describe. Generated via human and/or machine, metadata is your lock and your key, your mechanism for sharing and finding, your Velcro loop and hook, your ability to manage and use, and the only way to derive value from your digital assets. Metadata is 9/10 of possession and possession is 9/10 of the law.