=================
== Joe Carrano ==
=================
archivist + historian

Digital media survey results

archives digital media survey

Over the last year, I’ve worked on a digital media survey of our archival and distinctive collections at MIT Libraries. I mentioned in the post linked above that I would update this blog along the way, but that didn’t really happen. Sorry, friends!

Media circus

Things went pretty well and a bit faster than I expected. It took me just over a year, with some fits and starts, to finish the whole survey process. The first stage was our archival and manuscript collections, in which I was able to review over 300 boxes and a few dozen rare books and publications in 8 months.

Along the way, I transferred data from media we could handle in house whenever a collection had a total of 2 or fewer pieces of media, since it was convenient while the boxes were onsite. The small number meant we could get them done relatively easily and have the quick win of crossing a collection off the list of those with media still awaiting transfer.

After finishing the first stage of surveying, I started on the academic theses, which are also under our purview. The work to assemble the list of media from published material had started earlier, when I needed to find this information for the publications and rare books in the main collections mentioned above. As these materials have their metadata stored as MARC records in Alma, I went about putting together a list similar to the one for the archival records. Within Alma, I ran a number of “indication rules” looking in the 300 field, subfields a and e, for the terms: dis*, flopp*, zip, computer, cartridge, CD*, DVD*, USB, and drive. Additionally, local cataloging practice for a time was to use a call number prefix for media, often when those items were included with another print title with the same call number. I was able to assemble a list of these prefixes, like CDROM, DSKETTE, etc., and search for them in Alma as well, in case this surfaced anything missed in the 300 field search. I combined the results into a spreadsheet, deduped them, and then got to work surveying and storing results in our media log tool.
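For anyone who wants to try the same kind of matching outside Alma, here is a minimal Python sketch of the search-and-dedupe step. It is not how Alma's indication rules work internally. It only assumes the records have already been exported to a list of dictionaries, and the field names (`id`, `call_number`, `physical`) and the sample records are hypothetical. The search terms and the two call number prefixes are the ones listed above.

```python
import re

# Search terms from the survey; a trailing * is treated as a wildcard.
TERMS = ["dis*", "flopp*", "zip", "computer", "cartridge", "CD*", "DVD*", "USB", "drive"]

# Call number prefixes named in the post (the full local list was longer).
PREFIXES = ("CDROM", "DSKETTE")


def term_to_regex(term):
    """Turn a term such as 'flopp*' into a whole-word, case-insensitive pattern."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


PATTERNS = [term_to_regex(t) for t in TERMS]


def mentions_media(physical_description):
    """True if the physical description (MARC 300 a/e text) matches any term."""
    return any(p.search(physical_description) for p in PATTERNS)


def has_media_prefix(call_number):
    """True if the call number starts with one of the media prefixes."""
    return call_number.upper().startswith(PREFIXES)


def merge_candidates(*result_lists):
    """Combine result lists, keeping the first record seen for each ID."""
    merged = {}
    for results in result_lists:
        for record in results:
            merged.setdefault(record["id"], record)
    return list(merged.values())


# Hypothetical exported records, for illustration only.
records = [
    {"id": "001", "call_number": "Thesis A", "physical": "120 leaves + 1 computer disk"},
    {"id": "002", "call_number": "CDROM Thesis B", "physical": "85 leaves : ill."},
    {"id": "003", "call_number": "Thesis C", "physical": "200 leaves : ill. ; 29 cm"},
    {"id": "004", "call_number": "DSKETTE Thesis D", "physical": "60 leaves + 2 floppy disks"},
]

field_hits = [r for r in records if mentions_media(r["physical"])]
prefix_hits = [r for r in records if has_media_prefix(r["call_number"])]
candidates = merge_candidates(field_hits, prefix_hits)
print([r["id"] for r in candidates])
```

Record 004 is found by both searches but appears only once in the merged list, which is the point of the dedupe step. A broad wildcard like dis* will also match unrelated words that happen to start with those letters, so the merged list is a set of candidates to check by hand rather than a final answer.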

Work on the theses took about another 4 months, as getting them from offsite storage was just as time consuming as it had been for the main part of the survey. With some of the theses stored in boxes as a group, I often had to request many boxes just to get through a year that had only a few theses containing external media. Ultimately, I was able to finish up everything by the end of February of this year.

Results

Here are the results from the total survey (including theses). I’ve simplified CDs and DVDs to be a summary of all of their subtypes.

Format                     Count
Floppy disk: 3.5 inch        909
CD                           846
Floppy disk: 5.25 inch       394
DVD                          339
Data tape                    278
Data cartridge               238
Zip disk                     126
Floppy disk: 8 inch           49
Jaz disk                       7
Hard drive                     4
Computer                       4
Flash memory: USB              3

Overall, I found just slightly under 3000 pieces of digital media in our distinctive collections and another 250ish pieces of media included with theses. It’s thankfully much less than you might expect for a technology-focused place like MIT. The majority of media found were floppy and optical formats that we have established workflows and equipment to handle. Whether the content is readable or not is another story, but we can make an attempt.

While we can handle most of the found formats, the survey revealed a sizable number of 8-inch floppy disks, data cartridges, and mainframe data tapes that we don’t have an in-house solution for. Those results are concerning, but this is exactly what we wanted to surface for planning purposes.
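To put rough numbers on those two groups, here is a small Python tally using the counts from the results table. The grouping is an assumption: it treats the "Data tape" row as the mainframe data tapes, and it counts only 3.5-inch and 5.25-inch floppies, CDs, and DVDs as the floppy and optical formats with established workflows. The remaining formats (Zip, Jaz, hard drives, computers, USB) are left out of both groups because the text doesn't say which side they fall on.

```python
# Counts copied from the results table (archival collections plus theses).
COUNTS = {
    "Floppy disk: 3.5 inch": 909,
    "CD": 846,
    "Floppy disk: 5.25 inch": 394,
    "DVD": 339,
    "Data tape": 278,
    "Data cartridge": 238,
    "Zip disk": 126,
    "Floppy disk: 8 inch": 49,
    "Jaz disk": 7,
    "Hard drive": 4,
    "Computer": 4,
    "Flash memory: USB": 3,
}

# Assumed grouping: formats described as lacking an in-house solution.
NO_IN_HOUSE = {"Floppy disk: 8 inch", "Data cartridge", "Data tape"}

# Assumed grouping: floppy and optical formats with established workflows.
ESTABLISHED = {"Floppy disk: 3.5 inch", "Floppy disk: 5.25 inch", "CD", "DVD"}

total = sum(COUNTS.values())
unsupported = sum(COUNTS[name] for name in NO_IN_HOUSE)
established = sum(COUNTS[name] for name in ESTABLISHED)

print(f"Total pieces of media: {total}")
print(f"Established workflows: {established} ({established / total:.1%})")
print(f"No in-house solution:  {unsupported} ({unsupported / total:.1%})")
```

Under those assumptions, the table adds up to 3197 pieces of media: 2488 (about 78 percent) in the established group and 565 (about 18 percent) with no in-house solution.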

Lessons Learned

Things didn’t always go smoothly, and this survey ended up being an exercise in metadata remediation. We often encountered errors in box numbers, missing items, a total lack of description, and completely misidentified things.

One particular frustration with the archival and manuscript collections was that 180 entries out of 640 turned out to be false positive matches for digital media. About half of those came from searching on the term “backup” or “back-up”, which was a false match 92 percent of the time. Often it really just meant paper backups. It turns out that even though I saw the word “backup” written on the labels of a lot of the media I found, it was almost never entered into the finding aid in those cases.
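The arithmetic behind those figures can be worked through in a few lines of Python. Only 640, 180, and 92 percent are reported numbers. Everything derived from "about half" is a back-of-the-envelope estimate that assumes exactly half, so treat those values as approximate rather than as survey results.

```python
# Reported figures.
total_entries = 640
false_positives = 180
backup_false_rate = 0.92  # share of "backup"/"back-up" matches that were false

# Overall false positive rate across all search terms.
overall_fp_rate = false_positives / total_entries

# Assumption: "about half" is taken as exactly half of the false positives.
backup_false = false_positives / 2

# If that many false matches were 92 percent of the backup matches, the
# total number of backup matches was roughly this many (an estimate).
est_backup_entries = backup_false / backup_false_rate

# Estimated false positive rate for every other term combined.
other_false = false_positives - backup_false
est_other_fp_rate = other_false / (total_entries - est_backup_entries)

print(f"Overall false positive rate: {overall_fp_rate:.1%}")
print(f"Estimated backup matches:    {est_backup_entries:.0f}")
print(f"Estimated rate, other terms: {est_other_fp_rate:.1%}")
```

Under that assumption, roughly 98 of the 640 entries would have come from the backup terms, and leaving those terms out would bring the false positive rate from about 28 percent down to something closer to 17 percent.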

Future steps

This data has already been a great starting point for better understanding how much work is ahead in analyzing and transferring digital media and how we can plan to go about it. We’ve started dissecting the specifics of the data to prioritize some of it, such as institutional records we have a clear mandate to preserve and material from collections related to women and BIPOC communities. In the next year, we’ll begin more work on getting through the media, as well as establishing metrics on how long it takes on average to process different media types. This will give us a better idea of how long we will need to commit to some of the transfer projects.