Taming the all-digital history research collection, 1: tagging and filing
Ed Lazowska points to ways that computer science has changed information management in the last 40 years. Here’s the top of his list:
1. Search. Ten years ago, you would painstakingly organize things – label them and file them – so that you could find them. How 1990s! Today, you can search more than 500 Terabytes of the web (not to mention your own desktop) in 100 milliseconds.
From where I sit, he’s half right. I’ve been doing everything I can to have all my primary sources available in digital format, whether they’re photos of manuscripts and typescripts at a physical archive, PDFs from JSTOR, or PDFs from preservation microfilm. As is true for many historians, the vast majority of the sources I work with aren’t yet digitized (and may never be), but the benefits of always having my primary sources available on my laptop are huge. I get a lot out of web search and desktop search, but I still end up doing a lot of “painstakingly organiz[ing] things” by hand.
Since 2007 or so, I’ve come up with my own research-organization scheme, because I couldn’t find anyone who’d written in detail about how to keep an all-digital research collection for individual scholarly use. (Everyone I talked to said, “That’s always highly individualized, and different things work for different people. You can probably figure out something good on your own.”)
In the spirit of ProfHacker, my favorite academia-plus-technology howto website, here’s what I’ve found useful. These tricks are far from perfect, but they’re a place to start. This post is Mac-centric because that’s what I use, but Windows and Linux users should feel free to comment with what works for them.
One particularly helpful blog (that I’ve lost track of now) suggested using tag prefixes to allow more granularity. For example, a tag “SU) infant mortality” is a subject tag for materials about infant mortality. “PN) Abbott. Grace” means that this item relates to a person by name (American social reformer Grace Abbott). Not complicated, flexible, easy enough to remember. It’s not Dublin Core Metadata, but it’s good enough for personal use.[1]
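The prefix convention is simple enough to handle with a few lines of code. The sketch below is only an illustration, not part of my actual workflow: it assumes the two prefixes mentioned above (SU for subject, PN for person name), and the function names `parse_tag` and `group_tags` are made up for the example.

```python
from collections import defaultdict

# Prefix meanings taken from the two examples in the text; any other
# prefix would be an addition to the scheme.
PREFIXES = {"SU": "subject", "PN": "person"}


def parse_tag(tag):
    """Split a tag such as 'SU) infant mortality' into (kind, value).

    Tags without a recognized prefix come back as ('untyped', tag).
    """
    prefix, separator, value = tag.partition(")")
    if not separator or prefix.strip() not in PREFIXES:
        return ("untyped", tag.strip())
    return (PREFIXES[prefix.strip()], value.strip())


def group_tags(tags):
    """Group tag values by kind, preserving the order they were given in."""
    grouped = defaultdict(list)
    for tag in tags:
        kind, value = parse_tag(tag)
        grouped[kind].append(value)
    return dict(grouped)
```

Calling `group_tags(["SU) infant mortality", "PN) Abbott. Grace", "to read"])` sorts the first tag under "subject", the second under "person", and the third under "untyped".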
Early in my research, on the advice of the internet, I adopted a tag-based filing approach at the file level. I’ve used several programs that support tagging and searching by tags, like Ironic Software’s TagIt, but ultimately filesystem-based tagging wasn’t granular enough for my purposes. These days, my main use of tags is in Evernote, for individual notes I’ve taken about particular topics or for paragraph-length quote clippings out of longer PDFs.
Tagging my materials well requires a lot of discipline, and I haven’t mastered it yet, but when I do it, it works. One of the pitfalls I’ve had to avoid is the temptation to get every single item tagged properly, because I’m one person with a big project and a finite amount of time. Over time, I’ve moved away from tagging files in favor of verbose, search-friendly file names.
Search-friendly file and folder naming
When I take digital photos at an archive, I file them in a hierarchy by repository, collection, series, box, and folder. (Shooting images of the pull slips, the box label, and the folder label helps immensely with this, since I can identify a folder label from its thumbnail image.) I’ve built some AppleScripts and shell scripts to help with that filing process.
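For readers who would rather script this step in Python, here is a rough sketch of the same idea. It is not a port of my AppleScripts or shell scripts. The function names, the "Box N" and "Folder N" labels, and the choice to replace slashes and colons are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def archive_folder(root, repository, collection, series, box, folder):
    """Build root/repository/collection/series/Box N/Folder N as a Path."""
    parts = [repository, collection, series, f"Box {box}", f"Folder {folder}"]
    # Replace characters that would split a label into extra directory levels.
    safe = [str(part).replace("/", "-").replace(":", "-").strip() for part in parts]
    return Path(root).joinpath(*safe)


def file_photos(photos, destination):
    """Move photos into destination, creating the folder chain if needed.

    Refuses to overwrite: raises FileExistsError if a name is already taken.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for photo in photos:
        target = destination / Path(photo).name
        if target.exists():
            raise FileExistsError(target)
        shutil.move(str(photo), str(target))
        moved.append(target)
    return moved
```

Refusing to overwrite matters here because cameras reuse filenames once their counter rolls over, and a silent overwrite would lose a source.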
When I find an item in a set of photos that I want to remember, I rename the file by adding useful words to the end of the camera-generated filename. (I know it’s around here somewhere, it was a letter in the Children’s Bureau collections from a mother in California in the early 1940s…) A Spotlight search on “California” and “mother” in my folder for “Children’s Bureau Central File 1941-44” comes up with a manageable number of files, and I can browse them until I find what I was thinking of. The trick here is to pick obvious words or phrases that you’ll think of when you want to search.
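The rename-then-search habit can be sketched in Python as well. This is a hedged illustration, not what Spotlight does internally: `find_by_words` is a plain case-insensitive match on filenames only, and both function names and the sample filename `IMG_0412.JPG` are hypothetical.

```python
from pathlib import Path


def add_keywords(path, keywords):
    """Append keywords to a camera-generated name, keeping the extension.

    For example, IMG_0412.JPG becomes 'IMG_0412 letter mother California.JPG'.
    """
    path = Path(path)
    new_path = path.with_name(f"{path.stem} {' '.join(keywords)}{path.suffix}")
    if new_path.exists():
        raise FileExistsError(new_path)
    path.rename(new_path)
    return new_path


def find_by_words(folder, words):
    """Return files under folder whose names contain every word, ignoring case."""
    wanted = [word.lower() for word in words]
    return sorted(
        candidate
        for candidate in Path(folder).rglob("*")
        if candidate.is_file()
        and all(word in candidate.name.lower() for word in wanted)
    )
```

Keeping the original camera-generated stem at the front means the files still sort in shooting order, which preserves the sequence of pages within a folder.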
Search-friendly filenames are also more likely to survive multiple generations of cross-platform backups than are most of the existing Mac file-tagging systems.
(I keep backups on local hard drives and online via JungleDisk. A redundant backup system is critical before trusting your career-making research to any computer.)
For PDFs, I use my citation management software’s filing system; whenever I can, I rename the files using a modified Chicago-style citation format. That solves the problem of finding Grace Abbott’s writings outside of my photo collection. When I’ve been to a library with a digital-camera ban and have returned with a stack of photocopies, I take advantage of my university’s bulk-feed PDF scanner (and its OCR software).
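A citation-style filename can also be generated rather than typed. The pattern below ("Author. Year. Title.pdf") is an assumption chosen for illustration, not the exact modified Chicago format I use, and `citation_filename` is a hypothetical helper. It strips characters that cause trouble on common filesystems and trims very long titles.

```python
import re

# Characters that are unsafe in filenames on at least one major platform.
UNSAFE = re.compile(r'[\\/:*?"<>|]')


def citation_filename(author, year, title, extension=".pdf", max_title=80):
    """Return a filename like 'Abbott, Grace. 1938. Some Title.pdf'."""
    cleaned_author = UNSAFE.sub("", author).strip()
    cleaned_title = UNSAFE.sub("", title)
    # Collapse runs of whitespace left behind by removed characters.
    cleaned_title = re.sub(r"\s+", " ", cleaned_title).strip()
    if len(cleaned_title) > max_title:
        cleaned_title = cleaned_title[:max_title].rstrip()
    return f"{cleaned_author}. {year}. {cleaned_title}{extension}"
```

Putting the author's surname first means an alphabetical folder listing groups one person's writings together, and the year lets them sort chronologically within that group.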
When I see a little quote I want to remember, I use Evernote’s screen-clip feature to create a note about it, and I type in a basic citation for that clipping. Evernote does some OCR to make the image searchable, which is a huge help.
Together, these tricks handle the “How do I keep track of it?” question, mostly.[2] But they’re rough tools. They don’t solve the problem of creating intellectually meaningful ways to search, sort, and analyze digital-format sources. In my experience, that’s a much harder problem, and I’ll write more about it in an upcoming post.
1. I have a little theory that the Omeka collections-management software (which does use the Dublin Core metadata standards) might now be up to handling many of the tasks I describe here, particularly with its bulk-loading plugins, but I haven’t had time to do anything more than install it experimentally.
2. Connecting my digital-sources files with my citation management software is more work than it ought to be, which is a subject for another post. I’ve used EndNote and now use Bookends, and someday I’ll switch over to Zotero once the Mellel developers support it properly.