A research hack for US government documents on Google Books
In recent months, I’ve been using a lot of US government reports from the early 20th century, mostly publications by the Children’s Bureau and Census Bureau. There’s a lot of what I need in the collections of Google Books, Internet Archive/OpenLibrary, and HathiTrust, but works that have been digitized aren’t always available to me in useful formats.
- “US Government works” can’t be copyrighted, but not all works published by the Government Printing Office are “US Government works,” and some GPO-published works contain material produced by contractors (which can be copyrighted) or excerpts of copyrighted material.
- Google Books takes a cautious approach; it routinely marks GPO publications published since 1923 as still-in-copyright. You can report something that’s inappropriately marked as copyrighted, but Google doesn’t act quickly to release those works from copyright jail, because it doesn’t primarily exist to serve scholars. And Google Books metadata is a huge jumble, particularly for item dates.
- HathiTrust, which does exist to serve scholars, is much more responsive about releasing works from copyright jail and correcting metadata when you report errors. Unfortunately for me, only users affiliated with HathiTrust member institutions can download full PDFs of those works, and my home institution isn’t a member.
- OpenLibrary sometimes has GPO-published items I’m looking for, but their collections are hit-or-miss for these items.
Here’s how I worked around these problems to get what I needed. Maybe this trick will be useful for someone else out there.
To answer my research questions, I wanted the US Children’s Bureau annual reports, roughly 1924-1933, which falls just after the 1923 copyright border. That made everything tougher, since Google Books has only snippet views for all the post-1924 reports, due to mis-administered copyright restrictions. Some of the reports might have been included in the Department of [Commerce and] Labor’s annual reports, which are occasionally less copyright-restricted, but the DCL reports are 600-800 pages each and the Children’s Bureau reports are, at most, about 100 pages.
But wait, there’s a solution.
While poking around, I found a Google Books version of the 1924 report, but it was available only in ePub format, which loses page numbers from the original. To find a better copy with original pagination, I copied a long sentence related to my research: “A colored doctor has been added to the bureau staff, and she is at present assisting the Tennessee Health Department in an investigation and educational campaign among colored midwives of the State.”1
When I pasted that sentence, in quotes, into the Google Books search box, I found three GBooks items which contained it. The first hit was the ePub referenced above; the third hit was copyright-locked; and the second result was a PDF of a bound series volume containing the reports for 1920 and 1923 through 1932.
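The same exact-phrase search can be scripted. Below is a minimal sketch in Python, assuming the public Google Books API volumes endpoint rather than the website search box I actually used; the API may not return the same hits as the site. The function names are my own, and the `filter=full` parameter and the `accessInfo.pdf.isAvailable` field are used as I understand them from the API documentation, so treat the details as assumptions to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_phrase_query(phrase, full_view_only=True):
    """Return a Books API URL that searches for an exact (quoted) phrase."""
    params = {"q": f'"{phrase}"', "maxResults": 20}
    if full_view_only:
        # Assumption: restricts results to volumes whose full text is viewable.
        params["filter"] = "full"
    return f"{API_URL}?{urlencode(params)}"


def downloadable_pdfs(response):
    """List (title, publishedDate) for hits that report an available PDF."""
    hits = []
    for item in response.get("items", []):
        access = item.get("accessInfo", {})
        if access.get("pdf", {}).get("isAvailable"):
            info = item.get("volumeInfo", {})
            hits.append((info.get("title"), info.get("publishedDate")))
    return hits


def search(phrase):
    """Run the query over the network and keep only hits with a PDF."""
    with urlopen(build_phrase_query(phrase)) as resp:
        return downloadable_pdfs(json.load(resp))
```

Calling `search("A colored doctor has been added to the bureau staff")` would be the scripted equivalent of pasting the quoted sentence into the search box. The `publishedDate` values are worth eyeballing, since a bound volume may carry only the date of its first item.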
I downloaded the PDF, then used Acrobat Professional to split it out into each year’s reports and OCR it for searchability. Works like a charm.
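If you don’t have Acrobat, the splitting step can probably be done with a script instead. This is a rough sketch using the third-party pypdf library, not the method I used; `plan_splits`, `split_pdf`, and the output filename pattern are hypothetical names, and you would need to look up the starting page of each report yourself. It does not OCR anything; a separate tool such as OCRmyPDF could handle that.

```python
def plan_splits(start_pages, total_pages):
    """Map each label to a half-open (start, end) range of 0-based page indexes.

    start_pages: dict of label -> index of the first page of that report.
    Each report is assumed to run until the next report begins.
    """
    ordered = sorted(start_pages.items(), key=lambda kv: kv[1])
    ranges = {}
    for i, (label, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else total_pages
        ranges[label] = (start, end)
    return ranges


def split_pdf(src_path, start_pages, out_pattern="report_{label}.pdf"):
    """Write one PDF per report, using the page ranges from plan_splits."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(src_path)
    plan = plan_splits(start_pages, len(reader.pages))
    for label, (start, end) in plan.items():
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        writer.write(out_pattern.format(label=label))
```

Usage would look something like `split_pdf("bound_volume.pdf", {"1920": 0, "1923": 50})`, where the filename and page indexes are placeholders, not the real positions in the volume I downloaded.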
This Google Books entry wasn’t copyright-locked because it’s a library-bound volume, and the first publication in it is dated before the 1923 copyright barrier. I suspect that many bound-pamphlet volumes on Google Books have similar metadata errors, which scholars working on the 1920s (and maybe even the 1930s) can use to our advantage.
But if you’re researching the Children’s Bureau, save yourself time.
Only later did I discover that Georgetown’s Maternal and Child Health Library has a nearly complete set of Children’s Bureau publications in PDF, including the Children’s Bureau annual reports, which, as far as I can tell, don’t show up on WorldCat.