A research hack for US government documents on Google Books

2011 October 8
Sometimes, getting what you need out of Google Books feels like this. (Photo: Poster for a side show at the Vermont state fair, Rutland; Jack Delano, FSA/OWI, 1939, Library of Congress, via Flickr Commons)

In recent months, I’ve been using a lot of US government reports from the early 20th century, mostly publications by the Children’s Bureau and the Census Bureau. Much of what I need is in the collections of Google Books, Internet Archive/OpenLibrary, and HathiTrust, but the works that have been digitized aren’t always available to me in useful formats. A few reasons why:

  • “US Government works” can’t be copyrighted, but not all works published by the Government Printing Office are “US Government works,” and some GPO-published works contain material produced by contractors (which can be copyrighted) or excerpts of copyrighted material.
  • Google Books takes a cautious approach: it routinely marks GPO publications issued since 1923 as still in copyright. You can report something that’s wrongly marked as copyrighted, but Google doesn’t act quickly to release those works from copyright jail, because Google Books doesn’t exist primarily to serve scholars. And Google Books metadata is a huge jumble, particularly for item dates.
  • HathiTrust, which does exist to serve scholars, is much more responsive about releasing works from copyright jail and correcting metadata when you report errors. Unfortunately for me, only users affiliated with HathiTrust member institutions can download full PDFs of those works, and my home institution isn’t a member.
  • OpenLibrary sometimes has GPO-published items I’m looking for, but their collections are hit-or-miss for these items.

Here’s how I worked around these problems to get what I needed. Maybe this trick will be useful for someone else out there.

To answer my research questions, I wanted the US Children’s Bureau annual reports for roughly 1924-1933, just after the 1923 copyright border. That made everything tougher, since Google Books offers only snippet views of all the reports from 1924 on, thanks to misapplied copyright restrictions. Some of the reports might have been included in the Department of [Commerce and] Labor’s annual reports, which are occasionally less copyright-restricted, but the DCL reports run 600-800 pages each, while the Children’s Bureau reports are at most about 100 pages, so that’s a lot of volume to wade through for a little bit of text.

But wait, there’s a solution.

While poking around, I found a Google Books version of the 1924 report, but it was available only in ePub format, which loses page numbers from the original. To find a better copy with original pagination, I copied a long sentence related to my research: “A colored doctor has been added to the bureau staff, and she is at present assisting the Tennessee Health Department in an investigation and educational campaign among colored midwives of the State.”[1]

When I pasted that sentence, in quotes, into the Google Books search box, I found three Google Books items that contained it. The first hit was the ePub mentioned above and the third was copyright-locked, but the second was a PDF of a bound series volume containing the reports for 1920 and for 1923 through 1932.
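For anyone who would rather script this kind of exact-phrase search than paste into the search box, here is a minimal sketch. It is not the method used above (that was the ordinary Google Books search box); it assumes the public Google Books API v1 `volumes` endpoint and the `accessInfo.viewability` and `accessInfo.pdf.isAvailable` fields of its volume resource. It builds a query for a quoted phrase, then keeps only hits reported as fully viewable with a PDF, which is the kind of copy that preserves original page numbers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Google Books API v1 volumes endpoint.
API_ROOT = "https://www.googleapis.com/books/v1/volumes"


def phrase_query_url(phrase, max_results=20):
    """Build a volumes query URL for an exact phrase.

    Wrapping the phrase in double quotes asks for an exact-phrase match,
    as quotes do in the Google Books search box.
    """
    params = {"q": f'"{phrase}"', "maxResults": max_results}
    return API_ROOT + "?" + urllib.parse.urlencode(params)


def full_view_pdfs(response):
    """Keep only hits that are fully viewable AND offer a PDF.

    `response` is the decoded JSON from the volumes endpoint. Snippet-only,
    ePub-only, and locked items are dropped.
    """
    hits = []
    for item in response.get("items", []):
        access = item.get("accessInfo", {})
        if access.get("viewability") != "ALL_PAGES":
            continue  # snippet view, no view, or unknown
        if not access.get("pdf", {}).get("isAvailable", False):
            continue  # e.g. an ePub-only copy without page images
        info = item.get("volumeInfo", {})
        hits.append((item.get("id"), info.get("title"), info.get("publishedDate")))
    return hits


def search(phrase):
    """Run the query over the network (requires internet access)."""
    with urllib.request.urlopen(phrase_query_url(phrase)) as resp:
        return full_view_pdfs(json.load(resp))
```

The filtering is kept separate from the network call so it can be checked against a saved response. Note that the API reports access as seen from the requester's country, so results may differ outside the US.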

I downloaded the PDF, then used Acrobat Professional to split it into separate files for each year’s report and to OCR them for searchability. It worked like a charm.
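The bookkeeping behind that split is simple: note the PDF page where each year’s report begins, and each report runs to the page before the next one starts. Here is a minimal sketch of just that calculation. The function name and the start pages in the example are hypothetical, invented for illustration; the resulting ranges would then go to Acrobat’s page extraction or whatever PDF tool you prefer.

```python
def report_ranges(starts, total_pages):
    """Turn report start pages into inclusive (label, first, last) ranges.

    `starts` maps a label (e.g. a report year) to the PDF page where that
    report begins. Each report is assumed to run until the page before the
    next report starts; the last one runs to the end of the file.
    """
    ordered = sorted(starts.items(), key=lambda kv: kv[1])
    ranges = []
    for i, (label, first) in enumerate(ordered):
        if not 1 <= first <= total_pages:
            raise ValueError(f"report {label!r} starts outside the PDF")
        if i + 1 < len(ordered):
            last = ordered[i + 1][1] - 1
        else:
            last = total_pages
        if last < first:
            raise ValueError(f"report {label!r} has no pages")
        ranges.append((label, first, last))
    return ranges


if __name__ == "__main__":
    # Hypothetical start pages, made up for illustration only.
    starts = {"1920": 1, "1923": 41, "1924": 83}
    for label, first, last in report_ranges(starts, 130):
        print(f"{label}: pages {first}-{last}")
```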

This Google Books entry apparently escaped the copyright lock because it’s a library-bound volume whose first publication is dated before the 1923 copyright barrier, so the whole volume seems to have been dated by that first item. I suspect that many bound-pamphlet volumes on Google Books have similar metadata errors, which scholars working on the 1920s (and maybe even the 1930s) can use to our advantage.

But if you’re researching the Children’s Bureau, save yourself time.

Only later did I discover that Georgetown’s Maternal and Child Health Library has a nearly complete set of Children’s Bureau publications in PDF, including the annual reports, which, as far as I can tell, don’t show up in WorldCat.

  1. This is about Ionia Rollin Whipper, MD (1873-1952), the Children’s Bureau’s first African-American professional employee. I’m working on a dissertation chapter which explores Whipper’s work promoting birth registration to African-American midwives in the rural South.
5 Responses
  1. October 9, 2011

    Viewing from Australia, I get snippet view only for all of the Google Books links in this post.

    • October 9, 2011

      Interesting. Is the same true for you of this similar item on HathiTrust, USChB reports from the out-of-copyright period? (For me, only one of the items in that HT link is limited-view: the 1922/23 edition.) Similarly, this HT item has a bound multi-issue volume (Google-digitized) that spans 1919/20 to 1935/36 and is full-view for me. HT marks it as “Public Domain in the United States.”

      • October 9, 2011

        Most of those are full view for me, the exceptions being the 1922/23 volume and the multi-year series. Blocking anything with post-1922 content perhaps?

        I have in general found HathiTrust links more useful than Google Books, but overall I have more luck with archive.org or Gutenberg.

  2. October 12, 2011

    From Canada, I think I’m getting the same results as Brett. For the first link, I get full view for everything but 1922/23. In the second link, the 1919-1936 volume is blocked, as is the even later volume.

    I’ve been doing a lot of work with early 20th century American magazine journalism, and I use HathiTrust very heavily for that. Among the big recent scanning projects, only the Google project and its partners seem to have been scanning magazine volumes in larger numbers. And they’ve picked up some fairly high-circulation (at the time) magazines that subscription services like American Periodicals Series don’t appear to have. In Canada, however, quite a bit of this is full view on HathiTrust and only snippeted at Google Books. I’ve actually been planning a post on this and related issues, but it’ll probably be at least a month before I have time to put it all together.

    By the way, the Online Books Page http://onlinebooks.library.upenn.edu/ is often pretty good about indicating what is going to be viewable in which country – at least for the U.S. and Canada. I think they list access status for the U.K. and some other countries, but I haven’t had occasion to use any of that information.

    • October 12, 2011

      Thanks for this detail, Andrew, and for the information on the Online Books Page, which I hadn’t known about before. I’d be very interested in reading that post when you have time to write it.
