I’ve imported my library from Goodreads
(yet another go at importing all the things!)
I exported my Goodreads data the other day via their Data Request Creation page. The archive you receive is a zip file of zip files of JSON files. The most interesting looking file in there (for my purposes anyway) is activity.json, a timestamped log of events like when you started or finished reading a book. There are two very big problems, though:
- It just contains recent activity. So you can’t use it to publish reading activity from years ago.
- The book information in each event is limited to titles only, which makes doing anything useful with even just the events you do get rather difficult.
Fortunately, Thomas Vander Wal told me about another Goodreads data export endpoint: their library Import/Export page, which spits out a simple CSV file. This has author and publisher information, ISBNs, etc., etc.
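To give a sense of what reading that CSV involves, here is a minimal Python sketch. It is not my actual importer. The column names (`Title`, `Author`, `Publisher`, `ISBN`, `ISBN13`) and the `="..."` wrapper around ISBN values are assumptions about the export format, so check them against the header row of your own file. The function names are made up for illustration.

```python
import csv
import io


def clean_isbn(raw):
    """Strip whitespace and an assumed spreadsheet-style ="..." wrapper."""
    return (raw or "").strip().lstrip("=").strip('"')


def read_library(csv_text):
    """Turn the text of a library export into a list of plain dicts.

    Column names here are assumptions; a missing column becomes "".
    """
    books = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        books.append({
            "title": (row.get("Title") or "").strip(),
            "author": (row.get("Author") or "").strip(),
            "publisher": (row.get("Publisher") or "").strip(),
            "isbn10": clean_isbn(row.get("ISBN")),
            "isbn13": clean_isbn(row.get("ISBN13")),
        })
    return books


if __name__ == "__main__":
    # A made-up row, just to show the shape of the data.
    sample = (
        "Title,Author,ISBN,ISBN13,Publisher\n"
        'Example Book,Jane Doe,"=""0306406152""","=""9780306406157""",Example Press\n'
    )
    for book in read_library(sample):
        print(book)
```

Books with an empty `isbn10` are the ones that need looking up by hand.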
I started to hack on an importer and, as of , it’s in pretty good shape. So I’ve imported my library. I’ve tried to enrich the resulting book pages in a variety of ways. (For books in Project Gutenberg, I link to their copy of the text; I link to the author’s home page or Wikipedia article; same for the publisher; I’ve tagged many of the books for easier discovery; etc.)
Some examples: The Dispossessed: An Ambiguous Utopia, Nevada, Lud-in-the-Mist, Queering the Green: Post-2000 Queer Irish Poetry, Pride and Prejudice, Tipping the Velvet.
I’ve imported (about half of) my Apple Books library as well.
Stable identifiers
I needed to pick a stable identifier to use for books, so that I could give each book a URL. (Different books can have the same titles as one another, after all.) There are several options, including but not limited to:
- Goodreads’ own Work ID, which I immediately dismissed: I don’t want my website to be dependent on Goodreads’ continued existence or on the benevolence of a Lex Luthor wannabe, so a proprietary option owned by Jeff Bezos is a non-starter.
- Library of Congress Control Numbers (LCCNs), but for newer publications from outside the US, LCCNs don’t get issued very promptly, or at all, and Goodreads’ library export doesn’t include them.
- ISBN, in its 10- and 13-digit variants, is a mixed bag. Distinct ISBNs get assigned to each separate publication of the same work, which isn’t great. But even so, ISBNs have some pragmatic advantages: they’re widely deployed, defined by an international standard—ISO 2108—and they’re (usually) present in the Goodreads library CSV, so adopting them requires a lot less work.
- OCLC Control Numbers seem better designed than ISBNs but, like LCCNs, they’re not present in Goodreads’ data. They’re in the public domain, but no standard defines their format, and the organization that issues them isn’t great.
So I’ve gone with ISBNs for now, specifically the 10-digit variant: it’s older, so when a book only has one ISBN, it’s more often than not a 10-digit one. There are recent publications that only have a 13-digit ISBN; I haven’t decided what to do in such cases yet. There are around half a dozen books I’ve not yet imported because of this. Apple Books only includes 13-digit ISBNs, so maybe I should have gone with those. Oh well. It wasn’t that hard to track down 10-digit ISBNs for most of them, at least.
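For what it’s worth, the two variants are mechanically related. A 13-digit ISBN with the `978` prefix is a 10-digit ISBN with the prefix added and the check digit recomputed, while `979`-prefixed ISBNs have no 10-digit equivalent at all. Here is a rough Python sketch of the conversion using the standard check-digit rules. The function names are hypothetical and this isn’t code from my importer.

```python
def isbn10_check_digit(first9):
    """Check character for the first nine digits of an ISBN-10 (weights 10..2, mod 11)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12):
    """Check digit for the first twelve digits of an ISBN-13 (weights 1,3,1,3..., mod 10)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13):
    """Return the ISBN-10 for a 978-prefixed ISBN-13, or None for other prefixes."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError(f"not a 13-digit ISBN: {isbn13!r}")
    if isbn13_check_digit(digits[:12]) != digits[12]:
        raise ValueError(f"bad ISBN-13 check digit: {isbn13!r}")
    if not digits.startswith("978"):
        return None  # e.g. 979-prefixed ISBNs have no 10-digit form
    core = digits[3:12]
    return core + isbn10_check_digit(core)


def isbn10_to_isbn13(isbn10):
    """Return the 978-prefixed ISBN-13 for an ISBN-10."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        raise ValueError(f"not a 10-character ISBN: {isbn10!r}")
    if isbn10_check_digit(chars[:9]) != chars[9]:
        raise ValueError(f"bad ISBN-10 check character: {isbn10!r}")
    first12 = "978" + chars[:9]
    return first12 + isbn13_check_digit(first12)


if __name__ == "__main__":
    print(isbn13_to_isbn10("978-0-306-40615-7"))
    print(isbn10_to_isbn13("0306406152"))
```

So for most of the Apple Books entries a 10-digit ISBN can be derived rather than looked up. The books that only have a `979` ISBN are the ones where a 10-digit identifier simply doesn’t exist.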
Sometimes Goodreads doesn’t know a book’s ISBN; in those cases I used a combination of Wikipedia and ISBN Search to track it down.