Theresa O’Connor / Treasa Ní Chonchúir

Phase one of importing all of my data from Goodreads

(yet another go at )

I exported my Goodreads data the other day via their Data Request Creation page. The archive you receive is a zip file of zip files; each of the inner zip files contains a JSON file. The most interesting file in there (for my purposes anyway) is activity.json, a timestamped log of events like when you started or finished reading a book.
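For readers who want to poke at such an archive, here is a minimal sketch of reading a JSON file out of a zip of zips using only Python's standard library. The function name `find_json_in_nested_zip` is made up for illustration, and the sketch assumes nothing about the archive beyond what is described above (inner zip files, each holding a JSON file); it is not necessarily how the import described here was done.

```python
import io
import json
import zipfile


def find_json_in_nested_zip(outer, wanted_name):
    """Return the parsed JSON of the first inner-archive member named wanted_name.

    `outer` is a path or a binary file-like object for the outer archive.
    Returns None if no inner archive contains a member with that name.
    """
    with zipfile.ZipFile(outer) as outer_zip:
        for inner_name in outer_zip.namelist():
            if not inner_name.lower().endswith(".zip"):
                continue
            # Load the inner archive into memory so it can be opened as a zip.
            inner_bytes = io.BytesIO(outer_zip.read(inner_name))
            with zipfile.ZipFile(inner_bytes) as inner_zip:
                for member in inner_zip.namelist():
                    # Compare base names so a leading directory does not matter.
                    if member.rsplit("/", 1)[-1] == wanted_name:
                        return json.loads(inner_zip.read(member))
    return None
```

Calling `find_json_in_nested_zip("export.zip", "activity.json")` (with `export.zip` standing in for whatever the downloaded archive is called) would return the parsed event log, or `None` if no inner archive holds a file by that name.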

Unfortunately, the book information in each event is limited to titles only, which makes doing anything useful with the events rather difficult. Fortunately, Thomas Vander Wal told me about another Goodreads data export endpoint: their library Import/Export page, which outputs a CSV file of your library. This has author and publisher information, ISBNs, etc., etc.
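As a rough illustration, a library CSV like this can be read with Python's `csv` module. The column names used below (`Title`, `Author`, `Publisher`, `ISBN`, `ISBN13`) and the spreadsheet-style `="..."` wrapping around ISBN values are assumptions about the export format, not something stated above; check them against the header row of your own file. The helper names are hypothetical.

```python
import csv
import io


def clean_isbn(raw):
    """Strip spreadsheet-style wrapping such as ="0306406152" and any hyphens."""
    return raw.strip().lstrip("=").strip('"').replace("-", "")


def read_library(csv_file):
    """Yield one dict per book from an open CSV file object.

    Column names are assumed; missing columns come back as empty strings.
    """
    for row in csv.DictReader(csv_file):
        yield {
            "title": row.get("Title") or "",
            "author": row.get("Author") or "",
            "publisher": row.get("Publisher") or "",
            "isbn10": clean_isbn(row.get("ISBN") or ""),
            "isbn13": clean_isbn(row.get("ISBN13") or ""),
        }


# Usage with a made-up, in-memory sample row:
sample = (
    "Title,Author,ISBN,ISBN13,Publisher\r\n"
    'Example Book,Jane Doe,"=""0306406152""","=""9780306406157""",Example Press\r\n'
)
for book in read_library(io.StringIO(sample)):
    print(book["title"], book["isbn10"])
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_library`.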

So I’ve broken up the effort to import my Goodreads data into two phases. In phase one, now completed, I’ve imported my library from that CSV file. Phase two (which I will get to at some point) will involve importing events from activity.json, linking each to the relevant library entry.

Stable identifiers

I needed to pick a kind of stable identifier to use for books, so that I could give each book a URL. (Different books can share a title, after all.) There are several options, including but not limited to ISBNs, OCLC Control Numbers (OCNs), and Goodreads’ Work IDs.

I immediately dismissed the use of Goodreads’ Work IDs, since they’re proprietary; in the long term, I don’t want my website to depend on Goodreads’ continued existence.

Overall, OCNs seem better designed than ISBNs, especially for works published before ISBNs were widely deployed, but ISBNs have at least one pragmatic advantage: they’re already present in the Goodreads library CSV, so adopting them requires a lot less work.

I somewhat arbitrarily went with the 10-digit variant (ISBN-10).
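If ISBN-10s are going to serve as identifiers, it helps to be able to validate them and to derive one from an ISBN-13 when only the longer form is on hand. The sketch below uses the standard ISBN-10 check-character rule (weights 10 down to 2, modulo 11, with `X` standing for 10). The function names are hypothetical, and inputs are assumed to be already stripped of hyphens and spaces. Only ISBN-13s with the `978` prefix have an ISBN-10 equivalent.

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check character for a string of nine digits."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """True if isbn has nine ASCII digits followed by a correct check character."""
    body = isbn[:9]
    if len(isbn) != 10 or not (body.isascii() and body.isdigit()):
        return False
    return isbn10_check_digit(body) == isbn[9].upper()


def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None if not possible."""
    if len(isbn13) != 13 or not (isbn13.isascii() and isbn13.isdigit()):
        return None
    if not isbn13.startswith("978"):
        return None  # 979-prefixed ISBNs have no 10-digit form
    core = isbn13[3:12]
    return core + isbn10_check_digit(core)


print(is_valid_isbn10("0306406152"))      # True
print(isbn13_to_isbn10("9780306406157"))  # 0306406152
```

This conversion does not verify the ISBN-13's own check digit; it simply drops the prefix and recomputes the ISBN-10 check character.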