Random Thoughts

OpenLibrary, LibraryThing, Books and Emacs

A commenter on my previous post about this stuff suggested using LibraryThing to deduplicate editions, so I thought I’d give it a go. I’m using Amy Hempel as the test case, because she’s only published a handful of books.

Or as OpenLibrary says: 27.

Let’s have a look at, say, the collection from 2006:

The documentation says that it’s supposed to return a list of “works”, not editions, but of course the data here doesn’t have much quality control. So here we have "The Collected Stories of Amy Hempel", "The collected stories of Amy Hempel", and finally "Collected Stories of Amy Hempel". For these, we have in total three different ISBNs (ISBN-10 and ISBN-13 are both listed in the output, apparently).

Let’s look up one of these in the LibraryThing API:

And actually… it looks like LibraryThing does ISBN-10 only? But it kinda looks like the LibraryThing de-duplication would work for that book. So that’s promising, even if it means doing a whole bunch of calls to LibraryThing.
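If LibraryThing really does only take ISBN-10 (that’s my guess from the output, not something I’ve verified), then any ISBN-13 from OpenLibrary has to be converted first. Here’s a minimal Python sketch of that conversion; the function name `isbn13_to_isbn10` is made up for illustration. It only works for the `978` prefix, since `979` ISBNs have no ISBN-10 form.

```python
from typing import Optional


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Convert a 978-prefixed ISBN-13 to ISBN-10, or return None if that's impossible.

    Note: this does not validate the ISBN-13 check digit; it just drops it.
    """
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    core = digits[3:12]  # the nine digits both forms share
    # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as "X".
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-0-306-40615-7"))
```

Anything that comes back as `None` would just have to be skipped (or looked up some other way).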

Hm… Oh, OpenLibrary also lists this:

Which is a translated edition, but would also have gotten caught by the LibraryThing de-duplication. OK, I think I’m going to code up something and see what I get.

type type type

Voilà!

That’s actually a pretty good list! OpenLibrary returned 27 publications, and after 17 LibraryThing API calls, we’re down to 12 works.
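The reason there can be fewer API calls than publications is that one lookup can rule out several later entries at once. Roughly, the idea looks like this in Python. This is a sketch and not the code I actually typed: `related_isbns` is a hypothetical stand-in for the LibraryThing call, assumed to return every ISBN that LibraryThing considers the same work, and the ISBNs in the example are fake placeholders.

```python
def dedupe_publications(publications, related_isbns):
    """Collapse a list of (title, [isbn, ...]) publications into works.

    related_isbns(isbn) is a hypothetical lookup returning the ISBNs of all
    editions of the same work. Returns (titles_of_works, number_of_lookups).
    """
    seen = set()   # every ISBN already attributed to some work
    works = []
    calls = 0
    for title, isbns in publications:
        if any(isbn in seen for isbn in isbns):
            continue  # an edition of a work we've already listed: no lookup needed
        works.append(title)
        seen.update(isbns)
        if isbns:
            calls += 1
            seen.update(related_isbns(isbns[0]))
    return works, calls


# Fake data standing in for the OpenLibrary list and the LibraryThing answers.
publications = [
    ("The Collected Stories of Amy Hempel", ["FAKE-A1", "FAKE-A2"]),
    ("The collected stories of Amy Hempel", ["FAKE-B1"]),
    ("Collected Stories of Amy Hempel", ["FAKE-C1"]),
    ("Some other book", ["FAKE-D1"]),
]
fake_answers = {
    "FAKE-A1": {"FAKE-A1", "FAKE-A2", "FAKE-B1", "FAKE-C1"},
    "FAKE-D1": {"FAKE-D1"},
}
works, calls = dedupe_publications(publications, lambda i: fake_answers.get(i, set()))
print(works, calls)
```

Publications without any ISBN are kept as their own work here, since there’s nothing to look up.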

(There are only five-ish of these that are “books by Amy Hempel” by any reasonable measure, and the rest are chapbooks, collections and collaborations, so it’s OK that they’re listed.)

Now, the number of LibraryThing API calls would make it pretty abusive to use this on a more prolific author, but as a proof of concept, it works.

(LibraryThing publishes data dumps of all this stuff, which would be more sensible to use, but that apparently costs $$$.)

So… uhm… could I use this to answer “give me all the books published since 2023 by the fifty authors I follow”? I think that would be possible: for each author, ask OpenLibrary for the list, and then for each book published since 2023, do the LibraryThing deduplication to see whether it’s a book that also appears earlier on the OpenLibrary list. Yeah, I think that could work, but I guess the proof is in the programming.
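Just to pin the idea down before doing any real programming, here’s a rough Python sketch for a single author. Everything in it is an assumption: the publication records are plain dicts with `title`, `year` and `isbns` keys (not OpenLibrary’s actual format), `related_isbns` is a hypothetical stand-in for the LibraryThing lookup, and the ISBNs are fake.

```python
def new_works_since(publications, related_isbns, since_year):
    """Return titles of works whose editions all date from since_year or later.

    publications: list of dicts with "title", "year" (int) and "isbns" keys.
    related_isbns(isbn): hypothetical lookup returning the ISBNs of all
    editions of the same work.
    """
    # ISBNs of everything published before the cutoff.
    older = {i for p in publications if p["year"] < since_year for i in p["isbns"]}
    recent = sorted((p for p in publications if p["year"] >= since_year),
                    key=lambda p: p["year"])
    claimed = set()  # ISBNs of recent works we've already dealt with
    new = []
    for p in recent:
        own = set(p["isbns"])
        if own & claimed:
            continue  # another edition of a recent work already handled
        family = set(own)
        if p["isbns"]:
            family |= set(related_isbns(p["isbns"][0]))  # one lookup per candidate
        claimed |= family
        if family & older:
            continue  # a reissue of something that's on the list from earlier
        new.append(p["title"])
    return new


publications = [
    {"title": "Old Book", "year": 2006, "isbns": ["FAKE-OLD-1"]},
    {"title": "Old Book (reissue)", "year": 2024, "isbns": ["FAKE-NEW-1"]},
    {"title": "Brand New", "year": 2023, "isbns": ["FAKE-NEW-2"]},
    {"title": "Brand New (paperback)", "year": 2024, "isbns": ["FAKE-NEW-3"]},
]
fake_answers = {
    "FAKE-NEW-1": {"FAKE-NEW-1", "FAKE-OLD-1"},
    "FAKE-NEW-2": {"FAKE-NEW-2", "FAKE-NEW-3"},
}
result = new_works_since(publications, lambda i: fake_answers.get(i, set()), 2023)
print(result)
```

Only the books from 2023 onwards trigger a lookup, so the number of LibraryThing calls should stay small even for authors with a long back catalogue. Doing all fifty authors would just be a loop around this.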

Anyway! Since I’m blathering on about this here, amusingly enough the previous post landed on “Hacker News”. But it didn’t get enough points to get high enough on their home page to totally ruin my statistics chart:

The last time Hacker News happened, there were so many visits that the “normal” days were just a single line of green pixels at the bottom, making the chart useless. I know, I know, I should use a logarithmic chart, but I just don’t like those.

(Or a discontinuous chart, even.)

There are the expected interactions on Hacker News:

I like the idea of adding trigger warnings on non-hype LLM articles.

But also some useful stuff. For instance, there’s a real Emacs library to interact with the LLMs, so I didn’t really have to write my own shims. But it was trivial code (in my case; that project linked there looks quite ambitious), so whatevs.

And:

There’s an app that’s had to deal with the same issues.

So there you go.
