Are LLMs finally becoming actually useful for… stuff?
Yes, of course, LLMs are pretty good at programming now, but I’m talking about being able to ask them stuff in general. My use case is the Emacs book handling package — I want to be able to check whether there are any books by an author that I’ve missed. This is an unexpectedly hard thing to determine (click on the preceding link for details).
People have been nattering on about how awesome the latest Claude is, so I had a look today — and I briefly got very enthusiastic. Because I hit m to search for missing books (i.e., books I don’t have) by Georgette Heyer. She’s a good test case, because she’s written so many books that it’s genuinely a tedious job to try to find out this stuff manually — and I know that I’ve got most of her books.
What I do is ask the LLM “list all books by AUTHOR, but exclude these books: …”. Results:
Wow! The Transformation of Philip Jettan!? I don’t have that book! *sounds of shopping* I’m not sure why it’s listing the other three, because I have those, and I told Claude so:
And I don’t understand why it listed four books, but then said it didn’t find anything? But this is almost good! Almost useful! So I started writing this blog post, which was going to go “woohoo! LLMs rule now! etc!”.
But then I hit m again, just to check, and:
(And so on and on.)
*sigh* It listed all the books — no filtering. OK, m again:
Now it listed just nine books — eight of which were on that list.
So my momentary enthusiasm here was just another FALSE ALARM!
My impression is that LLM fans only have experience with using LLMs interactively — the LLM says a lot of bullshit, and then you correct it, and then it says “oh, you’re right!”, and then these fans get a warm, fuzzy feeling and are really impressed. It’s a great sleight of hand. Using the API this way (where you only give the LLM one try) really lets you see how unreliable these things are.
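For the record, a one-shot query like the one behind m boils down to something like this (a minimal Python sketch using the Anthropic SDK; the real package is Emacs Lisp and isn’t shown here, so the model id, function name, and exact prompt wording are assumptions):

```python
# A minimal one-shot sketch: one prompt, one answer, no interactive
# corrections.  Model id and prompt wording are assumptions.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

def missing_books(author, owned_titles):
    prompt = ("List all books by %s, but exclude these books: %s"
              % (author, ", ".join(owned_titles)))
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # Whatever comes back is what the user sees -- there's no second try.
    return response.content[0].text

print(missing_books("Georgette Heyer", ["Venetia", "The Reluctant Widow"]))
```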
While I’m here, I might as well check with other models again… Here’s gemini-2.5-flash:
OK, it lists just one book, and it’s one I have. (But not the one I don’t have; the Jettan thing.)
Perplexity:
It lists The Reluctant Widow four times? Well, it includes Philip Jettan, but all the other ones are on the list.
ChatGPT 5.2:
No Jettan, but it has a whole lot of books from the list.
So… nope.
One thing that’s changed since the last time I checked like this is that there are no hallucinations? These lists used to have some mixed in, but I can’t spot any obvious ones, at least.
[Edit: I spoke too soon:
I asked for missing books by Jasper Fforde, and this showed up — it doesn’t exist; that’s just a character from one of the novels.]
I guess I’ll have to check again in six months to see whether things have improved.
Somewhat Bemusing Similes (Part IX)
I may have fallen slightly behind on my magazine reading
Are all books on Goodreads 3.59?
Whenever I look up a book on Goodreads, it feels like I see the same number every time. No matter whether the book was awful or awesome, the Goodreads rating apparently remains stubbornly the same.
Or is that just my memory playing tricks on me?
I read a lot of books this year, and I had Emacs record the Goodreads rating for every book. So now I have data! Behold:
OK, my memory shouldn’t be relied on, but it wasn’t that far off — the mean rating of the books I read was 3.86, but the spread is pretty small — 90% of the books are between 3.4 and 4.5.
It seems like this is a smaller range than on imdb, but perhaps that’s mostly because the scales differ: the imdb scale spans 9 points (1–10), while the Goodreads scale spans 4 (1–5).
Let’s math it… If you have a 90% spread of 1.1 on Goodreads, scaling by the ratio of the ranges (9/4) says you’d expect a 90% spread on imdb of about 2.5, while it’s… 3, according to Google. Well, that’s not really a huge difference.
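In case you want to check the arithmetic, here it is as a quick Python sketch (the 1.1 and 3 figures are the ones above):

```python
# Quick check of the scale conversion: Goodreads ratings run 1-5
# (a 4-point range), imdb 1-10 (a 9-point range), so a spread on
# one scale maps to the other by the ratio of the ranges.
goodreads_spread = 4.5 - 3.4      # the 90% spread from the chart: 1.1
scale_factor = 9 / 4              # imdb range / Goodreads range
expected_imdb_spread = goodreads_spread * scale_factor
print(f"{expected_imdb_spread:.1f}")  # 2.5 -- vs. the observed 3
```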
But you almost never see ratings charts like this on Goodreads. Because you rarely have brigading going on there, while on imdb it’s pretty common — whenever a movie goes viral for being soooo bad (in one forum or another), you have all these morons going on imdb and rating pretty mid movies “1”. (Or “10” in the opposite case.)
You almost never see a book on Goodreads that has a different shape than this: More than 60% of the votes are going to be 3 or 4 stars.
But do ratings matter? Well, I’ve found that an imdb rating of 6 or less is a pretty solid indication of the movie probably being naff. But so is a rating of 7.5 and up — then it’s either been brigaded, or it’s some mid movie that nerds are totally into. So an imdb rating of around 6.3 is usually a good indicator of the movie being spiffy.
Before I started the book blogging project last year, I assumed that the same would be the case for Goodreads ratings, but… not really? Yes, you have the same nerdiness effect:
Fantasy books, for instance, have way too high ratings. Most of these books were totally mid, but almost all of them have a rating above 4. I.e., fantasy readers have pretty bad taste.
I mean, they’re really enthusiastic about their hobby.
Science fiction readers are similarly enthusiastic, but not to the same degree.
Literature readers, on the other hand, are more realistic — most books are kinda mid.
And perhaps a bit surprisingly — mystery readers are also pretty realistic in their assessments.
OK, these data sets are pretty small, so perhaps I’ve just chosen Totally Fantastic fantasy books to read, and Pretty Mid mystery books? That’s possible, of course. But my conclusion from this is that you should subtract one point from fantasy book ratings, and half a point from science fiction books, if you want a more realistic score.
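If you wanted to apply that correction mechanically, it would look something like this toy sketch; the offsets are the ones just suggested, and the function name is made up:

```python
# Toy version of the suggested correction: subtract a genre-dependent
# offset from the raw Goodreads rating to get a "more realistic" score.
GENRE_OFFSET = {"fantasy": 1.0, "science fiction": 0.5}

def realistic_rating(goodreads_rating, genre):
    return goodreads_rating - GENRE_OFFSET.get(genre, 0.0)

print(realistic_rating(4.2, "fantasy"))          # 3.2
print(realistic_rating(4.0, "science fiction"))  # 3.5
```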
But overall, I found myself agreeing with the Goodreads score more often than I thought I would. I guess I’m not as contrarian when it comes to books as I thought I was.
Let’s see… can I torture this insignificant data set some more?
That’s how many books per genre I read (or skipped; I dropped 15 of these books mid-reading). I thought the literature/junk ratio would be lower…
And I didn’t think my recency bias (heh heh) was that bad, but there you go.