Are LLMs finally becoming actually useful for… stuff?
Yes, of course, LLMs are pretty good at programming now, but I’m talking about being able to ask them stuff in general. My use case is the Emacs book handling package — I want to be able to check whether there are any books by an author that I’ve missed. This is an unexpectedly hard thing to determine (click on the preceding link for details).
People have been nattering on about how awesome the latest Claude is, so I had a look today — and I briefly got very enthusiastic. Because I hit m to search for missing books (i.e., books I don’t have) by Georgette Heyer. She’s a good test case, because she’s written so many books that it’s genuinely a tedious job to try to find out this stuff manually — and I know that I’ve got most of her books.
What I do is ask the LLM “list all books by AUTHOR, but exclude these books: …”. Results:
Wow! The Transformation of Philip Jettan!? I don’t have that book! *sounds of shopping* I’m not sure why it’s listing the other three, because I have those, and I told Claude so:
And I don’t understand why it listed four books, but then said it didn’t find anything? But this is almost good! Almost useful! So I started writing this blog post, which was going to go “woohoo! LLMs rule now! etc!”.
But then I hit m again, just to check, and:
(And so on and on.)
*sigh* It listed all the books — no filtering. OK, m again:
Now it listed just nine books — eight of which were on that list.
So my momentary enthusiasm here was just another FALSE ALARM!
My impression is that LLM fans only have experience with using LLMs interactively — the LLM says a lot of bullshit, and then you correct it, and then it says “oh, you’re right!”, and then these fans get a warm, fuzzy feeling and are really impressed. It’s a great sleight of hand. Using the API this way (where you only give the LLM one try) really lets you see how unreliable these things are.
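For the record, a one-shot query like the one behind m boils down to something like this (a minimal Python sketch using the Anthropic SDK; the real package is Emacs Lisp and isn’t shown here, so the model id, function name, and exact prompt wording are assumptions):

```python
# A minimal one-shot sketch: one prompt, one answer, no interactive
# corrections.  Model id and prompt wording are assumptions.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

def missing_books(author, owned_titles):
    prompt = ("List all books by %s, but exclude these books: %s"
              % (author, ", ".join(owned_titles)))
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # Whatever comes back is what the user sees -- there's no second try.
    return response.content[0].text

print(missing_books("Georgette Heyer", ["Venetia", "The Reluctant Widow"]))
```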
While I’m here, I might as well check with other models again… Here’s gemini-2.5-flash:
OK, it lists just one book, and it’s one I have. (But not the one I don’t have; the Jettan thing.)
Perplexity:
It lists The Reluctant Widow four times? Well, it includes Philip Jettan, but all the other ones are on the list.
ChatGPT 5.2:
No Jettan, but it has a whole lot of books from the list.
So… nope.
One thing that’s changed since the last time I checked like this is that there are no hallucinations? These lists used to have some mixed in, but I can’t spot any obvious ones, at least.
[Edit: I spoke too soon:
I asked for missing books by Jasper Fforde, and this showed up — it doesn’t exist; that’s just a character from one of the novels.]
I guess I’ll have to check again in six months to see whether things have improved.
Somewhat Bemusing Similes (Part IX)
I may have fallen slightly behind on my magazine reading
Are all books on Goodreads 3.59?
Whenever I look up a book on Goodreads, it feels like I see the same number every time. No matter whether the book was awful or awesome, the Goodreads rating apparently remains stubbornly the same.
Or is that just my memory playing tricks on me?
I read a lot of books this year, and I had Emacs record the Goodreads rating for every book. So now I have data! Behold:
OK, my memory shouldn’t be relied on, but it wasn’t that far off — the mean rating of the books I read was 3.86, but the spread is pretty small — 90% of the books are between 3.4 and 4.5.
It seems like this is a smaller range than on imdb, but perhaps that’s mostly because the scales differ: the imdb scale spans 9 points (1–10), while the Goodreads scale spans 4 (1–5).
Let’s math it… If you have a 90% spread of 1.1 on Goodreads, scaling by the ratio of the ranges (9/4) says you’d expect a 90% spread on imdb of about 2.5, while it’s… 3, according to Google. Well, that’s not really a huge difference.
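In case you want to check the arithmetic, here it is as a quick Python sketch (the 1.1 and 3 figures are the ones above):

```python
# Quick check of the scale conversion: Goodreads ratings run 1-5
# (a 4-point range), imdb 1-10 (a 9-point range), so a spread on
# one scale maps to the other by the ratio of the ranges.
goodreads_spread = 4.5 - 3.4      # the 90% spread from the chart: 1.1
scale_factor = 9 / 4              # imdb range / Goodreads range
expected_imdb_spread = goodreads_spread * scale_factor
print(f"{expected_imdb_spread:.1f}")  # 2.5 -- vs. the observed 3
```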
But you almost never see ratings charts like this on Goodreads. Because you rarely have brigading going on there, while on imdb it’s pretty common — whenever a movie goes viral for being soooo bad (in one forum or another), you have all these morons going on imdb and rating pretty mid movies “1”. (Or “10” in the opposite case.)
You almost never see a book on Goodreads that has a different shape than this: More than 60% of the votes are going to be 3 or 4 stars.
But do ratings matter? Well, I’ve found that an imdb rating of 6 or less is a pretty solid indication of the movie probably being naff. But so is a rating of 7.5 and up — then it’s either been brigaded, or it’s some mid movie that nerds are totally into. So an imdb rating of around 6.3 is usually a good indicator of the movie being spiffy.
Before I started the book blogging project last year, I assumed that the same would be the case for Goodreads ratings, but… not really? Yes, you have the same nerdiness effect:
Fantasy books, for instance, have way too high ratings. Most of these books were totally mid, but almost all of them have a rating above 4. I.e., fantasy readers have pretty bad taste.
I mean, they’re really enthusiastic about their hobby.
Science fiction readers are similarly enthusiastic, but not to the same degree.
Literature readers, on the other hand, are more realistic — most books are kinda mid.
And perhaps a bit surprisingly — mystery readers are also pretty realistic in their assessments.
OK, these data sets are pretty small, so perhaps I’ve just chosen Totally Fantastic fantasy books to read, and Pretty Mid mystery books? That’s possible, of course. But my conclusion from this is that you should subtract one point from fantasy book ratings, and half a point from science fiction books, if you want a more realistic score.
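If you wanted to apply that correction mechanically, it would look something like this toy sketch; the offsets are the ones just suggested, and the function name is made up:

```python
# Toy version of the suggested correction: subtract a genre-dependent
# offset from the raw Goodreads rating to get a "more realistic" score.
GENRE_OFFSET = {"fantasy": 1.0, "science fiction": 0.5}

def realistic_rating(goodreads_rating, genre):
    return goodreads_rating - GENRE_OFFSET.get(genre, 0.0)

print(realistic_rating(4.2, "fantasy"))          # 3.2
print(realistic_rating(4.0, "science fiction"))  # 3.5
```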
But overall, I found myself agreeing with the Goodreads score more often than I thought I would. I guess I’m not as contrarian when it comes to books as I thought I was.
Let’s see… can I torture this insignificant data set some more?
That’s how many books per genre I read (or skipped; I dropped 15 of these books mid-reading). I thought the literature/junk ratio would be lower…
And I didn’t think my recency bias (heh heh) was that bad, but there you go.