C’mon, Jimmy: Yet another tale of LLM disappointment

Yesterday, there was an intriguing announcement about an inference card that’s, er, basically 100x faster than ChatGPT. As a hackernews said:

I often remind people two orders of quantitative change is a qualitative change.

Which is true — speed matters. So I went to check out chat Jimmy, and gave it my standard LLM question:

And started looking for those books… which mostly were hallucinated. Very 2023. When I finally scrolled down to ask “er, what?”, I saw:

Oh.

Oh.

OK, I get it — it’s Llama 8B, and it has no particular knowledge of anything, so all it can do is output fantasies. You need to hook it up to something that knows something about the area you’re interested in to get it to do something useful. And it is indeed very fast:

So this is not really interesting in and of itself — this is the expected result. But it lines up with basically every single time I’ve tried to use an LLM for something useful after reading some hype. My previous attempt was yesterday, when I hooked up an LLM to a RAG that had ingested 1 million fanzine pages about comics, and the results were pretty unimpressive — it’s no more than a really bad search engine that’s able to form English-sounding sentences.

It’s just… I don’t think I’ve ever seen such a disconnect between the relentless hype — not just on Twitter, but in the press, and from every CEO of every major company: LLMs aren’t just going to be transformative in the future, they’re absolutely essential to use now, right now, at this very minute… and then I try to use them for something, and what I end up with is basically an unreliable toy. And one that’s very expensive to use.

(I had one LLM success story I used to tell so that I wouldn’t sound like a luddite fighting against windmills: “I had a 200-line JavaScript package that used jQuery, and I asked ChatGPT to rewrite it to use standard JavaScript, and it worked on the first try!” But then, after the code had been in production for some weeks, I noticed that I wasn’t getting one particular (rarer) event… and looked at the code, and saw that it mostly worked (for the main event) by accident, and not at all for the rarer case.)

(OK, OK, LLMs are really good for OCR, I’ll give them that…)

There is pushback like the above, but it’s still mind-blowing how people are letting hucksters like this set the agenda:

People are buying Mac Minis so that they can run something that queries the Anthropic API? Whyyye!? (And if they’re running local models, they need a beefier GPU.) But it makes total sense that it’s the people who lost their money on NFTs that are spearheading this revolution.

At least we get some comedy out of all of this.

It’s not unlikely that these technologies will actually be useful and more reliable at some point in the future. Q1 2026 is not that time, and there is still no hurry to jump on any bandwagon: Experimenting with it now is a waste of time and money.

There’s no hurry. You’re not missing out on anything. Relax.

If (or when) this technology matures, you can start using it then. Isn’t that the hook here? This stuff is going to be so intelligent that you don’t have to know anything about anything? Right? So you don’t have to start learning now.

The Further Adventures of KwakkBot

Yesterday, I started messing around with a chatbot based on magazines about comics, and today I’m experimenting a bit more to see whether anything interesting… er… happens?

So I’m using Qwen/Qwen2.5-7B-Instruct, and as expected, the results are of the LLM quality we’ve all come to expect: “Never been married…” “They annulled Superman and Lois Lane’s Marriage”…

This query was done after incorporating just the English-language (and American-language) magazines. After doing the rest of the world (which increases the corpus by 80%):

That’s more useful. (Keep in mind that the “knowledge cut-off” of this collection of comics magazines is around 1999 — it has few newer magazines.)

With 14B, there’s not much difference, really…

And, of course, the most burning question on everybody’s mind…

Well, that’s better than the response yesterday.

OK, it’s a leading question…

So that’s all bullshit. Nice that LLMs are still kinda stupid, but I guess it’s all going to be fixed in the next version, right?

Meanwhile, the ChatGPT window that has been guiding me through the LLM installation and corpus ingestion has gotten so slow that it’s not usable any more.

So this has been a problem for many years, and OpenAI’s brilliant engineers haven’t fixed it. The problem is all client-side — it updates too much of the DOM tree when inserting new text.

So how do brilliant prompt engineers deal with this problem?

Asking a question and then reloading the page seems to work, but man.

Mistral-Nemo-Instruct-2407.

Llama-3.1-8B-Instruct seems to be the best of them?

But very chatty.
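
(Comparing models, by the way, is mostly a matter of swapping the Hugging Face model id in the setup from yesterday’s post (below). A sketch, assuming a transformers-based loader; the helper name is made up:)

    # Hypothetical helper: the model id is the only thing that changes
    # between these experiments.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def load_model(model_id: str):
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype="auto", device_map="auto")
        return tokenizer, model

    # The chatty winner:
    tokenizer, model = load_model("meta-llama/Llama-3.1-8B-Instruct")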

So there you go… or rather, you don’t, because I’m not making this publicly available.

Well, we’ll see whether this’ll actually be useful when blogging about comics, which is my use case here.

KwakkBot

I know that LLM usage is controversial, especially in comics circles. And of course it is — generative AI is a plagiarism machine, and people trying to use it for art are ridiculous.

But on the other hand, LLMs are somewhat useful for zooming in on fuzzy data. Most of what you get out is still nonsense, of course, but for data that has never been systematised, you can at least use them to find out where to start reading.

So here I am with kwakk.info, which has about a million pages from fanzines and magazines about comics. And the search index is indeed very useful, and what I use all the time to see what people thought about various comics. But it’s pretty much useless for tasks like “I want a list of comics-related fanzines from the 60s, ranked by popularity”.

I wondered whether I could point an LLM at my data set and make it cough up interesting things like that, and so…

… I asked an LLM what to do.

GOOD!? Anyway, I spent an hour or so being the go-between, and:

Now it’s gonna be working for 17 hours, ingesting all those pages.
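
(For the curious: what it had me doing boils down to embedding each page and stuffing the vectors into a vector DB, using the stack from the parenthetical below. A minimal sketch, not the actual script — the paths, collection name and batch size are invented for illustration:)

    # A minimal sketch of the ingestion side, assuming one text file per
    # magazine page.  The paths, collection name and batch size are
    # invented for illustration.
    from pathlib import Path

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("BAAI/bge-m3")  # 1024-dim dense vectors
    client = QdrantClient(url="http://localhost:6333")

    client.create_collection(
        collection_name="kwakk",
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )

    pages = sorted(Path("pages/").glob("*.txt"))
    for i in range(0, len(pages), 64):
        batch = pages[i : i + 64]
        texts = [page.read_text(errors="replace") for page in batch]
        vectors = embedder.encode(texts, normalize_embeddings=True)
        client.upsert(
            collection_name="kwakk",
            points=[
                PointStruct(id=i + j, vector=vector.tolist(),
                            payload={"file": str(page), "text": text})
                for j, (page, text, vector)
                in enumerate(zip(batch, texts, vectors))
            ],
        )

(With a million-ish pages to embed, the 17 hours at least seem plausible.)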

I can really understand now why LLM agents have gotten so popular — it feels pretty stupid, sitting here cutting and pasting between the LLM and the console. “Why can’t it just do that itself!?” And then you end up with your root file system gone and a very contrite LLM.

100% GPU utilisation! That’s what we like to see.

So after letting it cook away (it’s cold here, so I need the heat), I asked it: “What’s Superman’s height?”, and:

Wow! It works!

(For the technically minded: This is Qwen/Qwen2.5-7B-Instruct with Qdrant as the vector DB, the embedding model is BAAI/bge-m3, and the GPU is an RTX 5000 Ada.)
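
(And for the even more technically minded, the answer path looks roughly like this. Again a sketch rather than the actual code: it leans on the embedder and client from the ingestion sketch above, and on a SYSTEM_PROMPT that’s sketched a bit further down:)

    # A sketch of the answer path, not the actual code; embedder and
    # client are the ones from the ingestion sketch, and SYSTEM_PROMPT
    # is sketched a bit further down.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto")

    def answer(question: str) -> str:
        # Embed the question and fetch the closest pages from Qdrant.
        query = embedder.encode(question, normalize_embeddings=True)
        hits = client.search(collection_name="kwakk",
                             query_vector=query.tolist(), limit=5)
        excerpts = "\n\n".join(
            f"[{n}] {hit.payload['text']}"
            for n, hit in enumerate(hits, 1))
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"{excerpts}\n\nQuestion: {question}"},
        ]
        # Qwen's chat template turns the messages into a single prompt.
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True,
            return_tensors="pt").to(model.device)
        output = model.generate(inputs, max_new_tokens=512)
        return tokenizer.decode(output[0][inputs.shape[-1]:],
                                skip_special_tokens=True)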

So I spent half an hour and made a web interface, and:

Opinions vary about who’s stronger, the Hulk or Superman.

The LLM is set up to answer questions only based on what’s in this magazine/fanzine data set, to post references to its claims, and not give an answer if it doesn’t know.
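
(That behaviour all lives in the system prompt. I won’t pretend this is the exact wording, but it’s along these lines:)

    # A reconstruction of the grounding instructions, not the actual
    # prompt wording.
    SYSTEM_PROMPT = """You answer questions about comics using only the
    numbered excerpts from comics magazines and fanzines that the user
    provides.  Cite the excerpts you rely on as [1], [2], etc.  If the
    excerpts don't contain the answer, say you don't know instead of
    guessing."""

(Whether the model actually honours the “say you don’t know” part is, as we’re about to see, another matter.)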

But here’s the real test — something you can’t just search for, but need actual analysis to give an answer to: “What’s the beef between Harlan Ellison and Gary Groth?” Behold!

That answer is, of course, 100% nonsense, proving that this LLM works just as well as all other LLMs:

Additionally, [6] provides insight into the tension between Ellison and Groth, where Groth is criticized for editing the “Harlan Ellison Letters” section in a manner that Ellison found offensive, using derogatory terms like “gimp” and “feep.” Ellison felt that these terms were inappropriate and that Groth was unfairly targeting him.

Where’s that from?

Oh, it’s from a page that talks both about Harlan Ellison and John Byrne, who sounds like an asshole.

So there you go — an LLM hooked up to a million pages of magazines and fanzines about comics. But I don’t think I’ll make a public interface for this — it takes about ten seconds to give an answer, and as you can see, the answers aren’t very good. I mean, it’s fine for simple questions:

But everybody knows that already, so…

First book written by J M DeMatteis?

Jack Kirby?

Well… OK… this does actually look kinda handy.

Actually, now I’m changing my mind! This is nice.

OK, I’m convinced! We’ve achieved general AI!?

It’s very modest…

I think… all of that is true?

Anyway. Setting this up was kinda fun, even if I’m never putting a public interface up for this thing.

Time to update my LinkedIn to say “AI Expert”.

A little collection of SVG tricks for Emacs

Over the last year or so I’ve ended up with various SVG functions that I think may be of some general utility, so I’ve separated them out into their own file and put it on Microsoft Github.

  • svg-opacity-gradient
  • svg-outline
  • svg-multi-line-text
  • svg-smooth-line

Some are pretty involved, but most are pretty straightforward. Of course, one could add an infinite number of these, and the more flexible the functions have to be, the more likely it is that you’ll just end up “open-coding” them for your specific project instead, but I think these have some utility, so there you go.