76% Confidence

I’ve been looking at issues with page rotation with the LLM-based OCR thing, so I’ve been staring at the output a bit more. And I found the puzzling text above, which… er… didn’t seem likely? Anyway, the polygon points to this text:

Yes, it’s very low res, and it was set at a 90 degree angle, but…

Poor little LLM. There, there.

It does do a fine job with higher resolution scans, but it seems to break down pretty badly at random like this. For instance, from the same page:

And an equally crappy scan:

Them’s the breaks, I guess — the traditional OCR outputs almost nothing but line noise on this page, so…

Low WAF: Emacs & IMDB, Once Again

The other day, I was happily ripping a new bunch of DVDs and blu rays that had arrived (so that I can actually watch them). Afterwards I pointed Emacs at the new directories to add metadata semi-automatically… only to find that the computer just said “no”.

Well, the code works by scraping imdb.com, and the HTML changes once in a while, so that’s no big worry. But looking at what I got, I got pretty puzzled:

HTTP/2 202 
server: CloudFront
date: Mon, 24 Nov 2025 23:58:24 GMT
content-length: 2388
x-amzn-waf-action: challenge
access-control-expose-headers: x-amzn-waf-action
x-cache: Error from cloudfront
via: 1.1 079d0a29fa76c3721f14a4132ec9e372.cloudfront.net (CloudFront)
x-amz-cf-pop: ARN52-P1
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: Q6mGF4pMTpmxiyQJUsZN5m8RuTD9dsU0xpzpENImJVvn7jRzdLkVqA==

And that’s it. Just an HTTP 202, and some headers. But that x-amzn-waf-action… WAF probably doesn’t mean “wife acceptance factor” in this context, right? So I googled, and this is indeed Amazon’s way to fend off horrible people like me who just want to automate fetching the name of the director of a movie without opening a browser.

Sorry, I misspelled “AI scrapers”.

Fair enough. But how does this actually work, and how trivial is it to work around?

Well, it’s a JavaScript challenge. If you say this:

curl -D /tmp/h -H "Accept: text/html" 'https://www.imdb.com/find?q=If+Looks+Could+Kill&ref_=hm_nv_srb_sm'

You get the JS (shown in the image above). So basically, you have to spin up a more or less complete browser to fetch a web page from IMDB now. But *type* *type* *type*, et alors:

Now it works again — I can hit a on a newly ripped movie, and Emacs prompts me for likely matches, and I hit ret, and:

There. Release year, country, director and poster, all scraped. The only difference is that it now takes two seconds to fetch the data instead of half a second. That’s progress for you, I guess.

If you want to look at the resulting trivial code, it’s on Microsoft GitHub. (It’s just a trivial Selenium script.)
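For the curious, here's roughly how you could spot the challenge response from the headers alone. This is not the script mentioned above, just a minimal Python sketch. The function names are made up for illustration, and it assumes a single header block like the one curl -D writes (no redirects). It also assumes that a 202 status together with "x-amzn-waf-action: challenge" is the signal, since that's all the response above shows.

```python
# A few of the headers from the response shown earlier in the post.
SAMPLE = """HTTP/2 202
server: CloudFront
content-length: 2388
x-amzn-waf-action: challenge
alt-svc: h3=":443"; ma=86400
"""


def parse_header_dump(text):
    """Parse one header block (as written by `curl -D`) into (status, headers).

    Assumes a single response; a dump with redirects would contain several
    blocks and would need splitting first.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/2 202" or "HTTP/1.1 200 OK".
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values may contain colons themselves.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_waf_challenge(status, headers):
    """Guess whether this is the challenge response (assumption: 202 + header)."""
    action = headers.get("x-amzn-waf-action", "").lower()
    return status == 202 and action == "challenge"


status, headers = parse_header_dump(SAMPLE)
print(is_waf_challenge(status, headers))
```

With the curl command from earlier, you'd feed it the contents of /tmp/h instead of SAMPLE, and only bother spinning up the browser when it says yes.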

Like sand through the hourglass so are the days of our lives.

Book Club 2025: Not Me by Eileen Myles

I started reading Eileen Myles in 2019 — I’m not quite sure how I happened onto the book, but the first one I read was Chelsea Girls (which I read on my phone), and then I got some poetry collections (which I also read on my phone). I suspect that I may just have found the name of that first book intriguing? I’m always up for reading stuff about New York, and Manhattan in particular.

Anyway, I’ve slowly been reading my way through these books in approx. random order — and now there’s this collection, which is from the 80s.

And it’s pretty great. The first poem here is amazing, I think, and there’s a reason they put it first. But it’s all good. I love the 80s New Yorkiness of it all. It’s funny and it’s direct.

Not Me (1991) by Eileen Myles (buy used, 4.28 on Goodreads)

Random Comics

Here’s some comics I’ve read over the past… three weeks? Yes, I’ve really been slacking on my comics reading.

I’m learning French, so I’ve been buying masses of French comics… and then not actually reading them, because it’s hard. But I thought I should get my fesses in gear and just get into it.

Fuck ze tourists by Maltaite & Zidrou is a pretty amusing look at mass tourism. It’s a collection of mostly three page storylets (featuring recurring tourists), and while it’s not exactly well-observed, it’s funny.

Here’s what tourism in Barcelona will look like in years to come, for instance. Seems likely.

This Yuichi Yokoyama book is fantastic — as a physical object, it’s just perfect.

As usual, it’s totally propulsive and engrossing.

It’s a collection of short vignettes, though, so there’s not that much development… but it’s still exhilarating.

They explain that the last piece (the title piece) hasn’t been translated, because it’s not necessary — these are burning sounds.

And indeed, so it is.

Ace.

I bought this in Paris not realising that it was a translation from an American comic — Monsters was published more than a decade ago, but I hadn’t read it.

And… it’s very 90s autobio. It’s about the author having Herpes.

And while reading it, I was wondering whether Ken Dahl was taking the piss? It’s about the Total Angst that he has; feeling like a leper or something. But… like… everybody has Herpes, don’t they? So it was a confusing read — I was starting to wonder whether it was a satire on the form, or whether it was a metaphor for something or…

But then towards the end, his new girlfriend tells him that everybody has Herpes, so er like.

Wat.

But other than that, Dahl is a very talented artist, at least — the various horrifying Herpes sore drawings are amazing.

Reading this book was so easy for me that I started wondering whether I should read other comics that had been translated into French. The hardest part about reading French comics is how much slang there is in many of them, and plays on words, und so weiter. Perhaps translators use more formal French that’s easier to read? So to test my hypothesis, I next read this:

The first Corto Maltese book. It was originally written in Italian, so…

And indeed! I can read most of these pages without help from Google Translate! It’s a pretty wordy adventure, though, so it took me some days to get through it… my brain shuts down after reading French for more than an hour.

I’ve read this many times before in various translations, of course, but this is the first time I’ve read a version in colour. And the colouring is sensitive and well done, but I still prefer the original black and white.

It’s a lovely book — it’s the longest of the Corto Maltese albums, I think? And definitely not the best, but it’s still fantastic.

I got this from here.

It’s a huge newspaperish thing…

… with some comics, but like… eh. I wasn’t very taken with it.

I guess many things are sourced from “found items”? And there’s a long discussion with ChatGPT, and friends don’t make friends read LLM-generated text… but the thing seems to go out of its way to make things hard on the reader — printing things upside down, chopped up, and whatever — and while that can work, you have to instil a confidence in the reader that it’s going to be worth the work. And I had no such belief at all, so I started skipping toot de suite.

I had some problems with The Customer Is Always Wrong by Mimi Pond some years ago — I found it to be a pretty messy read.

This biography of the Mitford Sisters has the opposite problem — it reads without any resistance at all. It’s like listening to a voice-over on a documentary while images flutter endlessly to keep the viewer engaged.

And I do like the artwork, but I absolutely loathe that genre. And I have no interest in these Mitford sisters, so I ditched the book after 70 pages.

I’m sure it’ll end up on everybody’s Best Of list of 2025, and I congratulate Drawn & Quarterly on another hit.

And that’s it.

Book Club 2025: A Murder in Mayfair by Robert Barnard

What a horrible cover design!

Anyway, as usual with Barnard, this is a somewhat unusual mystery. And unusually for Barnard, he keeps the bloviating under control, so things meander along quite nicely.

The solution to the mystery, though, leaves more than a bit to be desired. It’s not a cheat, exactly, but… just not terribly exciting.

OK, next I should read something less mysteryish.

A Murder in Mayfair (1999) by Robert Barnard (buy new, buy used, 3.54 on Goodreads)