New Zine Orientations

I’ve been fiddling around with how kwakk.info deals with vertical pages. The vast majority of magazine and zine pages are oriented “correctly”, of course, so this is a minor problem. But sometimes pages are printed vertically, and then you have to turn your laptop sideways to read them, which isn’t, er, ideal.

The OCR I’m using now doesn’t actually report page orientation, but it can be computed by looking at the bounding boxes for the lines — if they tend to be mostly vertical, one can make an educated guess. I mean scientific estimation.
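For the curious, here is a minimal sketch of that kind of guess in Python. It is not the code kwakk.info runs; the function name, the (x0, y0, x1, y1) box format and the 60% threshold are all assumptions made up for illustration. It just counts how many line boxes are taller than they are wide.

```python
from collections import Counter


def estimate_rotation(line_boxes, min_share=0.6):
    """Guess page orientation from OCR line bounding boxes.

    line_boxes: iterable of (x0, y0, x1, y1) axis-aligned boxes, one per
    text line (a hypothetical format, not any particular OCR's output).
    Returns 90 if most lines are taller than wide, 0 if most are wider
    than tall, and None if there is no clear majority.
    """
    votes = Counter()
    for x0, y0, x1, y1 in line_boxes:
        width, height = abs(x1 - x0), abs(y1 - y0)
        if width == 0 or height == 0:
            continue  # degenerate box, carries no orientation signal
        votes["vertical" if height > width else "horizontal"] += 1

    total = sum(votes.values())
    if total == 0:
        return None
    if votes["vertical"] / total >= min_share:
        return 90
    if votes["horizontal"] / total >= min_share:
        return 0
    return None  # mixed page: better not to rotate anything


if __name__ == "__main__":
    sideways_page = [(0, 0, 10, 200), (20, 0, 30, 180), (40, 0, 50, 190)]
    print(estimate_rotation(sideways_page))
```

Box shapes alone can't tell a page turned clockwise from one turned counterclockwise (or an upright page from an upside-down one), so a real version needs some extra signal to pick the direction.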

So now you get automatic rotations:

It seems to be working OK… This is a pretty marginal problem, though — far fewer than 1% of the pages seem to be vertical. But there you go.

Hm, I wonder whether there are any pages that are upside down!

Yeah, but only a handful.

Anyway.

76% Confidence

I’ve been looking at issues with page rotation with the LLM-based OCR thing, so I’ve been staring at the output a bit more. And I found the puzzling text above, which… er… didn’t seem likely? Anyway, the polygon points to this text:

Yes, it’s very low res, and it was set at a 90 degree angle, but…

Poor little LLM. There, there.

It does do a fine job with higher resolution scans, but it seems to break down pretty badly at random like this. For instance, from the same page:

An equally crappy scan:

Them’s the breaks, I guess — the traditional OCR outputs almost nothing but line noise on this page, so…

Low WAF: Emacs & IMDB, Once Again

The other day, I was happily ripping a new bunch of DVDs and blu rays that had arrived (so that I can actually watch them). Afterwards I pointed Emacs at the new directories to add metadata semi-automatically… only to find that the computer just said “no”.

Well, the code works by scraping imdb.com, and the HTML changes once in a while, so that’s no big worry. But looking at what I got, I got pretty puzzled:

HTTP/2 202 
server: CloudFront
date: Mon, 24 Nov 2025 23:58:24 GMT
content-length: 2388
x-amzn-waf-action: challenge
access-control-expose-headers: x-amzn-waf-action
x-cache: Error from cloudfront
via: 1.1 079d0a29fa76c3721f14a4132ec9e372.cloudfront.net (CloudFront)
x-amz-cf-pop: ARN52-P1
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: Q6mGF4pMTpmxiyQJUsZN5m8RuTD9dsU0xpzpENImJVvn7jRzdLkVqA==

And that’s it. Just an HTTP 202, and some headers. But that x-amzn-waf-action… WAF probably doesn’t mean “wife acceptance factor” in this context, right? So I googled, and this is indeed Amazon’s way to fend off horrible people like me who just want to automate fetching the name of the director of a movie without opening a browser.

Sorry, I misspelled “AI scrapers”.
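If you want a scraper to notice this situation instead of choking on an empty page, the response above suggests a simple check: status 202 plus that header set to "challenge". A small Python sketch, where the function name is made up and the rule is inferred only from the one response shown here, not from any AWS documentation:

```python
def is_waf_challenge(status, headers):
    """Return True if a response looks like the challenge shown above.

    status: the numeric HTTP status code.
    headers: a dict of response headers; names are compared
    case-insensitively since HTTP header names are not case-sensitive.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    action = lowered.get("x-amzn-waf-action", "").strip().lower()
    return status == 202 and action == "challenge"


if __name__ == "__main__":
    response_headers = {
        "server": "CloudFront",
        "content-length": "2388",
        "x-amzn-waf-action": "challenge",
    }
    print(is_waf_challenge(202, response_headers))
```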

Fair enough. But how does this actually work, and how trivial is it to work around?

Well, it’s a JavaScript challenge. If you say this:

curl -D /tmp/h -H "Accept: text/html" 'https://www.imdb.com/find?q=If+Looks+Could+Kill&ref_=hm_nv_srb_sm'

You get the JS (shown in the image above). So basically, you have to spin up a more or less complete browser to fetch a web page from IMDB now. But *type* *type* *type*, et alors:

Now it works again — I can hit a on a newly ripped movie, and Emacs prompts me for likely matches, and I hit ret, and:

There. Release year, country, director and poster, all scraped. The only difference is that it now takes two seconds to fetch the data instead of half a second. That’s progress for you, I guess.

If you want to look at the resulting trivial code, it’s on Microsoft GitHub. (It’s just a trivial Selenium script.)
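To give a rough idea of what such a script can look like, here is a sketch in Python. This is not the script linked above. The helper names are invented, it assumes Selenium 4 with Firefox and geckodriver installed, and the default "page is ready" test (the title containing "IMDb") is a guess rather than something checked against the real site. The search URL is the one from the curl command, minus the ref_ parameter.

```python
from urllib.parse import urlencode


def build_find_url(title):
    """Build an IMDB search URL for a movie title."""
    return "https://www.imdb.com/find?" + urlencode({"q": title})


def fetch_rendered_html(url, timeout=30, ready=None):
    """Load url in headless Firefox and return the HTML after JS has run.

    ready: optional callable taking the driver and returning True once the
    real page (not the challenge page) has loaded. The default below is an
    assumption, not a verified property of IMDB pages.
    """
    # Imported here so build_find_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    if ready is None:
        ready = lambda driver: "IMDb" in driver.title

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # The challenge script reloads the page when it is done, so wait
        # for the real content instead of grabbing the source right away.
        WebDriverWait(driver, timeout).until(ready)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    html = fetch_rendered_html(build_find_url("If Looks Could Kill"))
    print(len(html))
```

Starting a whole browser per lookup is presumably where the extra second and a half goes.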

Like sand through the hourglass so are the days of our lives.

Book Club 2025: Not Me by Eileen Myles

I started reading Eileen Myles in 2019 — I’m not quite sure how I happened onto the book, but the first one I read was Chelsea Girls (which I read on my phone), and then I got some poetry collections (which I also read on my phone). I suspect that I may just have found the name of that first book intriguing? I’m always up for reading stuff about New York, and Manhattan in particular.

Anyway, I’ve slowly been reading my way through these books in approx. random order — and now there’s this collection, which is from the 80s.

And it’s pretty great. The first poem here is amazing, I think, and there’s a reason they put it first. But it’s all good. I love the 80s New Yorkiness of it all. It’s funny and it’s direct.

Not Me (1991) by Eileen Myles (buy used, 4.28 on Goodreads)