I’ve been looking at issues with page rotation in the LLM-based OCR thing, so I’ve been staring at the output a bit more. And I found the puzzling text above, which… er… didn’t seem likely? Anyway, the polygon points to this text:
Yes, it’s very low res, and it was set at a 90-degree angle, but…
Poor little LLM. There, there.
It does do a fine job with higher resolution scans, but it seems to break down pretty badly at random like this. For instance, from the same page:
And an equally crappy scan:
Them’s the breaks, I guess: the traditional OCR outputs almost nothing but line noise on this page, so…

