I’ve been blogging about 80s comics lately, and a frustrating thing is how little information from that era that’s readily available when trying to do research.
Fortunately, The Comics Journal people have scanned all the issues of that excellent magazine… but its focus is more on, er, good comics, which leaves a lot undocumented.
But there’s Amazing Heroes, which was also published by Fantagraphics, but was a transparent cash grab servicing mainstream comics fans. So they interview mainstream people and do reviews and stuff.
Not to mention hero histories. Who can live without a recap of Son of Satan’s biography?
Anyway, this isn’t available anywhere. Not even on torrent sites. So I started wondering how much work it would be to just scan the issues myself, and then run the scans through some OCR software.
I considered getting a sheet feeding scanner and just unstapling all the issues… But my experience with cut feed scanners isn’t all that reassuring: If the pages are flimsy (and Amazing Heroes is all on newsprint) or slightly crumbled (and these are 30-40 year old issues) the scanner will often just jam up, which isn’t fun. And, besides, cutting up the issues seems so… dramatic..
So I got an Epson Workforce DS-50000 A3+ scanner, because extensive research (i.e., a three minute Google session) said that it could scan a 300DPI A3+ page in 4 seconds. And it’s supposedly supported in Linux.
And this all turns out to be true. It takes 3.5s from when I hit “enter” to when I have a double page spread in my Emacs. (I put the Emacs package on Microsoft Github.) My throughput seems to be around ten minutes per issue, and I can do this while getting drunk and watching TV, so it’s more relaxing than knitting, really.
See? So much fun! Just multiply by infinity.
(And… geez… I should dust the lens on that camera…)
Splitting the pages and post-processing the images and running them through the online ocr.space API (Emacs package on Microsoft Github) is done as a batch job later, so I don’t have to sit and wait for that to happen.
The ocr.space output’s nice: You get a JSON file that contains all the words and their pixel positions on each page, so doing highlights on the pages is really easy.
Such a brutal aesthetic, huh?! I have to admit that I made it while I was home sick, and I had a fever, so the design came to me in a nightmare.
I’m using Xapian as the search engine. It’s fast and nice and flexible.
Now, I don’t have the rights to Amazing Heroes. On the other hand, Google Books scans everything and they seem to survive, so… But I made a very conscious choice to make the kwakk.info (named after Mrs. Kwakk-Wakk, the George Herrimann busybody character from Krazy Kat) reader-unfriendly. You’re presented with the search results, and when you click on a result, you get the matching page and four surrounding pages. But no way to carry on reading. (And, no, you can’t just increase a counter to get the next page(s); I’m using High Security Encryption Techniques because I’m just evil that way.) The only way to get more pages is to search for something from the last page and then keep on that way, but that’s very annoying and horrible.
(I guess the web site isn’t very mobile friendly, but that’s not on purpose; just me being lazy.)
I sent an email to the Fantagraphics people and got a response that it’s been sent on to whoever’s in change of this stuff, but didn’t get much of a response, which is fair enough. I offered to give all the scans to Fantagraphics so that they can put them under a paywall and a more reader-friendly interface (as they have with The Comics Journal so that the search engine would be a funnel to get people to buy into that paywalled interface), but I guess they don’t give a fuck. Which is fine.
The offer still stands! Lots of nice high-res .pngs and lots of OCR’d text for you to use, Fanta peeps! You can get the issues in PDF format if you want! Just let me know!
Amazing Heroes is an endless stream of things you just have to know!
I hope this will be a useful resource for people to dig into 80s American comics history. And the web site itself is pretty flexible: If you have any other comics magazines that you have scanned (and you have the rights to put up on the net), I could add other sub-sites to the search engine interface, I think.