I was asked whether kwakk.info was only for English-language magazines and fanzines about comics… and yes, it was. But there’s no reason why it should be, really. So now it’s not.
Multi-language search has its issues, though. Stemming (that thing where a search for “house cats” also gets you pages matching “house cat”, and so on) is complicated when searching across several languages, and there are other details like that. But it doesn’t really matter much, because according to my rigorous statistics, over 98.7% of the searches are for proper nouns.
That is, people search for “spider-man” and “gary kwapisz”, and not “house cats”.
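To make the stemming point concrete, here’s a toy sketch in Python. It is not what the site actually runs: `toy_stem` and `normalize` are made-up names, and the “stemmer” only strips a plural “s”, English-style. It shows that “house cats” and “house cat” end up as the same terms, that proper nouns pass through untouched, and that an English rule mangles a French word like “bras”, which is why stemming across languages gets complicated.

```python
def toy_stem(word: str) -> str:
    """Crude English-only stemmer: strip a trailing plural 's'.

    Purely illustrative; a real search engine uses a proper
    per-language stemmer instead of this one rule.
    """
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def normalize(query: str) -> list[str]:
    """Split a query on whitespace and stem each term."""
    return [toy_stem(term) for term in query.split()]


print(normalize("house cats"))    # ['house', 'cat']
print(normalize("house cat"))     # ['house', 'cat']
print(normalize("spider-man"))    # ['spider-man']
print(normalize("gary kwapisz"))  # ['gary', 'kwapisz']
print(toy_stem("bras"))           # 'bra' (wrong for French "bras", arm)
```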
The main problem is with the user experience: If you only speak English, and you’re searching for “spider-man”, and the top three hundred matches are from Argentinian magazines, you’re going to be annoyed.
That’s a general problem with these projects. You start off with something that’s very focused, and then you add more and more things, and then the project becomes too noisy to be of use.
So I’m defaulting the search to English-language magazines, because I think that’s what most people would prefer. But I’ve added a new drop-down box that lets you choose which languages to include.
Of course, if you’re visiting a page that’s for a specific non-English-language magazine, like Les Cahiers de la BD, you don’t have to do anything: you’ll get the results without having to choose the language explicitly.
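The logic is roughly as follows. This is a minimal sketch with assumptions all over: `Page`, `search`, `DEFAULT_LANGUAGES` and the sample data are invented for illustration, and the matching is plain substring search rather than a real index. The idea is that with no language chosen you get English only, the drop-down widens that, and a magazine-specific page skips the language filter altogether.

```python
from dataclasses import dataclass


@dataclass
class Page:
    magazine: str
    language: str  # e.g. "en", "fr" (hypothetical language codes)
    text: str


# Assumption: English is the default when nothing is selected.
DEFAULT_LANGUAGES = frozenset({"en"})


def search(pages, query, languages=None, magazine=None):
    """Return pages whose text contains the query.

    If `magazine` is given, only that magazine is searched and the
    language filter is ignored. Otherwise pages are limited to
    `languages`, falling back to DEFAULT_LANGUAGES.
    """
    q = query.lower()
    hits = []
    for page in pages:
        if magazine is not None:
            if page.magazine != magazine:
                continue
        elif page.language not in (languages or DEFAULT_LANGUAGES):
            continue
        if q in page.text.lower():
            hits.append(page)
    return hits


# Made-up sample pages, just to exercise the function.
pages = [
    Page("Some English Zine", "en", "Spider-Man swings again"),
    Page("Les Cahiers de la BD", "fr", "Spider-Man revient"),
]

print(len(search(pages, "spider-man")))                           # 1
print(len(search(pages, "spider-man", languages={"en", "fr"})))   # 2
print(len(search(pages, "spider-man", magazine="Les Cahiers de la BD")))  # 1
```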
And while I was futzing around with this, I started thinking about how it might be nice to have some sort of translation service built in at the site, but I’m not sure what would be the best way to implement that. In any case, it would require showing the OCR’d text explicitly, which I hadn’t made available before.
So I did that. There’s a new “thunderbolt” icon in the top right:
A thunderbolt expresses “show me the text”, right?
Right?
Right.
(One fascinating *ahem* thing about this is that it exposes how bad the OCR is. I’ve tried several systems, but they all have their issues. Some are a bit better in general than what I’ve used here, but either fail completely with, for instance, white-on-black text (looking at you, Tesseract), or don’t cough up info on where each individual word is in the image, which is needed for highlighting (looking at you, easyocr).)
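Here’s why the word positions matter, as a small sketch. It assumes the OCR step hands back each word together with a bounding box in image pixels; `OcrWord` and `highlight_boxes` are hypothetical names and don’t correspond to any particular OCR library’s output format.

```python
from typing import NamedTuple


class OcrWord(NamedTuple):
    """One OCR'd word plus its bounding box in pixels (assumed format)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def highlight_boxes(words, query):
    """Return the (x, y, width, height) boxes of words matching the query.

    Matching is case-insensitive and ignores punctuation stuck to the
    edges of a word. Without per-word boxes there would be nothing to
    draw the highlight on.
    """
    terms = {term.lower() for term in query.split()}
    boxes = []
    for word in words:
        cleaned = word.text.lower().strip(".,;:!?\"'()")
        if cleaned in terms:
            boxes.append((word.x, word.y, word.width, word.height))
    return boxes


# Made-up OCR output for a single line of a page.
words = [
    OcrWord("Spider-Man", 10, 20, 80, 12),
    OcrWord("returns!", 95, 20, 60, 12),
]

print(highlight_boxes(words, "spider-man"))  # [(10, 20, 80, 12)]
```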
*time passes*
Actually… I implemented translation while I was at it.
So here’s a French magazine page…
And the “Translate” icon translates to English. See? Such useful!
Anyway, I’ve added (so far):
Alfonz: Enzyklopädie der Comics,
Alfred (Société Française de Bandes Dessinées),
BoDoï,
Casemate,
Le Collectionneur de Bandes Dessinées,
dBD,
El Wendigo,
Fumo di China,
Hop!,
Période Rouge and
Scarce.
As usual, I’ve tried to not add things that are available commercially, but if any of these shouldn’t be in the search index, let me know.