*closes tabs*
I think I’ve now done the major magazines about comics in French, Spanish, Italian and German for kwakk.info, but I may well have missed some; feel free to leave comments about interesting magazines and fanzines (of any language, really).
In this latest batch we have a handful of mags, but a lot of issues:
Dolmen, but not recent ones, because they’re still for sale at their web site, so only (ahem) #1-300,
Sapristi!, or possibly Saprist!, or possibly Sapristi (which, if I understand correctly, was a “swear word” invented to not be actually offensive, so you see it used all the time in French-language comics from the 50s),
Comicguía,
Zoo le Mag,
Reddition, and
Fumetto.
By the way, since there’s now translation on the site, while falling asleep last night, I wondered how much it would cost to just pre-translate everything so that you could search across all languages in English? Sounds vaguely useful to me, although virtually all searches are for “ogden whitney” and the like.
I just did a very quick calculation: There’s about 300MB worth of non-English OCR’d texts, if I’m counting correctly. The Google Translate API costs $20 per 1M characters, so let’s say 250M characters. That’s… (* 20 250) => $5000.
So… that’s not gonna happen.
I’m kinda surprised that it’s that expensive? That’s about 10x as much as I’d have guessed something like that would cost, really.
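For the record, here's the same back-of-the-envelope calculation written out as a tiny Python sketch. The function and constant names are made up for illustration; the only inputs are the two numbers from above (the $20 per 1M characters price and my rough 250M character guess), so this is a ballpark and not a real quote.

```python
# Back-of-the-envelope translation cost, using only the figures from the post.
PRICE_PER_MILLION_CHARS_USD = 20   # price quoted above for the Google Translate API
ESTIMATED_CHARS_MILLIONS = 250     # rough guess based on ~300MB of OCR'd text


def translation_cost_usd(chars_millions, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Return the estimated cost in USD of translating `chars_millions` million characters."""
    return chars_millions * price_per_million


print(translation_cost_usd(ESTIMATED_CHARS_MILLIONS))  # prints 5000
```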
But now I’m going on vacation for a couple weeks, so there.
Sometimes those three letters change, though
(Ads from four consecutive issues of LOC magazine.)
Magazines about bandes dessinées, tebeos, fumetti and Comic-Bücher
I was asked whether kwakk.info was only for English-language magazines and fanzines about comics… and yes, it was. But there’s no reason why it should be, really. So now it’s not.
Multi-language search has its issues, though. Stemming (which is that thing where if you search for “house cats”, you also get pages matching “house cat” and so on) is complicated when doing multi-language search, and there are other details like that. But it doesn’t really matter much, because according to my rigorous statistics, over 98.7% of the searches are proper nouns.
That is, people search for “spider-man” and “gary kwapisz”, and not “house cats”.
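To make the stemming problem a bit more concrete, here's a toy sketch in Python. This is not what the site's search engine does (real stemmers are a lot smarter); `naive_stem` and `stem_query` are hypothetical names, and the rules only know about a couple of English plural endings. The point is just that a rule written for one language doesn't carry over to the next, and that names pass through untouched anyway.

```python
def naive_stem(word: str) -> str:
    """Toy English-only stemmer: lowercase the word and strip simple plural endings."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # "stories" -> "story"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # "cats" -> "cat"
    return word


def stem_query(query: str) -> list[str]:
    """Stem every whitespace-separated word in a search query."""
    return [naive_stem(w) for w in query.split()]


# English plurals collapse onto the same stems...
print(stem_query("house cats") == stem_query("house cat"))   # True
# ...but German and Italian plurals don't follow the English rule.
print(naive_stem("Bücher") == naive_stem("Buch"))             # False
print(naive_stem("fumetti") == naive_stem("fumetto"))         # False
# Names are left alone, which is why it matters so little here.
print(stem_query("gary kwapisz"))                             # ['gary', 'kwapisz']
```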
The main problem is with the user experience: If you only speak English, and you’re searching for “spider-man”, and the top three hundred matches are from Argentinian magazines, you’re going to be annoyed.
That’s a general problem with these projects. You start off with something that’s very focused, and then you add more and more things, and then the project becomes too noisy to be of use.
So I’m defaulting the search to English language magazines, because I think that’s what most people would prefer. But I’ve added a new drop-down box that lets you choose the languages to include.
Of course, if you’re visiting a page that’s for a specific non-English language magazine, like Les Cahiers de la BD, you don’t have to do anything — it’ll give you the results without explicitly choosing the language.
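The rule for which languages end up in the results can be sketched roughly like this. It's a minimal illustration in Python and not the site's actual code: `Hit`, `filter_hits` and the language codes are all invented for the example. An explicit choice in the drop-down wins, then the language of the magazine page you're on, and otherwise you get English.

```python
from dataclasses import dataclass

DEFAULT_LANGUAGES = frozenset({"en"})   # assumption: English is the default


@dataclass
class Hit:
    magazine: str
    language: str   # hypothetical language code, e.g. "en", "fr", "es"
    page: int


def filter_hits(hits, selected=None, magazine_language=None):
    """Keep only the hits in the languages that apply to this search."""
    if selected:                        # languages picked in the drop-down
        allowed = set(selected)
    elif magazine_language:             # searching from a specific magazine's page
        allowed = {magazine_language}
    else:                               # plain search: English only
        allowed = set(DEFAULT_LANGUAGES)
    return [hit for hit in hits if hit.language in allowed]


hits = [
    Hit("The Comics Journal", "en", 12),
    Hit("Les Cahiers de la BD", "fr", 40),
    Hit("Dolmen", "es", 7),
]
print([h.magazine for h in filter_hits(hits)])
# ['The Comics Journal']
print([h.magazine for h in filter_hits(hits, magazine_language="fr")])
# ['Les Cahiers de la BD']
print([h.magazine for h in filter_hits(hits, selected=["en", "es"])])
# ['The Comics Journal', 'Dolmen']
```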
And while I was futzing around with this, I started thinking about how it might be nice to have some sort of translation service built in at the site, but I’m not sure what would be the best way to implement that. In any case, it would require showing the OCR’d text explicitly, which I hadn’t made available before.
So I did that — there’s the new “thunderbolt” icon in the top right:
A thunderbolt expresses “show me the text”, right?
Right?
Right.
(One fascinating *ahem* thing about this is that it exposes how bad the OCR is. I’ve tried several systems, but they all have their issues. Some are a bit better in general than what I’ve used here, but completely fail with, for instance, white-text-on-black (looking at you, Tesseract), or don’t cough up info on where each individual word is in the images, which is needed to do highlighting (looking at you, easyocr).)
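Why the word positions matter for highlighting: if the OCR output gives you a box per word, finding what to highlight is just a lookup. Here's a rough Python sketch of that idea. `OcrWord` and `boxes_to_highlight` are hypothetical names, and the box layout (left, top, width, height in pixels) is an assumption for the example, not the output format of any particular OCR system.

```python
from dataclasses import dataclass

PUNCTUATION = ".,;:!?\"'()"


@dataclass
class OcrWord:
    text: str
    left: int     # assumed pixel coordinates of the word's box on the page image
    top: int
    width: int
    height: int


def boxes_to_highlight(words, query):
    """Return the boxes of all OCR'd words that match a word in the query."""
    wanted = {term.lower() for term in query.split()}
    return [
        (w.left, w.top, w.width, w.height)
        for w in words
        if w.text.lower().strip(PUNCTUATION) in wanted
    ]


page = [
    OcrWord("Ogden", 100, 50, 60, 20),
    OcrWord("Whitney,", 165, 50, 80, 20),
    OcrWord("artist", 250, 50, 55, 20),
]
print(boxes_to_highlight(page, "ogden whitney"))
# [(100, 50, 60, 20), (165, 50, 80, 20)]
```

Without the per-word boxes you'd still know that a page matches, but you couldn't show where on the page.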
*time passes*
Actually… I implemented translation while I was at it.
So here’s a French magazine page…
And the “Translate” icon translates to English. See? Such useful!
Anyway, I’ve added (so far):
Alfonz: Enzyklopädie der Comics,
Alfred (Société Française de Bandes Dessinées),
BoDoï,
Casemate,
Le Collectionneur de Bandes Dessinées,
dBD,
El Wendigo,
Fumo di China,
Hop!,
Période Rouge and
Scarce.
As usual, I’ve tried to not add things that are available commercially, but if any of these shouldn’t be in the search index, let me know.
Record Shoppin’
Two fun things in the haul this week:
Artaud by Pescado Rabioso, which is famous (probably) because of its shape, but also apparently:
It is considered Spinetta’s masterpiece and one of the most influential albums in Spanish rock.
I got these today, so I’ve only listened to it once, and it seems pretty cool? But the shape is the shapiest!
The other thing I got is this EP by Pile, which I got because the second side is an etching of a drawing by Nicole Rifkin (who posted about it and made me aware that it existed)! Niiiice.
(And the music seems good, too — like the other album, only had a chance to listen to it once…)