Close encounters of the jazz kind.
You learn something new every day
Downloading PDFs from Google Drive
While spelunking the web to look for magazines about comics to add to kwakk.info, I happened upon a site that linked to a lot of PDFs that were stored on Google Drive.
Then I noticed that these files were “protected” — this means that there’s no download button on Google Drive. So I thought “well, I won’t be adding these to the search engine, then, because the person who put them up here obviously wouldn’t want that”.
But I was also curious to see whether, you know… I could download them? Just curious! Because anything that’s available for a human eye to see can be downloaded, of course.
So I searched a bit, and I found at least a dozen web sites that had some variation on this theme: You go to the “protected” PDF, then open the Developer Console (i.e., F12), and then you paste in three screenfuls worth of Javascript, and then you have the PDF.
Or rather, what you have is a file like this:
But that’s fine… easy enough to extract from that. And this isn’t the “original PDF” — it’s a series of images as rendered by the web browser, so there’s some quality loss, but whatever.
Except that it didn’t work if you had a long PDF, because the JS in question created a ginormous string that contained all the image data, and the Javascript engines have max string lengths.
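To get a feel for when the one-giant-string approach falls over, here's a back-of-the-envelope sketch. It assumes the image data is held as base64 text (which is what data URLs use), and it reads the limit from Node's `buffer.constants.MAX_STRING_LENGTH`; the limit in your browser's engine may well be different. The function names are made up for illustration.

```javascript
const { constants } = require('buffer');

// base64 turns every 3 bytes into 4 characters (padding included).
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Would pageCount pages of bytesPerPage bytes each, concatenated as
// base64, still fit into a single string? The default limit is the one
// for the Node process running this, not for any particular browser.
function fitsInOneString(pageCount, bytesPerPage,
                         limit = constants.MAX_STRING_LENGTH) {
  return pageCount * base64Length(bytesPerPage) <= limit;
}

// Hypothetical numbers: 400 pages at roughly 2 MB of PNG data each.
console.log(fitsInOneString(400, 2 * 1024 * 1024));
```

The point is just that the total grows linearly with the page count, so a long enough PDF will blow past any fixed limit.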
And besides, I thought it was yucky that you have to be all manual and stuff. So… I reached for Selenium!
I’ve used Selenium before, and I’ve used the Python interface to it. So I tried that on this laptop, and it crapped out immediately with a bunch of incomprehensible error messages. I tried searching for what they meant, and got a gazillion different answers depending on whether it was for version 103.0.001 or 103.1.004. As is usual when I encounter a Python library, I just gave up.
(What is it with Python library maintainers, anyway? Why is there zero interface stability? Is the culture in Python quarters to always be on the edge, bleeding?)
Instead I said:
npm install selenium-webdriver
And started typing away in the Node version of Selenium instead. I haven’t used Node in, what, a decade? Probably something like that, but after five minutes of scratching my head, I had it popping up a Firefox window and I was away.
The result is now on Microsoft Github. It uses basically the same technique as the link above, but is more, er, automated.
The interface is:
node drivedown.js URL
This saves the PDF as a directory of PNG files like ~/Download/drivedown/page-001.png etc, so that you can post-process that into the format of your choice, like PDF or CBR/CBZ.
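Writing one file per page is also what sidesteps the string length problem: no single string ever has to hold more than one page. A minimal sketch of that idea follows, assuming each page arrives as a `data:image/png;base64,...` URL. The `dataUrlToBuffer` and `savePage` helpers are hypothetical and not lifted from drivedown.js.

```javascript
const fs = require('fs');
const path = require('path');

// Decode a "data:image/png;base64,..." URL into raw bytes.
function dataUrlToBuffer(dataUrl) {
  const comma = dataUrl.indexOf(',');
  if (!dataUrl.startsWith('data:') || comma === -1) {
    throw new Error('not a data URL');
  }
  return Buffer.from(dataUrl.slice(comma + 1), 'base64');
}

// Write a single page to dir as page-NNN.png and return the file name.
// Zero-padding keeps the files in page order when sorted by name.
function savePage(dir, pageNumber, dataUrl) {
  fs.mkdirSync(dir, { recursive: true });
  const name = `page-${String(pageNumber).padStart(3, '0')}.png`;
  const file = path.join(dir, name);
  fs.writeFileSync(file, dataUrlToBuffer(dataUrl));
  return file;
}
```

Calling `savePage(dir, 1, url)` for each page in turn gives you the page-001.png, page-002.png… layout described above.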
Fun detail: This doesn’t work in headless mode, or if the screen is sleeping, because Google Drive checks whether the browser is actually viewing the document before it deigns to render anything. So if you’re downloading a lot, expect to have Firefox pop up windows on your screen a lot.
There’s tons of legitimate use cases for this, like when, er, you have a protected PDF file, and you’ve er forgotten your Google password. Yeah! That’s the ticket!
Or you want to have the PDF on paper, because printing is disabled in “protected” PDFs, too.
This is the last one
This is literally part forty-nine in a series of blog posts where I say that I’m now done adding things to kwakk.info, the comics research site.
Literally.
But this time I’m done for real, because:
That surely has to be enough.
My main problem is that I keep coming up with new avenues to identify comics mags when I go to sleep. I tried all the LLMs, asking them basically “do you know other magazines about comics other than the ones on this list” and then list the 181 titles already at Mrs. Kwakk’s house. And that did indeed yield a handful of titles, but mostly hallucinations.
But a couple of days ago I remembered that eBay exists, so I’ve been idly paging through this never-ending list of magazines and fanzines, and that’s been quite er fruitful.
For instance, the Four Color Magazine — “For Comic Book Connoisseurs”? Heard of it? The LLMs hadn’t. It was a glossy mid-80s mag that only lasted a handful of issues, but it looks interesting.
And did you know that there was a circus fanzine back in the 50s? No? Now you know. But I digress.
And Comic Talk? A short-lived mid-90s mag.
And then there’s Combo, a mid-90s glossy that lasted for about 30 issues…
… and even has a fan site dedicated to it. (The site looks dead-ish, though — none of the images on the site seem to work.)
And I was only able to find a single scanned issue at the places where they have scanned issues, so… Er… So there?
Fanfare looks kinda sorta relevant, but not hugely…
Anyway, now I’m done, so here’s the list of mags and fanzines and indices and promo stuff:
Advance Comics, Advanced Iron, Alter Ego v1, Amazing Heroes, aka, Amazing World of DC Comics, Arena Magazine, The Barks Collector, Batmania, BEM: The Comics News Fanzine, Borderline, Cartoonist PROfiles, Cascade Comix Monthly, Comic Book Marketplace, Comic Book Profiles, Comic-Con Magazine, Comic Heroes, The Comic Reader, Comic Shop News, Comic Talk, The Comic Times/Media Showcase, Comico Attraction, The Comics Buyer’s Guide, The Comics Buyer’s Guide Price Guide, Comics Collector, Comics Fandom Monthly, Comics Fandom Quarterly, Comics Feature, The Comics File, Comics Interview, Comics International, The Comics Journal, Comics Scene, Comics Source, Comics: The Golden Age, Comics Values Monthly, Comics World, Comixscene, Comix Wave, Charlton Bullseye, Charlton Spotlight, Dark Horse Insider, DC Coming Attractions, DC Nation, DC Releases, Diamond Previews, Diamond Previews Adult, Direct Currents, Ditkomania, Eclipse Extra, Factsheet Five, Fantaco Chronicles, Fantastic Fanzine, Fantasy Advertiser, Focus On…, Following Cerebus, Funnyworld, Graphic Story Magazine, Graphic Story World, FOOM, From The Tomb, Hero Illustrated, The Heroine Addict, Illustration Magazine, Indy Magazine, Inside Comics, Inside Image, Jack Kirby Quarterly, The Journal of Graphic Novels and Comics, K-A CAPA alpha, LOC: Fandom’s Forum, Marvel Age, Marvel Comics Index, Marvel Vision, Marvel Previews, Marvel: The Year in Review, Marvelmania, Mediascene, Near Mint, Nemo: The Classic Comics Library, Newfangles, Overstreet’s Comic Book Marketplace/Monthly, Overstreet’s Comic Book Price Guide, Overstreet’s FAN, Overstreet’s Price Update, Pacesetter: The George Perez Magazine, Protoculture Addicts, Rocket’s Blast & Comicollector, Squa Tront, The Telegraph Wire, WAP: Words & Pictures, Wizard Magazine, Xero, Spooky, The Warren Fanzine, The Comic Press Volume 2, Comic Crusader, The Golden Age, Comicology Fan Review, Illustrated Comic Collector’s Handbook, Horror from the Crypt of Fear, EC Fan Addict, Manga 
Newswatch, Wallace Wood Treasury, GASP!, Art Show, Fantastic Exploits, Nostalgia News, The Marvel Art Review, The Illustrated Comic Collector’s Handbook, Pittsburgh Fan Forum, The O’Neil Observer, Comic Art Convention, Comicscape, Wowee Kazowie!, The Journal of MODOK Studies, Two Decades of Comics – A Review, Fandom’s Agent, Mad World Of Marveldom, Comicon, George Perez Newsletter, Masquerader, Superheroes Confidential, Blue Boy Chronicles, The Marvel Art Review, The Fan’s Zine, Informative on Comics, The Wonderful World of Marvel, George Perez: Accent On The First E, The Collector, Marvel Chronicles, Seduction of the Innocent, MXE, Fanboy! Journal of Comics Fandom, Fandom Annual, Omniverse, Vault of Mindless Fellowship, Panels, Weirdom Volume 2, Etcetera Volume 1, Etcetera Volume 2, Silhuoette, 1986, The Wonderful World of Comix!, Guts, Legion Outpost, Legion Outpost II, The Comic World, The Duckburg Times, Comics Price Quotes, The Golden Age Comic Books Index, The Standard Catalog of Comic Books, 1000 Comics You Must Read, The Official Underground and Newave Comix Price Guide, Mile High Futures, Collectors Guide: The First Heroic Age, Comics & Gaming, The Monthly Crisis, Fangraphix, Crash, Collector’s Dream, The Imp, Comic Foundry Magazine, The Comics Grid: Journal of Comics Scholarship, The Charlton Comic Book Guide, The Will Eisner Companion, Combo, Fantacon Magazine, Fanfare, Four Color Magazine, Bud Plant Catalog, Kitchen Sink Pipeline, Fantagraphics Catalog, Marvel: Five Fabulous Decades Of The World’s Greatest Comics, Malibu Sunspots and SelfMadeHero Catalogue.