IMDB in Emacs; or, Honey, I Made an ORM

I’ve always been frustrated by the IMDB web pages. Usually when I’m looking at a director’s oeuvre, I’m not interested in all the shorts, videos and games the director has created, but just want a list of the movies.

When I’m looking at a specific movie, it’s often because I want to know who the character on the screen I’m watching is being played by, but the images are so tiny and low res that it’s impossible to guess who’s who.

And once I found myself wondering, “what’s the name of that actor that played in that film with her and her”, and I know who the latter two are, but not what film, so it would be nice to be able to do a cross reference kind of thing… right? RIGHT!?

OK, the last thing happened only once, but it was still something that I thought might perhaps have been vaguely useful. But, of course, I didn’t do anything about all this because well you know.

I did download the IMDB data set and do various things with it (but mostly based on grepping and ad-hoc searches), but that came to a complete stop when IMDB revamped their data exports. Instead of semi-human semi-readable files, it’s now basically a database dump. No longer can you just grep for “Isaach De Bankolé”, because in the new files, you first have to find his ID, and then you can grep for that, but that just gives you movie IDs, and…

Long story short, I made an Emacs mode for looking at the new-format data.

It’s based on this sqlite3 module for Emacs. My first attempt was based on just storing all the data in hash tables in Emacs, because I’ve got lots of RAM. I thought that since I routinely open multi-gigabyte buffers in Emacs, and that’s no problem, and at work, I routinely have multi-tens-of-gigabyte processes in Common Lisp servers, that should be no problem in Emacs either.

I had forgotten that Emacs’ garbage collector is, er, kinda primitive. It’s an old-style mark-and-sweep collector. This is fine for huge buffers, because that’s basically just a handful of objects, no matter how large the buffer is. But when you create a 200M element hash table, all those elements need to be marked and swept individually, and as I found halfway through implementing it, that makes Emacs pause for like ten minutes at a time.

So scratch that. I went for sqlite3 and I store the data on disk. But interacting directly with the sqlite3 database data is a drag, so I wrote a kind of ORM… well, it’s more of a PRM. Plist-Relational Mapping. Because strong typing is for dweebs.

I mean, professionals.

Anyway, the raw dataset from IMDB is about 3GB. The sqlite3 database is about 7GB, and takes an hour or two to create on my machine. (Before I found out that sqlite3 autocommits by default, I had like two transactions per second. By slapping a transaction around the import, I get a few thousand inserts per second.)
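To make the autocommit point concrete, here is a minimal sketch. It assumes the `sqlite-*` functions built into Emacs 29 and later, not the sqlite3 module this post is based on, and the table layout and the helper `my-imdb-bulk-insert` are made up for illustration.

```elisp
;; Sketch only: uses the SQLite support built into Emacs 29+
;; (`sqlite-open', `sqlite-execute', `with-sqlite-transaction'),
;; which is not the sqlite3 module used in this post.
(require 'sqlite)

(defun my-imdb-bulk-insert (db rows)
  "Insert ROWS, a list of (MID TITLE) lists, into DB in one transaction.
Without the transaction wrapper, SQLite commits after every
single INSERT, which is what makes a naive import crawl."
  (with-sqlite-transaction db
    (dolist (row rows)
      (sqlite-execute db
                      "insert into movie (mid, title) values (?, ?)"
                      row))))

;; Usage, against a throwaway in-memory database:
(let ((db (sqlite-open)))
  (sqlite-execute db "create table movie (mid text primary key, title text)")
  (my-imdb-bulk-insert db '(("tt0090798" "Caravaggio")))
  (sqlite-select db "select title from movie where mid = ?" '("tt0090798")))
```

The whole import then costs one commit instead of one commit per row.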

The module has some bugs, but I’ve sent a pull request, so hopefully they get fixed. Or pull down my fork instead, which is here; you’ll need it to do regexp searches.

If you want to play with this, you need a newish Emacs built with module support, the sqlite3 module, and imdb-mode.el.

And then just eval (imdb-download-and-create) and wait for a few hours.

The PRM works pretty much as you’d expect.

(imdb-select 'movie :mid "tt0090798") 
=> 
((:_type movie 
  :mid "tt0090798" 
  :type "movie" 
  :primary-title "Caravaggio" 
  :original-title "Caravaggio" 
  :adultp "N" 
  :start-year 1986 
  :end-year nil 
  :length 93))

See?

(pp (imdb-select 'crew :mid "tt0090798") (current-buffer)) 
((:_type crew :mid "tt0090798" :pid "nm0147599" :category "writer") 
 (:_type crew :mid "tt0090798" :pid "nm0413897" :category "writer") 
 (:_type crew :mid "tt0090798" :pid "nm0418746" :category "director") 
 (:_type crew :mid "tt0090798" :pid "nm0418746" :category "writer"))

Easy!

(imdb-select 'person :pid "nm0147599") 
=>
((:_type person 
  :pid "nm0147599" 
  :primary-name "Suso Cecchi D'Amico" 
  :birth-year 1914 
  :death-year 2010))
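
For the curious, here is roughly what such a plist mapping could look like. This is a sketch under assumptions, not the code in imdb-mode.el: the `my-prm-*` names, the table layout and the dash-to-underscore column naming are all invented for illustration.

```elisp
(require 'cl-lib)

;; Hypothetical table layout; the real imdb-mode.el schema may differ.
(defvar my-prm-tables
  '((person :pid :primary-name :birth-year :death-year))
  "Alist mapping a table name to its columns, in database order.")

(defun my-prm-column-name (key)
  "Turn a keyword like :primary-name into the string \"primary_name\"."
  (replace-regexp-in-string "-" "_" (substring (symbol-name key) 1)))

(defun my-prm-select-query (table criteria)
  "Return (SQL . PARAMS) selecting rows of TABLE matching the plist CRITERIA."
  (cl-loop for (key value) on criteria by #'cddr
           collect (format "%s = ?" (my-prm-column-name key)) into clauses
           collect value into params
           finally return
           (cons (format "select * from %s where %s"
                         table (mapconcat #'identity clauses " and "))
                 params)))

(defun my-prm-row-to-plist (table row)
  "Turn ROW, a list of column values from TABLE, into a plist."
  (append (list :_type table)
          (cl-mapcan #'list (cdr (assq table my-prm-tables)) row)))

;; Usage: build the query, then map a result row back to a plist.
(my-prm-select-query 'person '(:pid "nm0147599"))
(my-prm-row-to-plist 'person '("nm0147599" "Suso Cecchi D'Amico" 1914 2010))
```

The query half keeps values out of the SQL string and passes them as parameters, so names with quotes in them (hello, D'Amico) don't need escaping.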

Anyway, that’s the low-level interface. Here’s what the user interface looks like:

And then “x” and the noise is gone!!!

Only the real movies! *gasp* It cannot be!

(I didn’t choose Fincher here because I like his films, but because he’s done so much junk that it’s impossible to use his IMDB page.)

But… when looking at some actors I knew pretty well, I soon noticed that not all the films that the actor appeared in were listed. Here are Tilda Swinton’s most recent few years:

What’s going on?! No Doctor Strange? Could there be a bug in my code? That seems impossible? I mean, it’s my code!

But nope, the problem is with the IMDB dataset. The file that lists what films actors appear in, “title.principals.tsv”, isn’t a complete list of participants, but instead, as the name really sorta kinda implies, a list of the most important people in that film. That means that it lists directors, writers, the cinematographer, (some) producers and then a few actors. But never more than ten people per film.

This is really weird, because directors and writers are already in the “title.crew.tsv” file.

This made me sad until I realised that I could just resort to web scraping.

So now I use the data set as a base and then insert the missing things afterwards.

Doctor Strange!

And when I’ve resorted to web scraping for that, I can just scrape for actor images, too:

*sigh*

In short, IMDB doesn’t have a usable API, and IMDB no longer exports enough data to do anything useful with that data. So I guess this Emacs mode will work until they tweak their HTML.

While implementing the scraping, IMDB suddenly went missing and I just got:

I then found out that I had er slightly miswritten the end of the recursion, so I was hitting imdb.com with dozens of invalid hits per second in several concurrent threads.  I AM SORRY IMDB.

They unblocked me after an hour.

Here’s what it looks like in action:

(Click on the embiggen symbol to embiggen so that you can see what’s going on.)

I’m like, selecting a few things and then toggling what bits to see (acting/directing/all/shorts).

Exciting!

And now it’s feature complete, too, so I’ve definitely saved a lot of time by writing this.

Er… Oh, yeah. I was going to implement intersections (i.e., list movies that have several specific people involved). So let’s see which films Tilda Swinton did with Luca Guadagnino:

Easy peasy.  Or the more complete, unfiltered version:

Heh.  I think “Dias de cine” and stuff are just Italian long-running movie news shows that both of them have appeared on?  Or in the same episode?  Hm…  probably the latter, because each TV episode has its own unique ID.
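
The intersection itself is not much code once each person's rows are in hand. Here is a sketch, assuming rows are plists with a `:mid` key like the crew rows further up; the `my-imdb-*` helpers and the IDs in the usage example are made up.

```elisp
(require 'seq)

(defun my-imdb-mids (rows)
  "Return the :mid of every plist in ROWS."
  (mapcar (lambda (row) (plist-get row :mid)) rows))

(defun my-imdb-common-mids (first &rest others)
  "Return the movie IDs in FIRST that also appear in every list in OTHERS.
The result keeps the order of FIRST."
  (seq-reduce (lambda (common mids)
                (seq-filter (lambda (mid) (member mid mids)) common))
              others
              first))

;; Usage with made-up rows for two people:
(my-imdb-common-mids
 (my-imdb-mids '((:_type crew :mid "tt0000001" :pid "nm0000001")
                 (:_type crew :mid "tt0000002" :pid "nm0000001")))
 (my-imdb-mids '((:_type crew :mid "tt0000002" :pid "nm0000002")
                 (:_type crew :mid "tt0000003" :pid "nm0000002"))))
```

Since every TV episode has its own ID, two people only intersect on a show if they were in the same episode, which fits the guess above.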

 

Innovations in Emacs Touch Interfacing

I’ve long attempted to hack some touch interfaces for laptops in non-keyboard configurations.

The sad thing is that there aren’t really any good solutions in GNU/Linux. If you want to be able to respond to more complex events like “two finger drag”, you have to hack GTK and use Touchégg, and then it turns out that doesn’t really work on Wayland, and then most of the events disappeared from the X driver, and then…

In short, the situation is still a mess. And then my lug-around-the-apt-while-washing-TV-laptop died (ish), so I had to get a new one (a Lenovo X1 Yoga (2nd gen (which I had to buy from Australia, because nobody would sell it with the specs I wanted (i.e., LTE modem if I wanted to also take it travelling (the 3rd gen has an LTE modem that’s not supported by Linux))))):

And now, with Ubuntu 18.04, everything is even worse, and I’m not able to get any multi finger events at all! All the touch events are just translated into mouse events! Aaaargh!

After despairing for an eternity (OK, half a day), I remembered another touch interface that I quite like: The Perfect Reader.

It’s a bit hard to tell here, but the idea is that you divide the screen into areas, and then you just tap one of the areas to have the associated action happen.

Surely even Linux can’t have fucked up something so basic: It must be possible to get that kind of access.

And it’s possible! Behold!

Er… What’s going on on the other side of the backyard?

Eeek! Kitten! Go back inside!

That’s not a safe place to play! … *phew* It sat down, and turned around and went back inside. *heart attack averted*

ANYWAY!

The idea is that there’s one action grid overlay when Emacs is in the forefront, and another when the mpv video player is.  All the events go via Emacs, though, which controls mpv via the mpv socket interface.  (And, by the way, I have to say that I’m really impressed with mpv.  It has all the commands you want it to have.  The documentation is somewhat lacking, though.)
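
The grid lookup is the simple part. Here is a sketch of how it could work, assuming tap positions arrive as percentages of the screen; the grid contents and the names `my-touch-grids` and `my-touch-action` are invented for illustration, and the real layout has more cells.

```elisp
;; Hypothetical 2x2 grids: one for when Emacs is in front, one for mpv.
;; Emacs cells hold commands; mpv cells hold mpv input commands.
(defvar my-touch-grids
  '((emacs . [[scroll-down-command other-window]
              [scroll-up-command   delete-other-windows]])
    (mpv   . [["seek -10" "cycle pause"]
              ["seek 10"  "quit"]]))
  "Alist mapping a mode to a vector of rows of actions.")

(defun my-touch-action (mode x y)
  "Return the action in MODE's grid for a tap at X, Y.
X and Y are percentages of the screen width and height, from 0 to 100."
  (let* ((grid (cdr (assq mode my-touch-grids)))
         (rows (length grid))
         (cols (length (aref grid 0)))
         ;; Clamp so that a tap at exactly 100% lands in the last cell.
         (row (min (1- rows) (floor (* y rows) 100)))
         (col (min (1- cols) (floor (* x cols) 100))))
    (aref (aref grid row) col)))

;; Usage: a tap in the upper right corner while mpv is in front.
(my-touch-action 'mpv 75 10)
```

Whatever comes back is then either called as an Emacs command or sent on to mpv over its socket.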

Here’s a demo:

Basically, I’m just reading the output from libinput-debug-events (which outputs everything that goes through /dev/input/event* (so you have to add your user to the input group to access them)), and then executing things based on that. libinput is allegedly the new hotness, and replaces evdev and the synaptics X driver, and is supposed to be supported on both Wayland and Xorg, so hopefully this attempt at an interface will last a bit longer than the previous ones.
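
Picking the taps out of that output can be sketched like this. The line format in the docstring is an assumption about what the tool prints for touch events (coordinates as percentages, then millimetres) and may not match every libinput version; `my-touch-parse-line` is a made-up name.

```elisp
(defun my-touch-parse-line (line)
  "Return (X . Y) if LINE reports a touch-down event, else nil.
Assumes lines shaped like
  event7  TOUCH_DOWN  +2.35s  0 (0) 52.10/38.20 (137.20/56.40mm)
which is a guess at the format and may differ between versions."
  (when (string-match "TOUCH_DOWN.*?\\([0-9.]+\\)/\\([0-9.]+\\)" line)
    (cons (string-to-number (match-string 1 line))
          (string-to-number (match-string 2 line)))))

;; Usage:
(my-touch-parse-line
 "event7  TOUCH_DOWN  +2.35s  0 (0) 52.10/38.20 (137.20/56.40mm)")
```

In a real setup this would sit in a process filter, which also has to cope with output arriving in chunks that don't end on a line boundary.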

I wrote the controlling thing in Emacs, of course, and you can find it on Github. I’ve been using an Emacs-based movie/TV viewer since 2004, and I’m not giving up now! So there.

Flashair, Emacs and Me

My blogging methodology is that I 1) open an Emacs Message buffer, write stuff, and then 2) take pictures of stuff (mostly comics), wanting to have those images appear right where I’m typing. This is a solved issue with Flashair, PyFlashAero and watch-directory.el, but I thought that it sucked that there were so many moving parts.

And besides, PyFlashAero didn’t always do the right thing, and you have to specify so much stuff…

So I wanted to bring it all into Emacs for less fuss. You know this makes sense.

I had a third generation Toshiba Flashair card (W-03), and the problem is that it’s just too slow for my approach, which is basically to look into all directories on the card. It’s s-l-o-w. So I gave up on the project. PyFlashAero was written that way for a reason.

Half a year passed, and then I somehow was made aware that Toshiba had launched a new generation of their product, and it promised 3x faster WIFI speeds and a brand new and faster CPU.

So I got a card:

And I would like to say that it was an immediate success, but it definitely wasn’t. I could download the directory indices nice and fast, but whenever I tried to download an image, it stopped after 0 (zero) bytes. So it seemed like that card had a problem reading itself, basically, and would just hang whenever I tried requesting some data from the (exfat) file system on the card.

But! There was a new firmware W4.00.02, and I had W4.00.00. I installed the new firmware, and presto! It is teh work!

Look, I can snap pics of myself here as I’m typing this stuff on the couch:

And it appears in the buffer within a second or two after I snap it! It’s a new paradigm! And it’s untouched by filthy unclean Pythonic hands; it’s all pure Emacs.
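
The "look into all directories" part boils down to fetching directory listings from the card and parsing them. Here is a sketch of the parsing half, assuming the card's `command.cgi?op=100&DIR=…` listing, which (as far as I know) returns a header line followed by one `directory,name,size,attribute,date,time` line per entry. The function name is made up, and file names containing commas are ignored.

```elisp
(defun my-flashair-parse-listing (text)
  "Parse TEXT, a Flashair directory listing, into a list of plists.
Lines that don't have exactly six comma-separated fields (like
the header line) are skipped."
  (let (entries)
    (dolist (line (split-string text "[\r\n]+" t))
      (let ((fields (split-string line ",")))
        (when (= (length fields) 6)
          (push (list :dir (nth 0 fields)
                      :name (nth 1 fields)
                      :size (string-to-number (nth 2 fields))
                      ;; Bit 4 of the attribute field marks a directory.
                      :directoryp
                      (not (zerop (logand 16 (string-to-number
                                              (nth 3 fields))))))
                entries))))
    (nreverse entries)))

;; Usage with a made-up listing:
(my-flashair-parse-listing
 "WLANSD_FILELIST\r\n/DCIM,100__TSB,0,16,18000,30000\r\n")
```

Entries with `:directoryp` set get listed in turn, and anything new that isn't a directory gets downloaded and inserted.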

The range of the Flashair W-04 also seems improved… the old one had to be within a few meters of my laptop for the laptop and the Flashair to be able to communicate whereas the W-04 seems to be able to communicate over, er, more meters. Here, I went out into the hall and snapped a pic and this image was here in this buffer when I got back:

It’s magic!

But when I went to the kitchen and snapped a pic there, nothing showed up here, so it’s not that magical.

Anyway, here’s the Emacs source code.  You probably need a newish Emacs version for it to work.

meme x giffy

The other week I was tinkering with editing GIF animations in Emacs, and then I started wondering: Can this be any more ridiculous?

Yes.

So it’s a mashup of the Emacs meme mode and the new GIF animation code. I spent most of the time on this wondering whether I could somehow make one or the other a minor mode before giving up and just mashing the code into meme.el, which can be found on Github.

The main challenge here was figuring out how to make this fast enough.  The first naive implementation just created an SVG 25 times a second with all the data in it, and Emacs just isn’t fast enough to print a ~2MB XML structure that often.  (Not to mention when animating bluray screen grabs: The SVG structure is then about 7MB.)

So I cheated and pre-computed the screengrab bits, and then plonked down those bits into the printed XML structure.  Which made it fast enough even for bluray animations on my rather spiffy machine; your mileage may vary.  If your machine is too slow, you may have to pre-downscale the screengrabs the animations are based on.
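
One plausible reading of the pre-computing trick is a cache of ready-to-print image strings, so that each frame is read and base64-encoded once instead of 25 times a second. This sketch is that reading, not the actual code in meme.el; the names and the hard-coded PNG type are assumptions.

```elisp
(defvar my-frame-cache (make-hash-table :test #'equal)
  "Maps a frame file name to its pre-encoded data: URI.")

(defun my-frame-data-uri (file)
  "Return FILE as a base64 data: URI, encoding it only on first use."
  (or (gethash file my-frame-cache)
      (puthash file
               (with-temp-buffer
                 (set-buffer-multibyte nil)
                 (insert-file-contents-literally file)
                 ;; The final t means no line breaks in the output.
                 (base64-encode-region (point-min) (point-max) t)
                 (concat "data:image/png;base64," (buffer-string)))
               my-frame-cache)))
```

The per-tick work is then just dropping the cached string into the SVG's image element and printing the (much cheaper) rest.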

Exporting to GIF and MP4 is supported if you have ImageMagick “convert” and “ffmpeg” installed.

 

Of course you should be able to make animated GIFs in Emacs

I was wondering what a convenient production process for GIFs from movies would be like, so I hacked my hacked version of mplayer a bit more.  Nothing major, since it already has all the functionality, but it doesn’t group continuous screenshots by name, which makes picking out the animations afterwards awkward.

There’s probably a gazillion GIF editors out there already, but since the things you typically want to do with an animation (trim start/end, adjust speed and how many frames to skip) are kinda trivial, it seemed more convenient to just write a mode in Emacs.  So I did.

It uses the ImageMagick “convert” command to actually stitch the images together in the end after you’ve done the edit, so it’s not a pure Emacs-only solution.
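
The stitching step can be sketched as building an argument list for “convert”. The function name and defaults here are made up for illustration; `-delay` (in hundredths of a second) and `-loop 0` (loop forever) are standard ImageMagick options.

```elisp
(require 'seq)
(require 'cl-lib)

(defun my-gif-convert-args (frames output &optional start end skip delay)
  "Return the argument list for ImageMagick \"convert\".
FRAMES is a list of image files.  Only the frames from index START
up to (but not including) END are used, and of those every SKIPth
one.  DELAY is the pause between frames in hundredths of a second."
  (let ((kept (cl-loop for frame in (seq-subseq frames (or start 0) end)
                       for i from 0
                       when (zerop (% i (or skip 1)))
                       collect frame)))
    (append (list "-delay" (number-to-string (or delay 4)) "-loop" "0")
            kept
            (list output))))

;; Usage: drop the first frame, keep every second one after that.
(my-gif-convert-args '("a.png" "b.png" "c.png" "d.png" "e.png")
                     "out.gif" 1 5 2 8)
```

Running it is then a matter of `(apply #'call-process "convert" nil nil nil ARGS)` with that list.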

And here’s the result:

I’m sure this is going to turn out to be really useful some day!

Editing Movie Posters From Emacs

I was waiting for some people to drop by yesterday to pick up a sofa, and I started thinking about how nice it would be to pull down movie posters automatically and perhaps put some text or border or something on them.

Instead of sensibly looking for an API for this kind of stuff, I wondered whether I could just quickly alter the imdb.el library, since it already had some imdb parsing functionality, and I have all films tagged with imdb IDs.

The short answer is “no, that can’t be done without insane hacks”, and the longer answer is, “I did it anyway”.

The imdb website uses a Javascript thing to display a “carousel” of images, and parsing Javascript with regexps and stuff isn’t something that can be recommended.

But it works right now, and will probably break when imdb does the smallest possible change to their code, but hey, whatevs.

To alter the images (as an example, I’ve added that red border with the wrong date up there), I wrote a teensy little library that just creates an SVG image, plonks the poster JPG into the SVG and then adds the border and text. Composing images like this via SVG in Emacs is incredibly easy, especially since you can “live edit” the images by altering the SVG programmatically.
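
Here is a sketch of that kind of composition using svg.el, which ships with Emacs. The function name, sizes and colours are made up; this is not the teensy library itself.

```elisp
(require 'svg)
(require 'dom)

(defun my-poster-svg (jpeg-data width height caption)
  "Return an SVG of WIDTH x HEIGHT showing JPEG-DATA with a border and CAPTION.
JPEG-DATA is the raw contents of the poster file as a unibyte string."
  (let ((svg (svg-create width height)))
    ;; The poster goes in first, so the border and text end up on top.
    (svg-embed svg jpeg-data "image/jpeg" t
               :x 0 :y 0 :width width :height height)
    (svg-rectangle svg 0 0 width height
                   :fill "none" :stroke-color "red" :stroke-width 20)
    (svg-text svg caption
              :x (/ width 2) :y (- height 40)
              :text-anchor "middle" :font-size 40 :fill "white")
    svg))
```

In a graphical Emacs, `(insert-image (svg-image (my-poster-svg data 300 450 "1986")))` displays the result, and since the SVG is just a DOM, changing a node and redisplaying gives you the “live edit” effect.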

And then the couch people finally arrived.

Emacs Non-Flickering Patch

Earlier today, Daniel Colascione merged his double-buffering Emacs display patch, and I was interested in seeing whether it reduced flickering when viewing animated GIFs on my problematic main machine.

And it sure does:

First you see an Emacs from five hours ago displaying a GIF, and it is flicker-o-rama.  Then I switch to a brand new Emacs with Daniel’s patch, and it is completely flicker-free.

Now, that the flicker was there in the first place is probably due to me not bothering to figure out what settings the Nvidia driver needs to … work better: On my laptop, there was no flickering even without Daniel’s fix. So your mileage will vary, but it’s obviously a major step forward, flicker wise, on some machines. Thanks, Daniel.