Even more about Quoted-Printable mis-decoding

I wasn’t going to do further posts about the equals signs in the Epstein files — I mean, it really feels pretty… icky… to be doing technical analysis and speculation about something as horrific as this — but today I saw a perfect example of what I was just speculating about in the second blog post about this.

To recap: I think my explanation for the substitutions of equals signs for characters sounds plausible. (I’m not going through that again here; read the link above for details of the imagined, buggy algorithm used on these emails.)

But I didn’t have a really convincing explanation for the left-over =A0s (etc.) in the emails. That’s the way non-ASCII characters are encoded by Quoted-Printable, but if the buggy algorithm looked exactly like what I proposed, then all those would have been fixed.

Instead it looked like some of them were correctly decoded and some not. Here’s a typical example:

The above clearly originated in =C2=A0, which is UTF-8 for NON-BREAKING SPACE, which is natural to use here. I fabulated that perhaps the algorithm had an off-by-one error that made it skip every other encoded character, leaving =C2 decoded as an invalid one-byte sequence, and =A0 undecoded.

And today I happened on an example that shows exactly that! That was, obviously, originally She=E2=80=99s, which is supposed to decode to She’s (that apostrophe there is RIGHT SINGLE QUOTATION MARK, which has a three-byte UTF-8 encoding).
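For comparison, here is what a correct decoder does with those two sequences. This is only a minimal sketch using Python's standard `quopri` module; the helper name `decode_qp_utf8` is made up for illustration and has nothing to do with whatever tool processed the emails.

```python
import quopri


def decode_qp_utf8(encoded: str) -> str:
    """Decode Quoted-Printable text, then interpret the bytes as UTF-8."""
    raw = quopri.decodestring(encoded.encode("ascii"))
    return raw.decode("utf-8")


# Two encoded bytes become one NO-BREAK SPACE character (U+00A0).
print(repr(decode_qp_utf8("=C2=A0")))

# Three encoded bytes become one RIGHT SINGLE QUOTATION MARK (U+2019).
print(decode_qp_utf8("She=E2=80=99s"))
```

The point is that every =XX has to be turned into a byte before the bytes are interpreted as UTF-8; skipping any of them leaves an invalid sequence behind.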

So — it’s indeed skipping every other encoded non-ASCII byte here, leading me to believe the algo is something along the lines of:

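(let ((string "She=E2=80=99s got")
      (start 0))
  ;; Find the next "=XX" hex escape, searching from START.
  (while (string-match "=\\([A-F0-9][A-F0-9]\\)" string start)
    ;; Splice in the decoded character; STRING gets two characters shorter.
    (setq string
          (concat (substring string 0 (match-beginning 0))
                  (format "%c" (string-to-number
                                (match-string 1 string) 16))
                  (substring string (match-end 0))))
    ;; The bug: this position refers to the old, longer string, so the
    ;; next search starts two characters too late.
    (setq start (match-end 0)))
  string)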

Which will indeed produce She�=80�s got. (The error here isn’t a simple off-by-one error, but a miscount: the code doesn’t account for the string getting two characters shorter each time a byte is decoded, so the next search starts past the equals sign of the following encoded byte.)

I mean, obviously whoever made this error didn’t program in Emacs Lisp, but the principle is the same in most programming languages.
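To illustrate that the principle carries over, here is the same miscount sketched in Python, next to a version that resumes in the right place. This is a guess at the shape of the bug, not the actual code used on the emails; the names `buggy_decode` and `fixed_decode` are hypothetical.

```python
import re

# One Quoted-Printable escape: "=" followed by two uppercase hex digits.
HEX_BYTE = re.compile(r"=([0-9A-F]{2})")


def buggy_decode(text: str) -> str:
    """Mimic the suspected bug: resume at the match end of the OLD string."""
    start = 0
    while True:
        m = HEX_BYTE.search(text, start)
        if m is None:
            return text
        # Replace the three-character "=XX" with a single character.
        text = text[:m.start()] + chr(int(m.group(1), 16)) + text[m.end():]
        # Bug: m.end() indexes the old, longer string. The text is now two
        # characters shorter, so the search skips the next "=" entirely.
        start = m.end()


def fixed_decode(text: str) -> str:
    """Same loop, but resume right after the character just written."""
    start = 0
    while True:
        m = HEX_BYTE.search(text, start)
        if m is None:
            return text
        text = text[:m.start()] + chr(int(m.group(1), 16)) + text[m.end():]
        start = m.start() + 1


print(repr(buggy_decode("She=E2=80=99s got")))
print(fixed_decode("She=E2=80=99s got").encode("latin-1").decode("utf-8"))
```

The buggy version leaves every other escape in a run untouched: =80 survives in the three-byte case, and =A0 survives in the =C2=A0 case. The fixed version decodes all three bytes, which can then be reinterpreted as UTF-8 (the `latin-1` round trip is just a way to get the raw bytes back out of the string).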

So there you go. I think that’s plausible, at least, even if it’s a pretty astoundingly incompetent thing to do. It also lines up with my suspicions that whoever implemented this was used to working on a single-byte character set (like Windows CP1252), where this problem can rarely be observed.

Oh, and by the way:

I originally assumed that these emails had had their text/plain parts processed, but I think it’s more likely that it’s the text/html parts instead. They’ve been processed to remove all the HTML tags, of course, but where the buggy end-of-line Quoted-Printable algorithm has struck, you can see </span> having been turned into =/span>, which in turn means that the HTML stripper hasn’t been able to do its work.

OK, now I’m done.

Too Many Album Versions Man

When you get to a certain age, and you buy a lot of albums, you sorta inevitably end up with several versions of the albums you like the most, don’t you? Or is that just me?

For instance, Talking Heads released eight studio albums in their time, and I have… 24 things from them in /music/repository/Talking Heads/. The situation is even worse with David Bowie, who released 26 studio albums, and I have 166 things in /music/repository/David Bowie/.

(Yes, I rip everything to flac and play from the computer.)

This over-shopping happens for a variety of reasons — with Talking Heads it’s because later versions of the albums have been fucked around with so much in “remastering” that I find them annoying, so I’ve ended up buying several old versions to find a good one.

(For the ’77 album, you have to buy a vinyl from the first few years; in later versions, banjos suddenly appear on several tracks. I’ve read that people have similar problems on streaming services, too? They have a favourite album, but then all of a sudden it’s swapped out with a remaster? What a total, absolute nightmare! *gasp* *shock*)

For David Bowie, it’s just that I’ve got a ton of live albums, and box sets, and singles and singles and singles…

Anyway, the end result is that I’m spending an inordinate amount of time to find the “real” albums for these artists… so I wondered whether I could just quickly hack up a categorisation system that would cover my use case here? Here’s what I came up with:

I typed away a bit, and:

Tada! Now I can listen to the first five Talking Heads albums without futzing around. (I pretend the last three albums don’t exist.)

(Yeah yeah yeah.)

But what about ol’ Bowie?

WHAT A NIGHTMARE

That’s better. But I found when I was tagging up the albums that there are certain EPs and live albums that I consider essential, but I didn’t tag up David Live here, for instance, so I’m a bit inconsistent. But whatevs! I’m gonna save hours and hours!

Fortunately, there’s only a couple of handfuls of artists that I have this problem with, so I don’t have to tag a lot…

Heh, I’ve got 91 Coil things — with an extended bootleg “archive” version of each EP, for instance… OK, there’s more than a handful of artists that need tagging. *rolls up t-shirt sleeves*

Fortunately, this new system means that I can buy even more totally marginal albums, and then just hide them away after listening to them a couple of times.

Didn’t Peter Gabriel also do a German version of another album!? Now that I don’t have to skip them every time I want to play the first four albums… *gasp*

Hey! That’s me!

A Norwegian computer magazine contacted me last week about publishing a translation of my blog post about RFC2045. So I translated, and now it’s here.

It’s the same text, basically, but with added jokes. Very appropriate for the subject matter I’m sure.

The hardest thing to translate was “rock döts”. We workshopped the translation quickly on irc and landed on “röcktödlar”…

… which is a word the world has never seen before! I’m shocked! But also proud.

There’s a soft copyright strike thing now?

Oh, some context: ten years ago I decided to watch as many Ingmar Bergman things as I could lay my hands on, which turned out to be 87 things, so I landed on this design for the blog series:

Most of those things I watched were his well-known and fully available films, but he’d also done a large number of stage plays, and there’d been a number of short films, TV plays, and short documentaries about Bergman, and these were really hard to come by.

I found a guy in France (I think; still not sure) that sold bootleg DVD-Rs of this stuff — he must have been collecting for decades, and that helped a lot. I also found stuff on the torrents, and on other blogs, and in the end, people who had recorded stuff from the TV in the 80s also sent me things.

Anyway, after watching all this marvellous stuff, I wondered whether I should do something with this treasure trove… but I didn’t. Then I saw that the bootleg guy had totally disappeared off the face of the internet, and his DVD-R burning business with it.

So I thought the time had probably come to just upload all the things that were not commercially available anywhere to Youtube, as “The Bergman Channel”. And so I did, and I was surprised that the channel survived the copyright strikes, but it did, largely because the main copyright holder to Bergman’s work is Svenska filmindustrier, who have mostly just claimed copyright for the things in Sweden. So if you’re in Sweden, half of The Bergman Channel is blocked.

But today I got this:

The interesting thing is this:

What to do next
[…]
* Delete your video. If you remove your video before 7 days are up, your
video will be off the site, but your channel won’t get a copyright strike.

It’s a “soft” copyright strike? I’ve never seen one of those before. Are they new? Anyway, it’s nice, because if you get three normal copyright strikes, your entire channel disappears. So thanks — I’ve now deleted “Karins ansikte”. Is it available elsewhere now? I haven’t paid attention…

While I was logged into the channel again for the first time in at least four years, I had a look at the stats:

Hey, total view time is 17K hours! Nice. Looks like the most-watched Bergman thing is the long-lost made-for-TV film Rabies. I mean, you can understand why it’s long-lost — the video quality is kinda bad, and if they don’t have a better source than this, I guess they’d feel bad about releasing it commercially. But like I said, I haven’t really checked whether these things have become available by now…

Watch them all here before they disappear. At this rate (one going missing every four years), that’ll take only 128 years…

I’ve got original pages by Carol Swain!!!

Carol Swain started selling original art a couple of months ago, and I snapped up this three-page story, Jig and reel. Each page is about 30x42 cm. They’re so gorgeous!

*calms down a bit*

I think I should get them framed… I wanna have them on the wall. Hm… one long frame or three separate frames and arrange them like a triptych? Hm…