I wasn’t going to do further posts about the equals signs in the Epstein files — I mean, it really feels pretty… icky… to be doing technical analysis and speculation about something as horrific as this — but today I saw a perfect example of what I was just speculating about in the second blog post about this.
To recap: I think my explanation for the substitutions of equals signs for characters sounds plausible. (I’m not going through that again here, read the link above for details of the imagined, buggy algorithm used on these emails.)
But I didn’t have a really convincing explanation for the left-over =A0s (etc.) in the emails. That’s the way non-ASCII characters are encoded by Quoted-Printable, but if the buggy algorithm looked exactly like what I proposed, then all those would have been fixed.
Instead it looked like some of them were correctly decoded and some not. Here’s a typical example:
The above clearly originated as =C2=A0, which is UTF-8 for NON-BREAKING SPACE, which is natural to use here. I speculated that perhaps the algorithm had an off-by-one error that made it skip every other encoded character, leaving =C2 decoded as an invalid one-byte sequence, and =A0 undecoded.
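For reference, here’s a small Python sketch of what a correct decoder does with that sequence, using the standard library’s quopri module. The variable names are just for illustration; the point is that the two encoded bytes only make sense together, and that a lone 0xC2 byte isn’t valid UTF-8 on its own.

```python
import quopri

# Quoted-Printable for the two UTF-8 bytes of NON-BREAKING SPACE.
encoded = b"=C2=A0"

# A correct decoder turns every =XX escape back into one raw byte...
raw = quopri.decodestring(encoded)

# ...and only then interprets the bytes as UTF-8.
text = raw.decode("utf-8")

# Decoding just the first byte leaves an incomplete UTF-8 sequence.
try:
    b"\xc2".decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False

print(raw, repr(text), lone_byte_is_valid)
```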
And today I happened on an example that shows exactly that! That was, obviously, originally She=E2=80=99s, which is supposed to decode to She’s. That apostrophe there is RIGHT SINGLE QUOTATION MARK, which has a three-byte UTF-8 encoding.
So — it’s indeed skipping every other encoded non-ASCII byte here, leading me to believe the algo is something along the lines of:
(let ((string "She=E2=80=99s got")
      (start 0))
  (while (string-match "=\\([A-F0-9][A-F0-9]\\)" string start)
    (setq string
          (concat (substring string 0 (match-beginning 0))
                  (format "%c" (string-to-number
                                (match-string 1 string) 16))
                  (substring string (match-end 0))))
    ;; The bug: this offset refers to the old, longer string.
    (setq start (match-end 0)))
  string)

Which will indeed produce She�=80�s got. (The error here isn’t a simple off-by-one error, but a miscount from not realising that the string gets shorter as you’re decoding bytes one by one: each three-character escape is replaced by a single character, so resuming the search at the old end-of-match position jumps past the start of the next escape.)
I mean, obviously whoever made this error didn’t program in Emacs Lisp, but the principle is the same in most programming languages.
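To illustrate, here’s a rough Python sketch of the same loop. The names buggy_decode and fixed_decode are made up for this example, and this is only my guess at the shape of the bug, not the code that was actually run on these emails. Each decoded byte is kept as a single character (like the "%c" in the Lisp version), so Python prints â where a UTF-8 viewer would show �.

```python
import re

# Same pattern as in the Emacs Lisp version: "=" followed by two hex digits.
HEX_ESCAPE = re.compile(r"=([A-F0-9][A-F0-9])")


def buggy_decode(text: str) -> str:
    """Decode =XX escapes, resuming the search at a stale offset."""
    start = 0
    while True:
        match = HEX_ESCAPE.search(text, start)
        if match is None:
            return text
        decoded = chr(int(match.group(1), 16))
        text = text[:match.start()] + decoded + text[match.end():]
        # Bug: match.end() is an offset into the old, longer string.
        start = match.end()


def fixed_decode(text: str) -> str:
    """Same loop, but resuming right after the one decoded character."""
    start = 0
    while True:
        match = HEX_ESCAPE.search(text, start)
        if match is None:
            return text
        decoded = chr(int(match.group(1), 16))
        text = text[:match.start()] + decoded + text[match.end():]
        start = match.start() + 1


print(repr(buggy_decode("She=E2=80=99s got")))  # the middle =80 survives
print(repr(buggy_decode("=C2=A0")))             # the =A0 survives
# Reinterpret the one-character-per-byte result as UTF-8 bytes.
print(fixed_decode("She=E2=80=99s got").encode("latin-1").decode("utf-8"))
```

The only difference between the two functions is the last line of the loop, which is the kind of thing that’s easy to get wrong if you never test with two escapes in a row.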
So there you go. I think that’s plausible, at least, even if it’s a pretty astoundingly incompetent thing to do. It also lines up with my suspicion that whoever implemented this was used to working with a single-byte character set (like Windows CP1252), where this problem can rarely be observed.
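As a small illustration of why a single-byte character set would hide the problem, here’s how the same word comes out in Quoted-Printable under CP1252 and under UTF-8. The word "café" is just my own test input, not something taken from the emails. In CP1252, the é is a single escape, while in UTF-8 it becomes two escapes back to back, which is exactly the situation where the second one gets skipped.

```python
import quopri

word = "caf\u00e9"  # "café", a made-up test word

# One byte per non-ASCII character: a single, isolated escape.
cp1252_qp = quopri.encodestring(word.encode("cp1252"))

# Two bytes for the same character: two escapes right after each other.
utf8_qp = quopri.encodestring(word.encode("utf-8"))

print(cp1252_qp)
print(utf8_qp)
```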
Oh, and by the way:
I originally assumed that these emails had had their text/plain parts processed, but I think it’s more likely that it’s the text/html parts instead. They’ve been processed to remove all the HTML tags, of course, but where the buggy end-of-line Quoted-Printable algorithm has struck, you can see </span> having been turned into =/span>, which in turn means that the HTML stripper wasn’t able to remove those tags afterwards.
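Here’s a minimal sketch of why a mangled tag would survive. The strip_tags function below is a naive, hypothetical regex-based stripper (I have no idea what was actually used), but any stripper that looks for an opening < will behave the same way once that character has been replaced by an equals sign.

```python
import re

# Naive tag matcher: "<", anything that isn't ">", then ">".
TAG = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Remove everything that looks like an HTML tag."""
    return TAG.sub("", html)


intact = strip_tags("<span>Hello</span> there")
mangled = strip_tags("<span>Hello=/span> there")

print(intact)   # both tags are removed
print(mangled)  # the damaged closing tag is left in the text
```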
OK, now I’m done.


















