Technical Analysis

I thought I could use my prodigious financial know-how to do an analysis of the Emacs Open Bugs chart.

First of all, we have clearly defined positive trend channels reaching back to 2009, broken by a period of recession.

But lately, this growth has been curbed and we’ve seen a clear development of a resistance line at 4100. It looked like we’d broken out of it in 2018, but then we had a reversal, and judging by this clear resistance, we’ll soon see a bounce-back.

Zooming even closer into the last couple of years, we see the emergence of a linear Fibonacci reversal trend coupled with a reverse Head & Shoulders® pattern complete with magnificent bubbles appearing and not the least hint of dandruff sloughing off of the chart, complete with a 10% (450 bugs) reduction due to me procrastinating a lot over the past month and a half and going through old bug reports, but now there’s a two week vacation from the vacation coming up.

This can only mean one thing, and I don’t have to spell it out for you, I think: Sell, sell, S-E-L SELL!

Your Emacs Statistics Service

I’ve been plugging away at the Emacs bug database the last month or so. I’m the kind of person who gets obsessed with something for a time and then I do something completely different the next month. So I’m away from Emacs developments for months on end (and this time it’s basically been a couple of years Due To Circumstances), but when I’m back and hacking away, I’m always wondering:

Are things getting better or are things getting worse?

Things are the same! The chart showing the number of open Emacs bugs has been remarkably flat over the last three years; three cheers for the Emacs maintainers and all their helpers!

But is this because there’s fewer bugs reported?

Hm… No, that red line (opened bugs) is almost a straight line, so it’s about fixing things and closing bug reports at a steady pace. (Interactive charts here.)

So let’s look at the number of commits and contributors:

Surprisingly enough the number of commits per month hasn’t been very large; about 300 per month. (The red line is when Emacs moved to git from bzr.) Or perhaps that’s not surprising; the fewer commits, the fewer new bugs?

In a longer perspective:

And here are the equivalent charts for the number of contributors:

Pretty steady at about 35 per month. (The recent spike is just me spelunking through the bugs database and applying a bunch of older patches that were submitted and not applied at the time they were submitted.) Or as the Emacs bug tracker graphs put it:

And in other Emacs-related news, I thought it was really amusing to see The New York Times using Emacs as an example when educating people about commit messages, and it was doubly amusing that the author used a commit message of mine when doing so.

So now I can say that I’m a New York Times-published author.

If it wasn’t for the small detail that that would be grossly misleading.

In conclusion:

Here’s a picture of some potatoes.

Towards a Cleaner Emacs Build

I’m planning on getting back into Emacs development after being mostly absent for a couple of years. One thing that’s long annoyed me when tinkering with the Lisp bits of Emacs is the huge number of compilation warnings. The C parts of Emacs were fixed up at least a decade ago, but this is what compiling the Lisp directory looks like:

For me, this has been a source of having to go slower when coding: I make a change, look at the output from the compilation window, and then do a double take when I see some warning about something I didn’t think I had touched.

And then it turns out to be an old warning about something completely different.

The number of warnings in an Emacs build has been fluctuating, but sort of growing. There were 440 Warning: lines output by the build process, totalling 1800 lines on stderr. That’s kinda a lot to look at when doing a build.

There’s a number of reasons that the Emacs build looks like this, but the most important is perhaps the somewhat unique way the Emacs Lisp code has traditionally been developed: Many of the major modules have been maintained out-of-tree, and often support a huge number of Emacs versions dating back to the 1980s. Not to mention XEmacs.

This leads to there being conditional calls to code that doesn’t exist in modern Emacs, and code that doesn’t use new calling conventions.

The other is that, well, Emacs has a long history, but the Emacs Lisp language is evolving constantly, what with lexical binding and all. What was good code in 1993 now uses outmoded idioms, and these idioms trigger compilation warnings.

The development situation has changed somewhat over the last few years: Now most of the code in the Emacs tree is developed in the Emacs git repository, and the external packages are instead distributed using the Emacs package system. So the half-external/half-internal development isn’t as big an issue any more (although there are still (very) significant packages developed this way, like CC mode).

And XEmacs compatibility isn’t a major issue any more for many people.

So I thought now was the time to roll up my t-shirt sleeves and get stuck in to the code and get organisised.

90% of the warnings took 10% of the time: They were easy syntactic changes to bring code up to date with the new Emacs Lisp standards. The next 9% took 90% of the time. And then the last 1% took another 90% of the time.

So there were a lot of questions asked and some new tests implemented to ensure that the changes didn’t break anything.

And a lot of questions answered by all the smart people on emacs-devel.

But now it’s over! That is, there’s one single Warning: left, and that’s being pondered.

The total output from a “make bootstrap” is down from 5200 lines to 2900 lines (on this machine; it may vary somewhat), which is roughly a 44% reduction. Looking at Emacs compiling now is a calmer experience.

I also added some new progress messaging in parts where a single section takes so long that it looks like it’s crashed or something, so it’s not purely a “get rid of lines” project.

Virtually all of the warnings fixed were valid warnings (i.e., they were about things we’d rather not see in the Emacs Lisp code), but some warnings were false positives. For instance, Emacs has a method to mark functions as obsolete, and then you get a warning when you compile code that uses them, which is a nice way of letting users know that something is going to disappear in a few years. (And Emacs has a very conservative removal policy; obsolete functions are kept around for like a decade.)

But functions are sometimes obsoleted in groups, so you may have one obsolete function calling another obsolete function in that group, and that will issue a compilation warning… and it shouldn’t.

So we’ve introduced a new macro

(with-suppressed-warnings ((obsolete an-old-function))
  (an-old-function :foo :bar))

to make the byte compiler know that we know about this, and not issue any warnings about that. (And there are similar things about the number of arguments to the functions.)
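A slightly fuller sketch of the obsoleted-in-groups situation, with made-up names: my-old-helper, my-old-entry and the my-new-... replacements are hypothetical and only there for illustration. Only make-obsolete and with-suppressed-warnings (Emacs 27.1 and later) are real.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical functions that were obsoleted together as a group.

(defun my-old-helper (x)
  "Double X.  Obsolete, but still needed by `my-old-entry'."
  (* 2 x))
(make-obsolete 'my-old-helper 'my-new-helper "27.1")

(defun my-old-entry (x)
  "Obsolete entry point that still has to call `my-old-helper'."
  ;; We already know `my-old-helper' is obsolete, so tell the byte
  ;; compiler not to warn about this particular call.
  (with-suppressed-warnings ((obsolete my-old-helper))
    (my-old-helper x)))
(make-obsolete 'my-old-entry 'my-new-entry "27.1")
```

The suppression is scoped to the wrapped form and to the named function, so other obsolete calls elsewhere in the file would still be reported.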

I had to use the macro in about a dozen places, which isn’t a lot, percentage-wise.

I hope the somewhat less daunting compilation output will help developers, old and new, to get more stuff done. At least a little bit.

Ununicode

I’ve been messing around to see whether running a WordPress installation is fun or not (spoilers: it’s really not), and all of a sudden my test blog articles had turned a strange shade of non-UTF-8.

For instance, some texts I had quoted used that strange apostrophe in “it’s”, and that had turned into “itâ€™s”.

Now, that sequence of characters (which are Unicode code points 0xE2, 0x20AC and 0x2122) bears no resemblance to the code point for ’, which is 0x2019. But the UTF-8 for ’ is #xE2 #x80 #x99, and that’s the clue: In Windows Code Page 1252, the Euro sign is #x80 and the TM sign is #x99, so what I had on my hands was UTF-8 interpreted as CP1252, and then output as UTF-8 by WordPress.

*phew*

I wondered whether any series of calls to `{en,de}code-coding-region’ coupled with `string-{as,to}-unibyte’ would possibly allow me to un-destroy the text, but that made my head hurt, so I wrote undecodify.el and put it on Microsoft Github.

(undecodify "itâ€™s") => "it’s"

It’s trivial, but at least that fixed the blog articles.
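For the single-apostrophe case, the mangling and its reversal can be sketched with the built-in string coding functions. This is only an illustration of the UTF-8-read-as-CP1252 mechanism, not a description of how undecodify.el works; the two function names are made up.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-misread-as-cp1252 (text)
  "Return TEXT as it looks when its UTF-8 bytes are read as CP1252."
  ;; Encode to UTF-8 bytes, then decode those bytes with the wrong
  ;; coding system.
  (decode-coding-string (encode-coding-string text 'utf-8)
                        'windows-1252))

(defun my-undo-cp1252-misread (text)
  "Reverse `my-misread-as-cp1252' for TEXT."
  ;; Map each character back to its CP1252 byte, then read the
  ;; resulting byte sequence as UTF-8.
  (decode-coding-string (encode-coding-string text 'windows-1252)
                        'utf-8))
```

Calling (my-misread-as-cp1252 "it’s") should produce the â€™ sequence, and my-undo-cp1252-misread should turn it back. Text whose UTF-8 bytes land on one of the few byte values that CP1252 leaves undefined may not survive this round trip.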

Now I just have to wait for the next thing to go wrong with WordPress…

Working with X in Emacs

While tweaking the Emacs-based screensaver, it began to become clear that I just didn’t have access to a sufficient number of X events. In particular, I want to be able to wake the screen up by hitting the shift key, and I just couldn’t see any way to get at that event.

So I asked on the Emacs mailing list and Stefan Monnier replied:

I’m taking his word for it that it’s simple, but I didn’t want to get into low-level Emacs hacking at this time, so I wondered whether I could just speak xcb directly.

There’s an excellent Emacs library for talking to X, but the problem is that there’s not a lot of documentation or example code. I wanted to pop up a (transparent) window, get any user events, and then do stuff based on that (i.e., stop the screensaver).

So I googled for “make-instance ‘xcb:CreateWindow”:

Five hits! *gulp*

So I cheerily tried the first one, and the code there is very nice and understandable. But it took me some time to figure out how all the parts work together by looking at the code, the xcb.el library, and the official X documentation.

The good news is that there’s an almost 1:1 correspondence between the X C-level function and the xcb.el library. xcb.el doesn’t do any C stuff: It just talks to the X server over a network connection, which is a nice solution, because it means that you don’t have to compile any support into Emacs or use the C module layer.

So now I got it to work! You can exit the screensaver with any relevant X event.

But to help other Emacs hackers that may in the future want to do other xcb things, I’ve factored out the bits to create a window and then do stuff based on actions. I think it’s basically as minimal as it can be and still demonstrate the basics of how this stuff works. Some of the code is cribbed from cheerilee.

It’s not difficult stuff, because xcb.el is very nice. You just push objects to X and you get events back, and it feels like a very natural way to work in Emacs with these concepts.

Go forth and X.

A New Eval Server For Emacs

Emacs has a mechanism for client/server communication (and remote eval) that’s simultaneously too insecure and too secure.

Here’s the extremely convenient way to start a server:

(setq server-use-tcp t
      server-host (system-name)
      server-name (concat "foo-" (system-name)))
(server-start)

This will create a file (if called on a machine named “stories”) called ~/.emacs.d/server/foo-stories with the following content:

192.168.1.53:41929 4021
fH?D+M=u=r@N1O^L-`c"c_GYUj%zj,,hc&9QGF+0}0;c}M3>Evc2SdH_N\`SSV\t

If you then say (on the command line)

$ emacsclient --server-file=foo-stories --eval "(tellstick-switch 0)"

the emacsclient binary will read that file, find the IP address and port number there, connect to that address, output the second line (which is a security cookie), and then the form to be evalled. The server will check that the cookie is what it’s expecting, and then eval whatever it’s asked to eval, and will return the result.
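As a rough sketch of the first step (reading the address, port and cookie out of that file), here is a hypothetical parser. It assumes the two-line layout shown above, and that the second number on the first line is the server’s process ID; my-parse-server-file is a made-up name, not part of Emacs.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-parse-server-file (contents)
  "Parse CONTENTS of a TCP server file into a plist (hypothetical helper).
Assumes the layout \"HOST:PORT PID\" on line one and the cookie on line two."
  (let* ((lines (split-string contents "\n"))
         (head (split-string (car lines) " "))
         ;; Splitting on \":\" only works for IPv4 addresses and host names.
         (address (split-string (car head) ":")))
    (list :host (nth 0 address)
          :port (string-to-number (nth 1 address))
          :pid (string-to-number (nth 1 head))
          :cookie (nth 1 lines))))
```

With the example file above, this would yield host 192.168.1.53 and port 41929, plus the cookie string that has to be presented before the server evaluates anything.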

Now, for this to work, the host that you’re running the emacsclient on has to have access to that file, which means that (in practice) the entire thing is based on a shared NFS or sshfs setup, which makes it impossible (or at least extremely awkward) to use in many situations. So it’s too secure.

On the other hand, if you do have that cookie, you can make the server eval anything. And the communication is in clear text, so you can man-in-the-middle as much as you’d like. So it’s really very insecure and can’t be used anywhere other than a totally locked-down, trusted network.

So Emacs should have a different way to do an eval server. Specs:

  • Don’t rely on a shared file system
  • Communication should be encrypted
  • The server should be able to limit the number of exposed endpoints

      So, ideally starting the server should look something like:

      (start-eval-server "lights" 8100 '(turn-on-lights turn-off-lights))

      where the first parameter is the name of the service, the second is the port number it should listen to, and the third parameter is a list of functions it will allow the caller to have evaluated.
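The allow-list idea can be sketched as below. This is an assumption about how such a check might look, not the actual server code; my-call-if-allowed is a made-up name. It passes the arguments through unevaluated, which would prevent nested forms from smuggling in calls to functions that aren’t on the list.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-call-if-allowed (form allowed)
  "Call the function in FORM on its literal arguments if it is in ALLOWED.
FORM looks like (FUNCTION ARG...).  The arguments are passed as-is and
are not evaluated.  Signal an error if FUNCTION is not allowed."
  (let ((fn (car-safe form)))
    (unless (and (symbolp fn) (memq fn allowed))
      (error "Function not allowed: %S" fn))
    (apply fn (cdr form))))
```

So (my-call-if-allowed '(+ 5 4) '(+)) returns 9, while a form naming any function outside the list signals an error before anything is called.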

      The client will typically look like:

      (eval-at "lights" "stories" 8100 '(turn-on-lights kitchen))

      For encryption, we could either do TLS or something home-brewed. And as all crypto experts tell us: Always roll your own crypto, so we’ll go with the latter one. But most of all because we’d really only be doing self-signed certificates, so TLS gives us a lot of problems without giving us security.

      But we could rely on an ssh-like setup with pinned self-signed certificates to give the client security, and client certificates to give the server security, but it’s… difficult to see how that could be made to work without a lot of handholding. And Emacs doesn’t have built-in tools to create certificates, anyway…

      Instead we’ll do symmetric encryption with a shared key. We’ll have ~/.authinfo entries like:

      machine lights port 8100 password s0pers3cret
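Looking up such an entry from Lisp can be sketched with the auth-source library that ships with Emacs. The helper name my-eval-server-secret is hypothetical; auth-source-search and its :host, :port and :max keywords are the real API.

```elisp
;;; -*- lexical-binding: t -*-
(require 'auth-source)

(defun my-eval-server-secret (service port)
  "Return the shared secret for SERVICE on PORT, or nil if there is none.
SERVICE is matched against the \"machine\" field of the entry."
  (let* ((entry (car (auth-source-search :host service
                                         :port (number-to-string port)
                                         :max 1)))
         (secret (plist-get entry :secret)))
    ;; auth-source may hand back the secret wrapped in a function.
    (if (functionp secret)
        (funcall secret)
      secret)))
```

With the entry above in place, (my-eval-server-secret "lights" 8100) would return the password string.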

      So… we’re not relying on a shared file system, but we are relying on pre-shared keys.

      I’ve put the results on github, and in addition to the Emacs client/server bits, I’ve also added a shell script that relies on openssl to do the encryption.

      $ ./eval-client lights stories 8710 '(+ 5 4)'
      9
      $ ./eval-client lights stories 8710 '(+ f 4)'
      Error: Got error message from server:
      (number-or-marker-p f)

      Seems to work!

      The client-server communication is plist-based, so they pass around things like:

      I think AES-256-CBC with PKCS#7 padding is a reasonable cipher to use for this type of thing, but I’m not an encryption expert. Does anybody have any input on that bit?
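For readers who haven’t met PKCS#7 padding: AES works on 16-byte blocks, so the plaintext is padded up to a multiple of 16 bytes, and each padding byte holds the number of padding bytes added. A minimal sketch, with made-up function names and no claim to match the actual implementation:

```elisp
;;; -*- lexical-binding: t -*-

(defun my-pkcs7-pad (bytes &optional block-size)
  "Pad unibyte string BYTES to a multiple of BLOCK-SIZE (default 16)."
  (let* ((block-size (or block-size 16))
         ;; Always between 1 and BLOCK-SIZE, so input that is already
         ;; block-aligned gets one whole extra block of padding.
         (pad (- block-size (% (length bytes) block-size))))
    (concat bytes (make-string pad pad))))

(defun my-pkcs7-unpad (bytes)
  "Strip PKCS#7 padding from BYTES.  Does no validation of the padding."
  (let ((pad (aref bytes (1- (length bytes)))))
    (substring bytes 0 (- (length bytes) pad))))
```

A real implementation would also check that the padding bytes are all present and consistent before stripping them.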

      The sexp specifies what cipher we’re using, so the protocol should be backwards/forwards compatible if we move to a different cipher in the future. The server should use the same cipher the client used, or if that’s not supported any more, return an error.

      The error message is also encrypted, of course.

      Another design niggle: I’ve made the eval-at function throw errors when the server says an error has occurred, and if the error occurred when evalling the form, eval-at will throw the same error. Like:

      (eval-at "lights" "stories" 8710 '(+ t 2))
      -> (wrong-type-argument . "(number-or-marker-p t)")

      Does that make sense? I mean, eval-at itself didn’t have a wrong-type argument: It made a network connection, squirted some data over, and got some data back. So it was a success! But…

      What do you think?
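To make the trade-off concrete, here is what the re-signalling design means for calling code. Both function names are hypothetical, and the error data is passed as a list for simplicity, whereas the example above shows it arriving as a string.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-signal-remote-error (error-symbol data)
  "Re-signal an error reported by a remote server (hypothetical helper).
ERROR-SYMBOL and DATA stand in for whatever the server sent back."
  (signal error-symbol data))

(defun my-safe-remote-call (thunk)
  "Call THUNK and turn a remote `wrong-type-argument' into a plist."
  (condition-case err
      (funcall thunk)
    ;; The caller handles the remote error exactly as it would a
    ;; local one, and cannot tell the two apart.
    (wrong-type-argument
     (list :remote-error (car err) :data (cdr err)))))
```

The upside is that ordinary condition-case handlers keep working. The downside is the one raised above: a handler can’t distinguish an error from the remote form from one raised locally.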

      When I find some time (and after using this for a while to see whether it makes sense), I’ll probably propose adding this to Emacs and deprecate the old emacsclient stuff.