So You Want To Run Your Own Mail Server…

Whenever the subject of running your own mail server comes up, there’ll always be two people who chime in.

The first will say “No, don’t do it! It’s a virtually impossible thing to do these days!”

The second will say “Don’t listen to that guy! It’s trivial! I just installed one and I had no problems!”

Both are right, and both are wrong: It’s very much possible to run your own MTA (Mail Transport Agent), but there are a lot of steps required if this is to be successful (where “successful” is here defined as “Gmail won’t put your email in the spam folder”). Every one of these steps is trivial, but you have to suffer through many TLAs, ETLAs and concepts that there’s really no point in knowing anything about.

I mean, just to run a mail server.

So I’ve written a script that does it all for you. Well, other than the DNS changes, which you’ll have to make yourself, but it’ll tell you exactly what you have to put where.

The script requires a Debian or Ubuntu host, and it’ll install a whole lot of stuff. If you already have a partial mail server setup, it may break the server. Nothing’s guaranteed.

After running it, you’ll have a host with (ETLAs incoming) the exim4 SMTP server (with TLS, DKIM signing, ClamAV and Spamassassin), Dovecot IMAP (with TLS), SPF, DMARC and certificates from Let’s Encrypt.

If you’re a spammer (I mean “marketing expert”), please go away. This is not about being able to send out newsletters; it’s for people who want to run their own mail servers for whatever reason (for instance, to avoid the monoculture of Gmail).

tl;dr:

Download the script

curl -O https://raw.githubusercontent.com/larsmagne/make-mta/master/make-mta.sh

and then run it. Follow the instructions, and you’re done.
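For example (assuming you’re logged in as root on the server, and that the script runs under bash):

bash make-mta.sh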

Longer version:

Running a mail server doesn’t take much work: After it’s up, it’s probably going to stay up. However, I’m going to go through all the steps the script does, so that if you’re interested you can understand how it all fits together.

This is going to be overly detailed: I’m going to go through everything, point by point, from provisioning a server to configuring your mail client. Prepare to scroll.

The assumptions are: You own a domain, and you want to send out mail from that domain, and you want to receive mail for that domain. In my case, I have the domain “eyesore.no”, and I’ll be using that throughout all the examples.

First of all, you need a name for the server. “mta” is nice, so I’ll call mine “mta.eyesore.no”. Now that the difficult part is over (“there’s only two difficult things in IT: Naming, caching and off-by-one errors”), you need a server.

Any hosting provider is fine: Let’s go with DigitalOcean. After you’ve gotten an account there, click “New Project”, give it a name, and then “Create”.

Create an Ubuntu LTS image; the cheapest one they have will do. An MTA requires virtually no resources, but if you expect to keep a lot of incoming mail on your IMAP server, you may want one with more than 25GB of storage.

These days, it’s nice to have IPv6 support, but it doesn’t really matter much.

You need a way to log in. I strongly recommend adding your public ssh key here, but you can also use a password.

And here’s where you enter that name you came up with earlier.

Then create the server (or “droplet” as DigitalOcean cutesely calls it).

Within a few seconds, your server is created, and that number is the IPv4 address of the server (and the unhelpfully shortened IPv6 address).

Now go to your DNS provider (and you’ll be doing a lot of things here, so keep this window open) and add some resolving.

I’m using Cloudflare, because it’s… nice?

That’s an “A” record for the IPv4 address…

… and an AAAA record for IPv6.
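With my domain, and with placeholder addresses standing in for the ones DigitalOcean actually gives you, those two records amount to something like this:

mta.eyesore.no.  A     203.0.113.10
mta.eyesore.no.  AAAA  2001:db8::10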

Now you can ssh to the server:
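DigitalOcean droplets normally let you straight in as root, so (assuming you added an ssh key) that’s just:

ssh root@mta.eyesore.no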

Download the script

curl -O https://raw.githubusercontent.com/larsmagne/make-mta/master/make-mta.sh

and then run it:

This will do basic stuff like “apt upgrade”. Nice to have the system somewhat up-to-date.
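Which, spelled out, is roughly:

apt update && apt upgrade -y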

This will install and enable the ufw firewall, and open the ports we need to run an MTA (ssh, smtp, imaps, http). fail2ban is also installed.
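Done by hand, that boils down to something like the following (I’m using port numbers rather than service names to be explicit, and the exact set of ports the script opens may differ slightly):

ufw allow 22/tcp    # ssh
ufw allow 25/tcp    # smtp
ufw allow 80/tcp    # http, needed for the Let’s Encrypt step below
ufw allow 465/tcp   # authenticated submission over TLS, used when sending mail later
ufw allow 993/tcp   # imaps
ufw enable
apt install fail2ban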

The script will then try to figure out the host name based on the IP address. If it can’t, then it means that reverse DNS wasn’t added automatically, and you have to add a PTR DNS record for the IP address (both IPv4 and IPv6, if you have that). All MTAs need reverse DNS records; otherwise many receiving MTAs will refuse to accept mail from the server.
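You can check whether the reverse record is in place with dig (placeholder address again):

dig -x 203.0.113.10 +short

which should print the name of your server, i.e. mta.eyesore.no.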

This will start a standalone http server and use the Let’s Encrypt servers to acquire a TLS certificate. Answer the questions it asks.
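This is plain certbot under the hood; done manually it would look something like this (the script’s exact invocation may differ):

certbot certonly --standalone -d mta.eyesore.no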

Next, it’ll fetch and configure exim itself (along with SpamAssassin and clamav), and set up DKIM. (DKIM is a way to sign mail so that others can verify that it comes from your mail server.)
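The DKIM key pair itself is just a couple of openssl commands; the directory and file names here are examples, not necessarily what the script picks:

mkdir -p /etc/exim4/dkim
openssl genrsa -out /etc/exim4/dkim/eyesore.no.private 2048
openssl rsa -in /etc/exim4/dkim/eyesore.no.private -pubout -out /etc/exim4/dkim/eyesore.no.public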

This server accepts mail for you, so you need a way to read it. IMAP is the preferred way for most. This IMAP server will also use the Let’s Encrypt certificates to authenticate the connection.
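On the Dovecot side, that amounts to a few lines of configuration pointing at the certificate files (standard Let’s Encrypt paths assumed here):

ssl = required
ssl_cert = </etc/letsencrypt/live/mta.eyesore.no/fullchain.pem
ssl_key = </etc/letsencrypt/live/mta.eyesore.no/privkey.pem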

We’re done! That is, we only have to make these DNS changes that the script has summarised. So let’s do them, one by one, in the Cloudflare interface.

First, the DKIM public key. This will allow other MTAs to verify that an email came from this MTA.
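It goes in as a TXT record; assuming a selector called “default” (the script tells you the actual selector and the actual key), it looks something like:

default._domainkey.eyesore.no.  TXT  "v=DKIM1; k=rsa; p=MIIBIjANBgkq…"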

Then the SPF data. This says that your MTA is allowed to send mail from your domain.
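That’s another TXT record, this time on the domain itself; a minimal version that only allows this one server to send mail for the domain would be something like (the script suggests the exact value to use):

eyesore.no.  TXT  "v=spf1 a:mta.eyesore.no ~all"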

Then the DMARC data. This says what should be done with failures from DKIM and SPF. (Note that this DMARC policy is very relaxed; you may want to make it more strict.)
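A very relaxed policy is basically “monitor, but don’t reject anything”; for example:

_dmarc.eyesore.no.  TXT  "v=DMARC1; p=none"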

Finally, add an MX record for the domain. This means that all incoming mail for the domain will go to this mail server.
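For example (the priority number only matters if you have more than one mail server):

eyesore.no.  MX  10  mta.eyesore.no.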

Now the server is all set up… except that you probably need a user that’ll receive email there.

You may want to choose a different name (if you have a different name).
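That’s just the usual Debian incantation, with a made-up user name:

adduser lars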

Let’s see what the mails look like now:

Mail-Tester is a convenient tool to test how it all went.

Perfect!

Even Gmail can’t complain about your mail now.

Well… unless they do. They’ll probably find some other hoops to jump through any day now, but for the moment you should be OK. Oh, and if you’ve provisioned a server that happened to get the IP address of a previous notorious spammer, you may find that your mail gets tagged as spam anyway. In practice, if you use a reputable hosting service, this isn’t a big problem, but it can happen. If that’s the case, try again and get a new IP address.

Anyway, it might be helpful to also show how to connect to the MTA, right? As an example, here’s Evolution:

Basic info.

IMAP on port 993 (with TLS).

Outgoing email on port 465, with TLS and authentication. Done!
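Summed up in plain text, with a made-up user name (the server name is whatever you called your MTA):

Incoming: IMAP, server mta.eyesore.no, port 993, TLS (on a dedicated port, not STARTTLS), user lars
Outgoing: SMTP, server mta.eyesore.no, port 465, TLS (on a dedicated port, not STARTTLS), authentication with the same user and password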

The reason I started thinking about writing this script (and blog post) is twofold: One is that I think it’s a shame that we’re devolving from a decentralised infrastructure for email to a centralised one (i.e., Gmail). We can all see where Gmail is heading — towards a total silo, but it hasn’t quite gotten there yet, because they still have to interoperate somewhat with the rest of the world.

The other is that it’s really annoying that (apparently) nobody has done this before. It’s not like any of the things that the script does is difficult: It’s just that if you don’t know what you’re supposed to do, you can spend hours Googling around, and you’ll mostly get outdated information. For instance, if you search for “exim authentication”, you’ll find this official-looking page that gives horrendous tips like “You also need to add the Debian-exim user into the shadow group, so as to give Exim access to /etc/shadow via PAM.”

No! You should absolutely not let the exim process read the /etc/shadow file, because that’s a way to escalate bugs in the exim server.

And so on.

The script is on Microsoft Github, and pull requests are very welcome. I’m sure there are many things that can be improved.

Outgoing DKIM and exim4

So, I sent an email to my sister, and I didn’t hear back. After exchanging some SMS-es, it turns out my mails went to the spam box on Gmail.

Rude!

That’s a new development for my MTA (quimby.gnus.org), so I tried poking around seeing whether I’d ended up in a blacklist or something. But, no, apparently Gmail now sends you to the spam hole if you don’t have DKIM on your outgoing email, and the moon is in the wrong alignment with Saturn.

So I thought “well, enabling outgoing DKIM on exim4 on Debian is surely just one command?” Right?

After googling around, I found… nothing. I did find a bunch of howtos, but they were obviously outdated because they referred to stuff that no longer exists.

So if you’re in my situation, and really don’t want to know anything about DKIM beyond AAARGH GMAIL THINKS I’M A SPAMMER, I’ve written a teensy script that should get you up and going.

At least at the time of writing, and with the current version of Debian Stable. (I’m writing this on March 22nd, 2020, at 23:43. If you’re trying to set this up any time later than that, you probably have to tweak stuff a bit.)
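For the curious, what the script boils down to is defining a handful of macros that Debian’s stock exim4 configuration already knows how to pass on to its remote_smtp transport. The selector, domain and key path below are examples, not necessarily what the script picks; the macros can go in /etc/exim4/exim4.conf.localmacros (or in a file under /etc/exim4/conf.d/main/ if you use the split configuration):

DKIM_CANON = relaxed
DKIM_DOMAIN = gnus.org
DKIM_SELECTOR = default
DKIM_PRIVATE_KEY = /etc/exim4/dkim/gnus.org.private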

And look what Mail Tester says now:

Gawrsh.

Of course, my email is now so authenticated and secure that if my MTA goes down, I’ll never be able to send a single mail ever again.

The Mysteries of WordPress

I moved to a self-hosted WordPress last week, and importing the images failed, so I had to do that manually. (That is, with rsync.)

Everything seemed to work fine, but then I noticed that loading the images of some of the older pages seemed to take a long time. Like, downloading megabytes and megabytes of data.

Time to do some debuggin’.

I’ve been a WordPress.com user for almost a decade, and I have avoided actually looking at the WordPress mechanisms as much as I can. But I did know that WordPress rescales images when you add them to the media library:

So uploading that dsc00863.jpg file results in all those different files being made, and since that hadn’t happened during my migration, I tried the Media Sync plugin, which is apparently designed just for my use case. I let it loose on one of the directories, and all the scaled images were dutifully made, but… loading the page was just as slow.

*sigh*

I guess there’s really no avoiding it: I have to read some WordPress code, which I have never done, ever, in my life. And my initial reaction to looking at the code can best be described as:

AAAAARGH!!!! IT’S THE MOST HORRIBLE THING EVER IN THE HISTORY OF EVER!

It’s old-old-style PHP, which is an unholy mix of bad HTML and intermixed PHP, with 200-column-wide lines. I had imagined that WordPress was … you know, clever, or something. I mean, it’s what the internet is built on, and then it’s just… this?

Anyway, I started putting some debugging statements here and there, and after a surprisingly short time, I had narrowed down what adds srcset (with the scaled images) to the img elements:
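Roughly paraphrased in terms of WordPress’s public helper functions (this is a sketch of the logic, not the verbatim source), it does something like this:

if ( preg_match( '/wp-image-([0-9]+)/i', $image, $class_id ) ) {
	$attachment_id = absint( $class_id[1] );
	$image_meta    = wp_get_attachment_metadata( $attachment_id );
	if ( is_array( $image_meta ) ) {
		// Adds srcset="..." and sizes="..." attributes to the <img> tag.
		$image = wp_image_add_srcset_and_sizes( $image, $image_meta, $attachment_id );
	}
}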

And I started to appreciate the WordPress code: Sure, it’s old-fashioned and really rubs me the wrong way with its whitespace choices, but it’s really readable. I mean, everything is just there: There’s no mysterious framework or redirection or abstract factories.

The code above looks at the HTML of an img tag, and if the class (!) of the img contains the string “wp-image-“, then that’s how it identifies the image in the database, and uses that to look up the metadata (sizes and stuff) to make the srcset attribute.

You may quibble and say that stashing that data in the class of the img is a hacky choice, but I just admire how the Automattic programmers didn’t do a standup that went:

“well, what if we, in the future, change ‘wp-image-‘ to be something else? And then the regexp here will have to be updated in addition to the code that generates the data, so we need to encapsulate this in a factory factory that makes an object that can output both the regexp and the function to generate the string, otherwise we’re repeating ourselves; and then we need a configuration language to allow changing this on a per-site basis, and then we need a configuration language generator factory in case some people want to store the ‘wp-image-‘ conf in XML and some in YAML, and then”

No. They put this in the code:

preg_match( '/wp-image-([0-9]+)/i', $image, $class_id )

Which means that somebody like me, who’s never seen any WordPress code before, immediately knows what had to be changed: The HTML of the blog posts has to be changed when doing the media import, so that everything’s in sync. Using the Media Sync plugin is somewhat meaningless for my use case: It adds the images to the library, but doesn’t update the HTML that refers to the media.

So, what to do… I could write a WordPress plugin to do this the right way… but I don’t want to do that, because, well, I know nothing about WordPress internals, so I’d like to mess that up as little as possible.

But! I’ve got an Emacs library for editing WordPress articles. I could just extend that slightly to download the images, reupload them, and then alter the HTML? Hey? Simple!

And that bit was indeed trivial, but then I thought… “it would be nice if the URLs of the images didn’t change”. I mean, just for giggles.
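To make that concrete, here’s an illustrative stand-in for such a chunk of post HTML (made-up file name, attachment id and domain):

<img class="alignnone size-full wp-image-1234" src="https://example.com/wp-content/uploads/2017/06/foo.jpg" alt="" width="1024" height="768" />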

This is basically what the image HTML in a WordPress source looks like. The images are in “wp-content/uploads/” and then a year/month thing. When uploading, your image lands in the current month’s directory. How difficult would it be to convince WordPress to save it to the original date’s directory via the API?

I grepped a bit, and landed on mw_newMediaObject() in class-wp-xmlrpc-server.php, and changed the above to:

And that’s it! Now the images go to whatever directory I specified in the API call, so I can control this from Emacs.

WordPress doesn’t like overwriting files, of course, so if asked to write 2017/06/foo.jpg, and that already exists, it writes to 2017/06/foo-1.jpg instead. Would it be difficult to convince WordPress otherwise?

No! A trivial substitution in wp_upload_bits() in functions.php was all that it took.

With those in place, and running an Emacs on the same server that WordPress was running (for lower latency), Emacs reuploaded (and edited) all the 2K posts and 30-40K images in a matter of hours. And all the old posts now have nice srcsets, which means that loading an old message doesn’t take 50MB, but instead, like… less than that…

It’s environmentally sound!

(Note: I’m not recommending anybody doing these alterations “for real”, because there’s probably all kinds of security implications. I just did them, ran the reupload script, and then backed them out again toot sweet.)

Anyway, my point is that I really appreciate the simplicity and clarity of the WordPress code. It’s unusual to sit down with a project you’ve never seen before, and it turns out to be this trivial to whack it into doing your nefarious bidding.

This Is A Test

This blog has been hosted on WordPress.com for many a year. It has, all in all, been a very pleasant experience: It feels like the uptime has been at least 110%, and most everything just works.

The problem with using that solution is that it’s very restrictive. There are so many little things you just can’t do, like adding Javascript code (for which I’m sure many people are grateful), or customising the CSS in a convenient way.

I’ve worked around the shortcomings of the platform, but the small annoyances have piled up, and this weekend I finally took the plunge.

The reason for doing it now instead of later was that WordPress.com seemed to experience a hiccup a couple of days ago, and I thought that instead of bugging support with the problem, I’d just take it as an opportunity to get moving. The problem was that the admin pages suddenly started taking 15 seconds to load. I checked it out in the browser debugger, and it was the initial “GET /” thing that took 15.something seconds, but only if I was logged in. So they obviously had an auth component that was timing out, and falling back to a backup thing (and it’s been fixed now).

But I clicked “export”, created a new VM at DigitalOcean, and got importing.

And… it failed. It got a bit further every time, downloading all the media from the old blog, but then failed with “There has been a critical error on your website. Please check your site admin email inbox for instructions.”.

After doing that about ten times (and getting no email), I checked the export XML file, and what did I find?

*sigh*

So I got a new export file (after waiting 15 seconds), and ran the import again… and it failed again the same way. So that wasn’t the problem after all?

I blew the VM away, started from scratch again, and this time skipped doing the import of the media, and that worked perfectly.

To do the media, I just scripted something to download all the images, and then I rsynced them over to the new instance. Seems to work fine, even if the images aren’t in the “media library” of WordPress, but I never cared about that anyway…

It’s even possible to copy over subscribers and stats from the old WordPress.com instance, but that requires help from the Automattic support people. And I’m flabbergasted at how efficient they are: I had two requests, and each time it took them less than five minutes to do the request and get a response. I’ve never seen customer support, I mean Happiness Engineering, that efficient before; ever. It almost made me regret doing the entire move to self-hosted blogging…

Anyway. This is a test! If this post is posted, the new WordPress instance works.

Search Index Cleanliness Is Next To Something

Allegedly, 30% of all web pages are now WordPress. I’m guessing most of these WordPress sites aren’t typical blog sites, but there sure are many of them out there.

Which makes it so puzzling why Google and WordPress don’t really play together very well.

Lemme just use one of my own stupid hobby sites, Totally Epic, as an example:

OK, the first hit is nice, because it’s the front page. The rest of page one in the search results is all “page 14”, “category” pages and the like, none of which are pages that anybody searching for something is actually interested in.

The worst of these are the “page 14” links: WordPress, by default, does pagination by starting at the most recent article, and then counts backwards. So if you have a page length of five articles, the five most recent articles will be on the first page, then the next five articles are on “page 2”, and so on.

You know the problem with actually referring to these pages after the fact: What was once the final article on “page 2” will become the first article on “page 3” when the blog bloviator writes a new article: It pushes everything downwards.

So when you’re googling for whatever, and the answer is on a “page 14” link, it usually turns out not to be there, anyway. Instead it’s on “page 16”. Or “page 47”. Who knows?

Who can we blame for this sorry state of affairs? WordPress, sure; it’s sad that they don’t use some kind of permanent link structure for “pages”. Instead of https://totally-epic.kwakk.info/page/5/, the link could have been https://totally-epic.kwakk.info/articles/53-49/; i.e., the post numbers, or https://totally-epic.kwakk.info/date/20110424T042353-20110520T030245/ (a publication time range), or whatever. (This would mean that the pages could increase or shrink in size if the bloviator deletes or adds articles with a “fake” time stamp later, but whatevs?)

Can we also blame Google? Please? Can we?

Sure. There’s a gazillion blogs out there, and they basically all have this problem, and Google could have special-cased it for WordPress (remember that 30% thing? OK, it’s a dubious number) to rank these overview pages lower, and rank the individual articles higher. Because it’s those individual pages we’re interested in.

This brings us to a related thing we can blame Google for: They’re just not indexing obscure blogs as well as they used to. Many’s the time I’m looking for something I’m sure I’ve seen somewhere, and it doesn’t turn up anywhere on Google (not even on the Dark Web; i.e., page 2 of the search results). Here’s a case study.

But that’s an orthogonal issue: Is there something us blog bleeple can do to help with the situation, when both Google and WordPress are so uniquely useless in the area?

Uneducated as I am, I imagined that putting this in my robots.txt would help keep the useless results out of Google:

User-agent: *
Disallow: /author/
Disallow: /page/
Disallow: /category/

Instead this just made my Google Search Console give me an alert:

Er, OK. I blocked it, but you indexed it anyway, and that’s something you’re asking me to fix?

You go, Google.

Granted, adding the robots.txt does seem to help with the ranking a bit: If you actually search for something now, you do get “real” pages on the first page of results:

The very first link is one of the “denied” pages, though, so… it’s not… very confidence-inducing.

Googling (!) around shows that Google is mostly using the robots.txt as a sort of hand-wavy hint as to what it should do, because the California DMV added a robots.txt file in 2006.

It … makes … some kind of sense? I mean, for Google.

Instead the edict from Google seems to be that we should use a robots.txt file that allows everything to be indexed, but include a

<meta name="robots" content="noindex,follow">

directive in the HTML to tell Google not to index the pages instead.

Fortunately, there’s a plugin for that. But googling for that isn’t easy, because whenever you’re googling for stuff like this you get a gazillion SEO pages about how to get more of your pages on Google, not less. Oh, and this plugin seems even better (that is, it gives you finer control over which pages to noindex).

So I added this to that WordPress site on March 5th, and I wonder how long it’ll take for the pages in question to disappear from Google (if ever). I’ll update when/if that happens.

Still, this future is pretty sad. Instead of flying cars we have the “Robots “noindex,follow” meta tag” WordPress plugin.

[Edit one week later: No changes in the Google index so far.]

[Edit four weeks later: All the pagination pages now no longer show up in Google if I search for something (like “site:totally-epic.kwakk.info epic”), so that’s definitely progress. If I just search for “site:totally-epic.kwakk.info” without any query items, then they’ll show up anyway, but I guess that doesn’t really matter much, because nobody does that.]