Noindex Redux

A month ago, I wondered whether there was any way to make those useless WordPress overview pages (i.e., category, author and “page X” pages) go away from search index results.

To recap, whenever I’m looking for something, Google has a tendency to return a result pointing to “page 35” of somebody’s blog, but when I go to “page 35”, what I’m looking for isn’t there any more, because it’s now on “page 49”.

To illustrate, here’s a search for a term that appears only once on this blog:

Yes, Google returns the blog post (“Small Change”), but as the final result, after two overview pages where you probably won’t find “rs232” when you click on those links.

So I added “noindex” entries for the overview pages on March 5th. It’s now been more than a month, so what do things look like now?

It’s better! The blog post (“Small Change”) is now the first result, and there’s only one overview page included in the result. So perhaps in another month or so, Google will have re-fetched all the pages and removed that, too.

(Note that Google isn’t that good at counting. Ten, two… who cares!)

Now, if only WordPress were to make “noindex, follow” the default on all the overview pages, then the world would be a (very very slightly) better place.
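For what it’s worth, here’s a minimal sketch of how one might check that an overview page really carries the tag. The function name has_noindex is made up for illustration, and the pattern assumes the robots meta tag sits on a single line with the name attribute before the content attribute; other markup would need a different check.

```shell
#!/bin/sh
# has_noindex: hypothetical helper, not part of WordPress or any plugin.
# Reads HTML on stdin and succeeds if a robots meta tag containing
# "noindex" is found on a single line.
has_noindex() {
    grep -Eiq "<meta[^>]*name=['\"]robots['\"][^>]*noindex"
}

# Usage (needs network access; the URL is a placeholder):
#   curl -s https://example.org/page/35/ | has_noindex && echo "noindex is set"
```

Checking the served HTML only tells you what the crawler will see on its next visit; the search results themselves still lag behind until the pages are re-fetched.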

The Campaign Against Link Rot

This blog has been going for a while, and more and more of the very, very useful external links (ahem) now point to sites that have disappeared, or that have rearranged all their internal links.

This is sad.

I wondered whether there was a tool that’d just point all the broken links to archive.org, and there doesn’t seem to be. But the support over at the Broken Link Checker WordPress plugin seems to be on the ball, so perhaps there will be?

However, based on my extensive research (i.e., pointing about fifty broken links at the Wayback Machine and seeing what happened), about one third had results like:

And other things, like being blocked by robots.txt or… whatever.

So I was idly wondering… would it be possible to just… cache? Whatever we’re linking to? In WordPress?

And the answer is “no”, of course, because creating a mirror of a web page is Trey Difficult. Not to mention a security nightmare. But then it occurred to me that we’d get almost there by just grabbing a screenshot of the page at the time a blog article is written, and then just stash that in the media library! It’s not as good as having the actual text and stuff, but it’s something. You can at least read it. It’s a low-cost, 85% solution to an annoying problem.

But what UX to use to display these captures? Footnotes? A side bar? And then a smart suggestion from irc: What about a hover thing?

So now, 90 minutes later: Tada! Here’s a link to the FSF that should be cached at the time of posting, and stored in my WordPress library. You should be able to get a hovering popup that you can click on to see the (very long) PNG.

I have not implemented this as a WordPress plugin, but in ewp instead, and with some added JS and CSS on the blog. It should be a plugin instead, of course, but I don’t have the stamina to write PHP any more. I tried googling for such a plugin, but I couldn’t find anything.

So: If anybody thinks this is a good idea, please do go ahead and write a WordPress plugin that does this thing.

It basically calls cutycapt (or any other headless web “screen capture” application), and then adds some JS on the “mouseenter” event of the link.
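The capture half of that could look roughly like the sketch below. It is not the code from ewp: the helper names and the file naming scheme are invented for illustration, and it assumes the Debian-style cutycapt binary is on the PATH. The hover popup itself is JS and CSS on the blog side and isn’t shown.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the file naming scheme are
# illustrative and not what ewp actually does.

# capture_name URL: print a file-system-safe PNG name for URL.
capture_name() {
    printf '%s.png\n' "$(printf '%s\n' "$1" | sed 's/[^A-Za-z0-9]/_/g')"
}

# capture_link URL DIR: render URL to a PNG in DIR using CutyCapt.
# CutyCapt needs an X server; on a headless box it is commonly run
# under xvfb-run.
capture_link() {
    cutycapt --url="$1" --out="$2/$(capture_name "$1")"
}

# Usage:
#   capture_link https://www.fsf.org/ /tmp/captures
```

Deriving the file name from the URL keeps one capture per link; a real plugin would probably also want the capture date in there, since the whole point is to record the page as it was at the time of posting.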

Does this seem useful? Annoying? Confusing?

The Google Audit

As I’m sure you remember perfectly, in 2012 (!) I did something silly (no really): I scripted a teensy thing that would check what was playing on the stereo, and then search Youtube for a video that matched that as best it could (based on artist name, track title and the length of the track), and then just play the video (without sound).

To use as the background for the tiny USB monitor in the hallway that displays the weather forecast.

YES I KNOW.

It’s stupid, it’s frivolous, and it consumes Youtube resources without Google earning any of them sweet, sweet ad dollars, so I was expecting it to be shut down, or I’d just grow tired of it, or…

I mean, it’s… all kinds of stupid? Right? I agree completely. No argument there.

Over the years, there’s been some restrictions: Rate limiting on the API, and rate limiting for the website itself, and I had to fill out some forms about what I’m using it for, and… But basically, it’s been doing its stupid little thing.

Cut to August 2019:

Dear YouTube API Developer,

We are currently conducting a mandatory compliance review of your YouTube Data API Project. The review is to assess your compliance to our YouTube API Services Developer Policies (link) and to learn about how our service is being used.

At your convenience in the next seven (7) business days, please complete and submit the following information :

1 A fully functional demo account, including a username and password with which we may access your API Client. The demo account you provide will be used only for compliance inspection and the credentials will not be shared.

2 A fully completed Youtube API Audit Form

3 Screenshots of how your API Client and its users access and use the YouTube API Services

4 Documents relating to your implementation, access and use of YouTube API Services

I got the lovely email above, and I assumed that this was a very clumsy phishing attack. I mean… a demo account? With a password? Could it be more obvious?

So I ignored it, and then got further emails, and after the third “third and final notice” (I think?) I looked closer at the emails and confirmed that the address was really from @youtube.com, without any Unicode homographs, and it’s DKIM signed, and…

IT’S A REAL EMAIL FROM GOOGLE! I couldn’t believe it.

But I finally answered, and got a response from:

Which was also real! And not a phishing attack. It asked:

Regarding project key usage:

The given alphanumeric text [1 only] cannot be deciphered. Please provide us with a list of valid project keys associated with your API Client.

In order to check the project key for your API Client please login to Google API Console. After logging in go to IAM & admin -> settings -> project key.

OK, so I did that, and:

So there’s no project key? I wondered why they couldn’t just, like, look up this stuff themselves. And particularly since there’s no “project key” (whatever that is)… They should know already? Is this phishing after all? Are all those characters in @google.com really ASCII? They are.
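A crude way to run that “is it really ASCII?” check from a shell is sketched below. The function name is_ascii is made up, and the check only catches non-ASCII lookalikes (like a Cyrillic “о”); it says nothing about ASCII-only tricks such as “rn” posing as “m”.

```shell
#!/bin/sh
# is_ascii STRING: hypothetical helper. Succeeds only if every byte of
# STRING is printable ASCII (space through tilde).
is_ascii() {
    # LC_ALL=C makes grep treat the input as bytes, so any byte outside
    # the range " " to "~" (including all UTF-8 sequences) is a match.
    ! printf '%s\n' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Usage:
#   is_ascii 'noreply@google.com' && echo "plain ASCII"
```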

After a few attempts at making it understood that I’m not running a web site; there’s no login; there’s no users: There’s just a stupid script running on my hallway computer, they asked to see a screencast of how it works.

Meanwhile, in the middle of all this, they stopped my access to the API, so I had to substitute a hard-coded video to play:


So… I guess… I’ll just wait…

Misunderstand me correctly: I’m not complaining or anything. I’m just… bemused. I mean, it’s just a stupid, fun little thing, and if Google says “er, perhaps don’t do that with the API?” then that’s fine. It’s their API. And I don’t envy those poor people working on the “dispute resolution” team. They probably have a script they’re running through to see whether the next Cambridge Analytica is doing something nefarious with the Youtube data (or at least have a way of saying, during the next Senate hearings, that they are doing something about that), and dealing with pissant hobbyists just using their APIs for fun is… probably not that fun?

It’s just… There’s a sort of disconnect. Whoever came up with this audit thing obviously didn’t have an option for “4c) Not using the API for anything that can even be audited because it’s just stupid”, which I think is probably 70% of the use cases. Because people do stupid shit.

So I’m amused. Bemused?

Am I Bemildred? I think I may be Bemildred. (He’s the one on the right.)

Meanwhile, I can’t have the background of the monitor all blank and stuff. So I’ve substituted it with this wonderfully glitched broken torrent download:

Uses less bandwidth, too.

So You Want To Run Your Own Mail Server…

Whenever the subject of running your own mail server comes up, there’ll always be two people who chime in.

The first will say “No, don’t do it! It’s a virtually impossible thing to do these days!”

The second will say “Don’t listen to that guy! It’s trivial! I just installed one and I had no problems!”

Both are right, and both are wrong: It’s very much possible to run your own MTA (Mail Transport Agent), but there’s a lot of steps required if this is to be successful (where “successful” is here defined as “Gmail won’t put your email in the spam folder”). Every one of these steps is trivial, but you have to suffer through many TLAs and ETLAs, concepts that there’s really no point in knowing anything about.

I mean, just to run a mail server.

So I’ve written a script that does it all for you. Other than the DNS changes that you’ll have to do, but it’ll tell you exactly what you have to put where.

The script requires a Debian or Ubuntu host, and it’ll install a whole lot of stuff. If you already have a partial mail server setup, it may break the server. Nothing’s guaranteed.

After running it, you’ll have a host with (ETLAs incoming) the exim4 SMTP server (with TLS, DKIM signing, ClamAV and Spamassassin), Dovecot IMAP (with TLS), SPF, DMARC and certificates from Let’s Encrypt.

If you’re a spammer (I mean “marketing expert”), please go away. This is not about being able to send out newsletters; it’s for people who want to run their own mail servers for whatever reason (for instance, to avoid the monoculture of Gmail).

tl;dr:

Download the script

curl -O https://raw.githubusercontent.com/larsmagne/make-mta/master/make-mta.sh

and then run it. Follow the instructions, and you’re done.

Longer version:

Running a mail server doesn’t take much work: After it’s up, it’s probably going to stay up. However, I’m going to go through all the steps the script does, so that if you’re interested you can understand how it all fits together.

This is going to be overly detailed: I’m going to go through everything, point by point, from provisioning a server to configuring your mail client. Prepare to scroll.

The assumptions are: You own a domain, and you want to send out mail from that domain, and you want to receive mail for that domain. In my case, I have the domain “eyesore.no”, and I’ll be using that throughout all the examples.

First of all, you need a name for the server. “mta” is nice, so I’ll call mine “mta.eyesore.no”. Now that the difficult part is over (“there’s only two difficult things in IT: Naming, caching and off-by-one errors”), you need a server.

Any hosting provider is fine: Let’s go with DigitalOcean. After you’ve gotten an account there, click “New Project”, give it a name, and then “Create”.

Create an Ubuntu LTS image; the cheapest one they have. An MTA requires virtually no resources, but if you expect to keep a lot of incoming mail on your IMAP server, you may want one with more storage than 25GB.

These days, it’s nice to have IPv6 support, but it doesn’t really matter much.

You need a way to log in. I strongly recommend adding your public ssh key here, but you can also use a password.

And here’s where you enter that name you came up with earlier.

Then create the server (or “droplet” as DigitalOcean cutesely calls it).

Within a few seconds, your server is created, and that number is the IPv4 address of the server (and the unhelpfully shortened IPv6 address).

Now go to your DNS provider (and you’ll be doing a lot of things here, so keep this window open) and add some resolving.

I’m using Cloudflare, because it’s… nice?

That’s an “A” record for the IPv4 address…

… and an AAAA record for IPv6.

Now you can ssh to the server:

Download the script

curl -O https://raw.githubusercontent.com/larsmagne/make-mta/master/make-mta.sh

and then run it:

This will do basic stuff like “apt upgrade”. Nice to have the system somewhat up-to-date.

This will install and enable the ufw firewall, and open the ports we need to run an MTA (ssh, smtp, imaps, http). fail2ban is also installed.
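Done by hand, the firewall step amounts to something like the sketch below. This is not the code from make-mta.sh; it just opens the four services listed above, using the service names that ufw resolves through /etc/services. The UFW variable is an addition for illustration, so the commands can be previewed without touching the firewall.

```shell
#!/bin/sh
# Sketch only: this is not the code from make-mta.sh.
# UFW can be overridden (for example UFW=echo) to preview the commands
# without touching the firewall.
UFW=${UFW:-ufw}

open_mta_ports() {
    for service in ssh smtp imaps http; do
        $UFW allow "$service" || return 1
    done
    # --force skips the interactive confirmation prompt.
    $UFW --force enable
}

# Usage (as root):
#   open_mta_ports
```

Allowing ssh before enabling the firewall matters: doing it the other way around on a remote server is a good way to lock yourself out.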

The script will then try to figure out the host name based on the IP address. If it can’t, then it means that reverse DNS wasn’t added automatically, and you have to add a PTR DNS record for the IP address (both IPv4 and IPv6, if you have that). All MTAs need to have reverse IP records, otherwise many MTAs will refuse to accept mail from the server.
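If you want to check the reverse record yourself, something along these lines should do. It’s a sketch with made-up helper names, not what the script runs, and it assumes dig is installed (it ships with the BIND DNS utilities on Debian and Ubuntu).

```shell
#!/bin/sh
# Hypothetical helpers for checking reverse DNS; not taken from make-mta.sh.

# strip_dot NAME: drop the trailing dot that DNS tools print after
# fully qualified names.
strip_dot() {
    printf '%s\n' "${1%.}"
}

# check_ptr IP HOSTNAME: succeed if the PTR record for IP is HOSTNAME.
# Needs dig and network access.
check_ptr() {
    ptr=$(dig +short -x "$1" | head -n 1)
    [ "$(strip_dot "$ptr")" = "$2" ]
}

# Usage (203.0.113.10 is a documentation address, not the real server):
#   check_ptr 203.0.113.10 mta.eyesore.no || echo "add a PTR record"
```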

This will start a standalone http server and use the Let’s Encrypt servers to acquire a TLS certificate. Answer the questions it asks.

Next, it’ll fetch and configure exim itself (along with SpamAssassin and clamav), and set up DKIM. (DKIM is a way to sign mail so that others can verify that it comes from your mail server.)

This server accepts mail for you, so you need a way to read it. IMAP is the preferred way for most. This IMAP server will also use the Let’s Encrypt certificates to authenticate the connection.

We’re done! That is, we only have to make these DNS changes that the script has summarised. So let’s do them, one by one, in the Cloudflare interface.

First, the DKIM public key. This will allow other MTAs to verify that an email came from this MTA.

Then the SPF data. This says that your MTA is allowed to send mail from your domain.

Then the DMARC data. This says what receiving servers should do with failures from DKIM and SPF. (Note that this DMARC policy is very relaxed; you may want to make it more strict.)

Finally, add an MX record for the domain. This means that all incoming mail for the domain will go to this mail server.
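Written out in zone-file style, the four records have roughly the shape printed by the sketch below. The helper name, the selector “mail”, the key placeholder and the specific policies (SPF “mx ~all”, DMARC “p=none”) are all illustrative assumptions; use the exact values the script prints for your own setup.

```shell
#!/bin/sh
# print_mail_records DOMAIN MAILHOST SELECTOR PUBKEY
# Hypothetical helper that prints the four records in zone-file style.
# The policies shown (SPF "mx ~all", DMARC "p=none") are illustrative
# assumptions; use the values that make-mta.sh prints for your setup.
print_mail_records() {
    domain=$1 mailhost=$2 selector=$3 pubkey=$4
    printf '%s._domainkey.%s. IN TXT "v=DKIM1; k=rsa; p=%s"\n' \
        "$selector" "$domain" "$pubkey"
    printf '%s. IN TXT "v=spf1 mx ~all"\n' "$domain"
    printf '_dmarc.%s. IN TXT "v=DMARC1; p=none"\n' "$domain"
    printf '%s. IN MX 10 %s.\n' "$domain" "$mailhost"
}

# Usage (selector and key are placeholders):
#   print_mail_records eyesore.no mta.eyesore.no mail BASE64KEY
```

In a web interface like Cloudflare’s you’d enter the name, type and value in separate fields rather than as one line, but the content is the same.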

Now the server is all set up… except that you probably need a user that’ll receive email there.

You may want to choose a different name (if you have a different name).

Let’s see what the mails look like now:

Mail-Tester is a convenient tool to test how it all went.

Perfect!

Even Gmail can’t complain about your mail now.

Well… unless they do. They’ll probably come up with some other hoops to jump through any day now, but for the moment you should be OK. Oh, and if you’ve provisioned a server that happened to get the IP address of a previous notorious spammer, you may find that your mail gets tagged as spam, anyway. In practice, if you use a reputable hosting service, this isn’t a big problem, but it can happen. If that’s the case, try again and get a new IP address.

Anyway, it might be helpful to also show how to connect to the MTA, right? As an example, here’s Evolution:

Basic info.

IMAP on port 993 (with TLS).

Outgoing email on port 465, with TLS and authentication. Done!

The reason I started thinking about writing this script (and blog post) is twofold: I think it’s a shame that we’re devolving from a decentralised infrastructure for email to a centralised one (i.e., Gmail). We can all see where Gmail is heading — towards a total silo, but it hasn’t quite gotten there yet, because they still have to interoperate somewhat with the rest of the world.

The other is that it’s really annoying that (apparently) nobody has done this before. It’s not like any of the things that the script does is difficult: It’s just that if you don’t know what you’re supposed to do, you can spend hours Googling around, and you’ll mostly get outdated information. For instance, if you search for “exim authentication”, you’ll find this official-looking page that gives horrendous tips like “You also need to add the Debian-exim user into the shadow group, so as to give Exim access to /etc/shadow via PAM.”

No! You should absolutely not let the exim process read the /etc/shadow file, because that’s a way to escalate bugs in the exim server.

And so on.

The script is on Microsoft Github, and pull requests are very welcome. I’m sure there’s many things that can be improved.

Outgoing DKIM and exim4

So, I sent an email to my sister, and I didn’t hear back. After exchanging some SMS-es, it turns out my mails went to the spam box on Gmail.

Rude!

That’s a new development for my MTA (quimby.gnus.org), so I tried poking around seeing whether I’d ended up in a blacklist or something. But, no, apparently Gmail now sends you to the spam hole if you don’t have DKIM on your outgoing email, and the moon is in the wrong alignment with Saturn.

So I thought “well, enabling outgoing DKIM on exim4 on Debian is surely just one command?” Right?

After googling around, I found… nothing. I did find a bunch of howtos, but they were obviously outdated because they referred to stuff that no longer exists.

So if you’re in my situation, and really don’t want to know anything about DKIM beyond AAARGH GMAIL THINKS I’M A SPAMMER, I’ve written a teensy script that should get you up and going.

At least at the time of writing, and with the current version of Debian Stable. (I’m writing this on March 22nd, 2020, at 23:43. If you’re trying to set this up any time later than that, you probably have to tweak stuff a bit.)

And look what Mail Tester says now:

Gawrsh.

Of course, my email is now so authenticated and secure that if my MTA goes down, I’ll never be able to send a single mail ever again.