Clownin’

I’ve had my servers in my employer’s data room since 1997, but since that company doesn’t exist any more, I had to make some changes. I had planned on doing some colo thing locally here, so I bought some semi-spiffy new servers.

But then I changed my mind. It all just seemed like too much work (making appointments for installing and/or fixing stuff, and so on), so I finally just went with renting servers here and there.

So yesterday I went and collected my servers and put them into storage, where they’ll probably remain until they’re too old to be useful and I can throw them away.

But where’s all my serverey stuff now? In the clown.

First of all, I put my WordPress sites on DigitalOcean $5 virtual machine instances. I can’t properly express how easy and straightforward that process is, but this guy can:

Even the API and the docs are so well built that it feels like a pet project I found on Github. Where is all of the corporate nonsense cluttering up the API? Where are the overengineered factory templates where I have to set up a bunch of services using a totally different API before I can start my first VM? Why are the docs so straightforward and in one place in one format? This hardly feels like enterprise software at all.

Once you get past the slightly cutesy naming convention (“Droplets” and stuff), it’s all so easy and unconfusing. I went with pre-rolled WordPress images, and it comes with UFW firewall, fail2ban and certbot already set up. It’s perfect! And by that I mean, it’s exactly like it would have been if I’d done it myself. Except my image would also have Emacs pre-installed, of course.
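
(If you want to convince yourself that all of that is actually in place on a fresh droplet, something like the following does it; the “sshd” jail name is the usual fail2ban default, not something I’ve verified on this particular image:)

# Firewall rules the image ships with
ufw status verbose

# fail2ban jails
fail2ban-client status
fail2ban-client status sshd

# Let's Encrypt certificates, plus a renewal dry run
certbot certificates
certbot renew --dry-run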

(The reason I want WordPress on separate VM instances is that I assume that they’ll eventually be hacked. It’s WordPress, after all; the CMS with the most insane maintenance model imaginable.)

For my real servers, I went with Hetzner. Because it’s in Europe. My main server (my MTA and all my pet projects, of which there are many) is in Helsinki, and runs at €87 per month. Of that, €53 is for the disk, which is a weird pricing model, since disks are inexpensive, but I guess that’s how they make money? It’s a physical server, because if it’s a VM, it’s probably hacked already, what with all the new Intel bugs that show up every two weeks. The Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz server itself is just €34…

For the news.gmane.io server, I had to go more expensive. It’s a tradspool NNTP server, which means that every article takes one file. This is rather slow on spinning rust, so I had to find a configuration that could do that over an SSD RAID. Total cost of that is (+ 189 114 86) => €389 per month. Because it’s an AMD EPYC 7551P 32-Core Processor with 132GB RAM. A two-core machine with 16GB RAM would have been fine, but Hetzner doesn’t have that in their lineup. This server is in Germany.
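
(If “tradspool” doesn’t ring a bell: it just means one file per article, in a directory tree that mirrors the group name. A quick sketch of what that looks like, assuming INN’s spool lives under /var/spool/news/articles, which is the usual default rather than a statement about my exact setup:)

# One directory per group name component, one numbered file per article
ls /var/spool/news/articles/gmane/test/
# 1  2  3  4  ...

# Tens of millions of tiny files means the bottleneck is inodes and
# seeks, not raw throughput, which is why spinning rust doesn't cut it
find /var/spool/news/articles -type f | wc -l
df -i /var/spool/news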

Hetzner seems fine. Getting the machines built and installed to my specifications took a day or so; if you can use one of the standard configurations, you can apparently get one within the hour. The web interface is old-fashioned and clunky, but it gets the job done. DigitalOcean has a much better web interface.

But what about backup? I briefly considered just rsyncing everything home, and that would have been no problem. The problem is that if I ever need to use that backup, my upstream is teensy, so re-establishing a new server out there somewhere would take forever. (The Gmane spool is about 5TB.)

So I needed backup somewhere, and I chose OVH, because they’re another European company… and their interfaces are pretty primitive. For instance, when installing the initial image, it just showed status screens that would hang for hours until I reloaded the web page. It doesn’t give you a lot of confidence. But, what the hey, it’s just backup, anyway. And it’s $175 per month.

And… I just tried logging on to the OVH web site, and it said my credentials were invalid. And then when doing password recovery… I’m not getting any email (after waiting for 15 minutes).

*sigh*

So, no, I wouldn’t really recommend OVH much, but the server itself works fine, and I get about 200Mbps when doing backups from the server in Germany to the OVH server in London. (Gotta be spread out geographically! For no reason!)

None of the servers seem to have any hidden bandwidth fees or anything, which is definitely not the case with the big American players (AWS and the like), where figuring out how much it all is going to cost is a full time job.

Having physical access to the servers definitely feels a lot safer: If I screw something up and the servers then won’t even boot, I can always fix that when I can get at the hardware. If I screw up these new servers, I do have some limited console access (the Helsinki Hetzner one seems to require that I have somebody there physically attach something to the server first!), but it’s definitely not the same as having access. So having very up-to-date backups is the name of the game, so that I can move to a new server fast-ish if the old one is unrecoverable. The Gmane news spool has continuous backup of the articles (they’re being fed out to the backup server with a couple of seconds’ delay), so nothing should be lost there, but it’ll take some time to rsync it all to a new server, I guess.
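
(The continuous article feed is its own mechanism, but the bulk part is plain rsync; a sketch of the sort of invocation involved, with made-up host and path names:)

# Push the spool to the backup box; -H preserves hard links, and
# --partial lets a 5TB transfer survive being interrupted
rsync -aH --partial --info=progress2 --delete \
      /var/spool/news/ backup.example.com:/backup/news-spool/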

Anyway!

Modern life, man.

news.gmane.org is now news.gmane.io

As previously discussed, the gmane.org domain was no longer viable, and the NNTP server has now moved to news.gmane.io.

Likewise, mailing list subscriptions have been moved from m.gmane.org to m.gmane-mx.org.

As of this writing, neither service is up, because I’m doing the final resync before restarting the services on a new server. I expect the services to be back up again about 21:00 GMT (January 15th 2020), so don’t panic before that time.

DNS changes may also take some time to propagate.

[Edit at 15:30 GMT: I had misremembered how long the rsync took, so we’re now live six hours ahead of schedule. This Shouldn’t Possibly Happen. I mean, a computer project not being late. Anyway, both Gmane and Gwene feeds are now processing, but the news-to-mail bits aren’t up yet.]

OK, service announcement done, so I thought I’d write a bit about what happened the last week:

First of all, thank you for all the nice comments and PVT EMAILs of support. I wasn’t quite sure whether to continue running the NNTP server, but getting some feedback helps.

So then I started moving the subscriptions for 15K mailing lists from gmane.org to the new domain, and that was… er… interesting? The process works like this:

The Gmane configuration is a file that has one of these entries per list:

gmane.test   gmane-test@quimby.gnus.org
  Testing the Gmane hierarchy
  mailman gt-gmane-test
  validated=2020-01-13
  transfer=done
  crosspost-posting=no

I’ve got an Emacs mode to do the maintenance work (subscribing, unsubscribing and the like), so I utilised that to write a function that would send out two unsubscription messages (because lists may be subscribed as @gmane.org or @m.gmane.org for historical reasons and there’s no record of which one) and one subscription message for the new @m.gmane-mx.org address. This bit is fully automated, so I could just sit there watching Emacs send out messages at a somewhat speedy clip. (Well, sending out all the messages took 12 hours in total due to how it’s done: It’s actually doing RPC via NNTP so that the messages are sent directly from the MTA instead of from me, because that looks less spammy.)

(While looking at Emacs doing this bit, I watched Witcher, which was surprisingly entertaining… in parts, and really, really tedious for the rest of the time. And, since it was Netflix, it looked cheap and shoddy.)

I did this in 1K batches, because when I’ve triggered this bit, all messages for the group in question go to a special gmane.admin group, so that I can see all the error messages and stuff, but most importantly: The “reply to confirm subscription” messages, which I then have to respond to (from Gnus). That’s semi-scriptable, but when the “please reply” message is in Chinese, I have to kinda guess.

Then after that, the “Welcome to foo” messages start pouring in, and again, handling the ones in English is fine, but then there’s all the other languages. I know Willkommen, and I can guess at bem-vinda and bienvenu, but Japanese is not my forte, so more guessing is involved.

So this took two days, and for the second day I watched both seasons of Fleabag (the first one is really fun and original and weird, and in the second one they removed everything that was interesting about it and made it into a normal boring dramedy, which explains why it’s on all the “best of” lists of 2019, and even won the Golden Globe. Well played!).

Fun bits from this process: If you’re sending out mail to email addresses that may no longer exist, you’ll end up being branded a spammer, but that wasn’t really much of a surprise.

What did surprise me was that Sourceforge has made it impossible to sign up to mailing lists via email, so we either had to abandon the 2K Sourceforge lists (I know! So many! I had no idea) or do something… semi-manually. So I wrote a little bit to open Firefox on each list URL, which put the unique gmane-mx.org address into the X selection, so doing each list was “Super-s Right-Mouse TAB TAB SPACE TAB SPACE RET Super-TAB”, and I could do one in five seconds without moving my hands from the keyboard (it’s got built-in mouse buttons).
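
(That “little bit” was roughly this shape, if you want to picture it; the lists.txt format and the exact commands are a hypothetical reconstruction, but xclip is the sort of thing that does the X selection part:)

# lists.txt: one "subscribe-page-URL unique-address@m.gmane-mx.org" per line
while read -r url address; do
  # Put the per-list gmane-mx.org address into the primary X selection
  printf '%s' "$address" | xclip -selection primary
  # Open the Sourceforge subscription page in the running Firefox
  firefox "$url"
  # ...then the Super-s / mouse / TAB dance happens by hand
  read -r -p "RET for the next list " _ </dev/tty
done < lists.txt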

But then it turns out that if you do that a lot, Sourceforge will sic the old-fashioned “click on the three palms” captcha on you. Which took the throughput way, way down.

So for the first half of the Sourceforge lists, I utilised the power of crowdsourcing, and sent off 100 lists each to people who volunteered to do this mindless and boring bit. (Some came back for seconds!) Thank you all again for volunteering.

For the second half, I discovered the wonderful world of captcha solvers, and after signing up with a free wit.ai API key, it worked pretty reliably. So I did most of the second day myself, since it just added a new “TAB RET” bit to each list. (I watched the Alan Bennett at the BBC box set for this part. It’s quite extraordinary. I particularly loved the one with Mrs Bucket in the hospital… and the one with the photographer in the churchyard… and the one with the guy who retires, is unhappy about the retirement, and then gets a stroke and dies.)

Uhm… Any other observations? Ah, right; the IETF MTAs rate-limited me to about one email per ten minutes, so it took four days for all the un/resubscription messages to get through, and some probably just timed out, so I should do another sweep of those bits.

OK… that’s it? Now I just have to wait for the rsync and the DNS changes, which is why I’m going on for this long.

Let’s hear it for TV and wine; two invaluable companions when doing boring semi-manual labour.

Entering the Clown

I’ve always been the self hosting kind of guy (i.e., old), but with recent changes I’m trying to simplify and move things around.

I’m not quite sure where I’ll end up with my main server(s) yet, and I’m testing out various things, but for my one self-hosted WordPress instance, I thought I could try something clownish.

I’ve used WordPress.com for this blog since forever, and I’m probably not going to change that, because it’s nice and easy and I don’t have to do any admin work at all. However, hosting on WordPress.com has some severe limitations, like not being able to put <iframe>s into the pages, or use my own Javascript to make things… more fun.

So earlier this year I installed WordPress on my main server, and… It was just kinda painful to try to make that even remotely secure? I mean, you have to give the web server write access to its own PHP files? I mean, that’s the number one thing you’re never ever supposed to do? That means that any tiny error in any plugin could lead to a convenient sploit on your server?
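
(For reference, the standard hardening advice boils down to something like this; a sketch that assumes a stock install under /var/www/html and a www-data web user:)

# The web server owns nothing except the upload directory
chown -R root:www-data /var/www/html
find /var/www/html -type d -exec chmod 755 {} \;
find /var/www/html -type f -exec chmod 644 {} \;
chown -R www-data:www-data /var/www/html/wp-content/uploads

Which promptly breaks WordPress’s own web-based updates and plugin installs, since those work precisely by having the web server rewrite its own PHP files. Pick your poison.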

Trying to mitigate that was just a mess, so I thought that I’d at least move that blog somewhere else, so when I move my main server, I can forget about doing WordPress on the new location.

So I, at random, went with DigitalOcean, because it looked so simple: It even has a one-click way to install a new “droplet” (which is their cutesy name for a virtual server) with a complete WordPress install, with UFW (firewall) and basically everything… just there.

And it worked!

Reader, I am now in the clown with my Pacific Comics blog.

It was very painless. It took a while for it to import the media from my old self-hosted blog, but everything else worked way better than I had expected. And best of all, it’s pretty buzzword free: It’s just a virtual server that I can ssh into and do whatever, if I should so choose.

I feel so modern! Just a decade or two after everybody else!

Linux Can 4K @ 60 Haz

I tried getting 4K @ 60Hz using Intel built-in graphics, and I failed miserably.

Rather than spend more days on that project (yes, this is the year of Linux on the TV, I Mean Desktop), I bought a low-profile Nvidia card, since there are several people on the interwebs that claim to have gotten that to work.

It’s brand new, cheap and fanless, too: A Geforce GT 1030, which is Nvidia’s new budget card, launched last month, I think.

It’s finnier than Albert Finney and takes up several PCI Express slots.

However, that’s not really a problem in this media computer: Lots and lots of space for it to spread itself out over. Just one PCI Express slot, though.

But it’s on the long side: If I had any hard disks installed in this machine, I would have had to get creative. Instead I just removed that HD tray thing.

But! There are two … capacitors down there right where the PCI Express “key” notch is. Like just a quarter of a millimetre too far to the right…

I bent them ever so gently over and I managed to get the card in. How utterly weird.

SO MUCH DRAMA!

Anyway: Mission complete. This card has a DVI plug in addition to the HDMI, but I’m not going to use that, so I just left it with the protective rubber.

See? Lots of space. Of course, it would have been better to remove the cooler completely and hook it up via heat pipes to the chassis, but… that’s like work.

But did this solve my problems? After installing Nvidia’s proprietary drivers (apparently Nouveau doesn’t support the GT 1030 yet, since it’s a brand-new Pascal card)…

Yes! 3840×2160 @ 59.95 Hz, which is pretty close to 60Hz. Yay!
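
(xrandr will tell you what you’re actually getting; the HDMI-0 output name below is whatever the Nvidia driver decides to call the port, so adjust accordingly:)

# List outputs, modes and the current refresh rate
xrandr --query

# Force the mode explicitly if it doesn't come up by itself
xrandr --output HDMI-0 --mode 3840x2160 --rate 60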

Of course, I have no 4K material on my computer, so the only thing that’s actually in 4K now is my Emacs-based movie interface. Here’s whatsername from Bewitched in 2K:

Eww! How awful! Horrible!

See! 4K! How beautiful!

(Let’s pretend that the entire difference isn’t in the different moire patterns!)

*phew*

And the Futura looks much better in 4K too, right?

Right?

This was all worth it.

One Thing Leads To Another

In the previous installment, I got a new monitor for my stereo computer.

I thought everything was fine, but then I started noticing stuttering during flac playback. After some investigation, it seems that if X is on (and displaying stuff on this new, bigger monitor) and there’s network traffic, then the flac123 process is starved for time slices, even though it’s running with realtime priority.
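
(“Realtime priority” here meaning SCHED_FIFO via chrt, along these lines; the priority number and file name are arbitrary examples:)

# Run the player under the FIFO realtime scheduler at priority 80
# (needs root, or a suitable rtprio limit in /etc/security/limits.conf)
chrt --fifo 80 flac123 albums/some-album/01-track.flac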

*sigh*

Now, my stereo machine is very, very old. As far as I can tell, it’s from 2005, and is basically a laptop mainboard strapped into a nice case:

(It’s the black thing in the middle.) But even if it’s old, its requirements hadn’t really changed since I got it: It plays music and samples music and routes music to various rooms via an RME Multiface box. So I was going to use it until it stopped working, but I obviously can’t live with stuttering music and I didn’t want to spend more time on this, so I bought a new machine from QuietPC.

There’s not a lot inside, so I put the external 12V power brick into the case. Tee hee. Well, thermally that’s probably not recommended, but it doesn’t seem to be a problem.

Nice heat pipes!

Look how different the new machine is! Instead of the round, blue LED power lamp, it’s now a… white LED power lamp. And it’s about 2mm wider than the old machine, but you can’t tell unless you know, and then it annoys the fuck out of you.

OOPS!

Anyway, installation notes: Things basically work, but Debian still haven’t fixed their installation CDs to work on machines with NVMe disks. When it fails to install grub, you have to say:

mount --bind /dev /target/dev 
mount --bind /dev/pts /target/dev/pts 
mount --bind /proc /target/proc 
mount --bind /sys /target/sys 
cp /etc/resolv.conf /target/etc 
chroot /target /bin/bash 
aptitude update
aptitude install grub-efi-amd64
update-grub 
grub-install --target=x86_64-efi /dev/nvme0n1

This seems like it would have been kinda trivial to fix, and worth fixing, wouldn’t you think? But they haven’t, and it’s been that way for a year…

Let’s see… anything else? Oh, yeah, I had to install a kernel and X from jessie backports, because the built-in Intel graphics are too new for Debian Stale. I mean Stable. Put

deb http://ftp.uio.no/debian/ jessie-backports main contrib

into /etc/apt/sources.list and say

apt -t jessie-backports install linux-image-amd64 xserver-xorg-video-intel

although that may fail according to the phase of the moon, and I had to install linux-image-4.9.0-0.bpo.2-amd64 instead…

And the RME Multiface PCIe card said:

snd_hdsp 0000:03:00.0: Direct firmware load for multiface_firmware_rev11.bin failed with error -2

I got that to work by downloading the ALSA firmware package, compiling and installing the result as /lib/firmware/multiface_firmware_rev11.bin.
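
(Roughly like this; the tarball version is just whatever happens to be current on the ALSA site, and the file name inside the tree may differ slightly, hence the find:)

# Fetch and build the ALSA firmware collection
wget https://www.alsa-project.org/files/pub/firmware/alsa-firmware-1.2.1.tar.bz2
tar xjf alsa-firmware-1.2.1.tar.bz2
cd alsa-firmware-1.2.1
./configure && make

# Install the Multiface image under the name snd_hdsp asks for
cp "$(find . -name 'multiface*rev11*.bin')" /lib/firmware/multiface_firmware_rev11.bin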

Oh, and the old machine was a 32 bit machine, so my C programs written in the late 90s had hilarious code like

(char*)((unsigned int)buf + max (write_start - block_start, 0))

that happened to work (by accident) on a 32 bit machine, but no longer does on a 64 bit one. And these programs (used for splitting vinyl albums into individual songs and the like) are ridiculously fast now. The first time I ran it I thought there must have been a mistake, because it had split the album by the time I had released the key ordering the album to be split.

That’s the difference between a brand new NVMe disk and a first generation SSD. Man, those things were slow…

And the 3.5GHz Kaby Lake CPU probably doesn’t make things worse, either.

Vroom vroom. Now I can listen to music 10x faster than before. With the new machine, the flac files play with a more agile bassline and well-proportioned vocals, with plenty of details in a surefooted rhythmic structure: Nicely layered and fairly large in scale, but not too much authority or fascism.

Also: Gold interconnects.

October 5th

Dear Diary,

today the LSI MegaRAID SAS 9240-8i card finally completed building the RAID5 set over five 1TB Samsung SSDs.  It only took about 18 hours.

So time to do some benchmarking!  I created an ext4 file system on the volume and wrote /dev/zero to it.
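
(Nothing fancy, basically this, with /dev/sdX standing in for whatever the MegaRAID volume shows up as:)

mkfs.ext4 /dev/sdX
mount /dev/sdX /mnt/test

# Sequential write, bypassing the page cache
dd if=/dev/zero of=/mnt/test/zeros bs=1M count=20000 oflag=direct status=progress

# And read it back
dd if=/mnt/test/zeros of=/dev/null bs=1M iflag=direct status=progress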

DSC00890

Err…  40 MB/s?  40MB/s!??!  These are SATA3 6Gbps disks that should have native write speeds of over 400MB/s.  And writing RAID5 to them should be even faster.  40 MB/s is impossibly pathetic.  And the reading speed was about the same.

If I hadn’t seen it myself, I wouldn’t have believed it.

Dear Diary, I finally did what I should have done in the first place: I binged the card.  And I found oodles of people complaining about how slow it is.

This is apparently LSI’s bottom-rung RAID card.  One person speculates that it does the RAID5 XOR calculations on the host side instead of having it implemented on the RAID card.  That doesn’t really account for how incredibly slow it is, though.

I think LSI just put a lot of sleep() calls into the firmware so that they could have a “lower-end” card that wouldn’t compete with the cards they charge a lot more money for.

I went back to the office and reconfigured the SSDs as JBOD, and then I created an ext4 file system on one of them, and then wrote /dev/zero to it, just to see what the native write and read rates are:

DSC00891

DSC00892

Around 400 MB/s.  It’s not astoundingly good, but it’s not completely pitiful, either.  These disks should do over 500 MB/s, but…

Then I created a soft RAID5 over the disks.  How long would it take to build it?
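
(Soft RAID meaning md, i.e. something like this, with the device names depending on how the JBOD disks happen to enumerate:)

# Five SSDs into one RAID5 array
mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]

# Watch the initial build
cat /proc/mdstat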

DSC00893

80 minutes.  That’s better than 16 hours, but it seems a bit slow…

Turns out there’s a SPEED_LIMIT_MAX that needs to be tweaked.
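
(These are the md sysctls; the values are in KB/s, and the stock maximum of 200000, i.e. about 200 MB/s, lines up pretty well with the 80 minutes above:)

# Defaults: min 1000, max 200000 (values are in KB/s)
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

# Let the resync run as fast as the hardware allows
sysctl -w dev.raid.speed_limit_min=50000
sysctl -w dev.raid.speed_limit_max=1500000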

DSC00894

With that in place, I get 300 MB/s while building the RAID.  51 minutes.

DSC00895

DSC00896

And it sustained that until it was done, which happened while I was typing this.

Now to check the real performance…

Making the file system was really fast.  It peaked at 1GB/s.  Writing a huge sequential file to the ext4 file system gives me around 180 MB/s.  Which isn’t fantastic, but it’s acceptable.  Reading the same sequential file gives me 1.4 GB/s!  That’s pretty impressive.

It’s incredible that the LSI MegaRAID SAS 9240-8i is somewhere between 4x and 35x slower than Linux soft RAID (180 MB/s vs. 40 MB/s writing, 1.4 GB/s vs. 40 MB/s reading).  Even if the card offloads the XOR-ing to the host CPU, it still doesn’t explain how its algorithms are that much slower than md’s algorithms.

Anyway: Avoid this card like the plague.  It’s unusable as a RAID card if you need reasonable performance.  40 MB/s is not reasonable.

October 4th

Dear Diary,

today was the day I was going to install a new SSD RAID system for the Gmane news spool.  The old spool kinda ran full four months ago, but I kept deleting (and storing off-spool) the largest groups and waddled through.

I had one server that seemed like a good fit for the new news server: It had 8 physical disk slots.  But the motherboard only had six SATA connectors, so I bought an LSI MegaRAID SAS 9240-8i  card.

DSC00850
Installing the 2.5″ SSDs in a 3.5″ adapter.  Five of the disks
DSC00851
So screwed. I mean, so many screws
DSC00853
I decided to add 2x 4TB spinning mechanical disks to the remaining two slots for, uhm, a search index perhaps?
DSC00854
Oops. I forgot to film the unboxing
DSC00849
Ribbed for your coldness
DSC00852
A tower of SSDs
DSC00857
All seven disks installed in their Supermicro caddies

DSC00859
Look at that gigantic hole, yearning to be filled.
DSC00860
Uhm… these captions took an odd turn back there…
DSC00861
I pull the server out of the rack and totally pop its top
DSC00862
Look! Innards!
DSC00863
Err… is that a photo of the RAID card? I think it might be…
DSC00864
All wired up
DSC00865
The disk backplane had six slots already hooked up, so I viciously ripped three of the connectors out
DSC00866
And plugged five of the connectors from the RAID card back in
DSC00867
And then my biggest fans were reinstalled. Thank you thank you

Now the hardware was all installed, but getting to the LSI WebBIOS configuration was impossible.

DSC00871

I hit Ctrl+H repeatedly while booting, but I always just got the Linux boot prompt instead.  I binged around a bit, and it turns out that if any other non-RAID disks are present in the system, the WebBIOS won’t appear at all.

So I popped the three spinning disks out of the machine, and presto:

DSC00872

I configured the RAID5 over the five 1TB SSDs.  This would give me about 4TB, which is twice what the current news spool has.

However, building the RAID seems to take forever:

DSC00874

WTF?  2% in 20 minutes?

These are SATA3 SSDs.  Read/write speed is over 500MB/s.  That means that reading or writing a single disk should take under 30 minutes.  Since the card can access all the disks independently, computing the RAID5 XOR shouldn’t take more than that.

But let’s be generous.  Let’s say it has to read the disks sequentially, and write the parity disk sequentially.  That’s 2.5 hours.

Instead it’s going to take 16 hours.  WTF is up with that?  Does the LSI MegaRAID SAS 9240-8i have the slowest CPU ever or something?

That’s just unbelievably slow.

Diary, I’m going to let it continue building the RAID and then do some speed tests.  If it turns out that it’s this slow during operation, I’m going to rip it out and just do software RAID.