HTML, but not too HTML

When writing blog posts, I use ewp, an Emacs package to administrate WordPress. It offers an editing mode based on the revolutionary idea of just writing HTML.

Everything is cyclical in computing, so people move between writing things in raw HTML and using arcane and unholy systems, mostly based on some Markdown dialect. I understand the frustrations: It feels like there should be something that’s less annoying than using some WYSIWYG tool that invariably freaks out and ruins your post, or typing all that annoying HTML yourself, or using Markdown and then having to have some kind of build step.

In my opinion, Markdown is fine for writing README files, but if you’re writing blog posts, it just gets in the way. A blog post is mainly just paragraphs like the one I’m typing (and you’re reading) now, which is just text with no markup. Or there’s some slight formatting for emphasis or the like, but honestly, there’s not much difference between the HTML and Markdown versions for that.

Markdown is nice for headings and code snippets, but doesn’t really offer much that’s useful for blog posts. And the things that blog posts need, which is images/screenshots and links: Markdown doesn’t help you much there.

Is that really better than the HTML version? And what, then, if you need more stuff in the link?

It just gets worse and worse — what if you need to put more data into the links? The nice thing about HTML is that it’s well-formed and not very hacky — the more cruft you add to the HTML, the more unreadable it gets — but linearly. Markdown makes the easy stuff trivial, and the difficult stuff worse. (Here’s where the Greek chorus of “but you can just write HTML in Markdown” comes in, but that’s worse than just writing HTML in the first place.)

So: I write HTML, and Emacs takes care of displaying the images I’m linking to, so a blog post looks like this while I’m writing:

(To digress: I’ve noted over the years the many, many posts on Hacker News about statically generated blogs, and people seem to have more fun tinkering with their setups than actually writing blog posts, and that’s fine. But I’ve also noticed that virtually none of these systems have a mechanism for dealing with images in a natural way — because that’s just kinda hard. The nearest you get is “then you just create an S3 bucket and put the image there, and then you go to the AWS console to get the URL, and then you paste that into the Markdown here. See? PROBLEM SOLVED!!!” That’s why blog posts from all these people (random example) are almost always just walls of text.)

Anyway, here’s my problem:

YIKES! WHAT THE… Yes, I hear you.

To protect myself a bit against link rot, ewp screenshots everything I link to automatically. So on the blog, you can just hover over a link to see (and read, if you want to) what I was linking to at the time, and that will survive as long as my blog survives (while most of the things I’m linking to disappear, apparently).

But that means that I have to stash that data somewhere, and I stashed it in the links, which means that the HTML then becomes unreadable.

This is Emacs, however. What about just hiding all that junk?

Yes, that’s the same paragraph with the links hidden. And if I want to edit the links themselves, I can just hit TAB on the bracket:

And TAB again to hide:

Note that the links and stuff are still present in the Emacs buffer, so the normal Emacs autosave functions work perfectly, and there’s no danger of losing any data.
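(If you’re wondering how the hiding works at the Emacs level: this is not ewp’s actual code, but a toy version of the mechanism is basically just the invisible text property plus the buffer’s invisibility spec. The link in the comment below is made up, and so is the attribute name I’m using for the stashed screenshot data.)

;; A toy version of the mechanism, not ewp's actual code.  It hides
;; everything after the tag name in <a ...> start tags, and lets one
;; command toggle the hiding for the whole buffer.
;;
;; Hypothetical example of the kind of link this is aimed at (the
;; data attribute name is invented for illustration):
;;
;;   <a href="https://blog.example/some-article"
;;      data-screenshot="https://blog.example/screenshots/1234.png">like this</a>

(defun my-hide-link-cruft ()
  "Hide the attributes of <a> start tags in the current buffer."
  (interactive)
  (add-to-invisibility-spec 'my-link-cruft)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "<a\\(\\s-[^>]*\\)>" nil t)
      (put-text-property (match-beginning 1) (match-end 1)
                         'invisible 'my-link-cruft))))

(defun my-toggle-link-cruft ()
  "Toggle display of the hidden <a> attributes."
  (interactive)
  (if (and (listp buffer-invisibility-spec)
           (memq 'my-link-cruft buffer-invisibility-spec))
      (remove-from-invisibility-spec 'my-link-cruft)
    (add-to-invisibility-spec 'my-link-cruft))
  ;; Make sure redisplay notices the spec change.
  (force-window-update))

Bind that toggle to TAB (or to something smarter that only fires when point is on a tag) and you’re most of the way to what the screenshots show.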

Similarly, the image HTML in WordPress can be pretty messy:

Because images have extra classes with their IDs, and you can click on images to get the full sizes, so they’re (almost always) wrapped in an <a>. Now, when writing articles, Emacs displays the images instead of the HTML, so we don’t see all that cruft anyway, but when editing image-heavy articles, it can take some time to fetch the images, and we don’t want to be staring at junk like that while waiting for the images to arrive.
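(For the curious, the “display the image instead of the markup” trick boils down to the display text property. Here’s a rough, simplified sketch of the mechanism; it’s not ewp’s code, and the WordPress markup in the comment is just my approximation of what it emits.)

;;; -*- lexical-binding: t -*-  (the callback below closes over markers)
;; Roughly the kind of markup this is aimed at (class/ID details vary):
;;
;;   <a href="https://blog.example/wp-content/uploads/2025/01/foo.png"><img
;;     src="https://blog.example/wp-content/uploads/2025/01/foo-800x600.png"
;;     alt="" class="alignnone size-large wp-image-1234"></a>

(require 'url)

(defun my-display-inline-images ()
  "Fetch the images referred to by <img> tags and display them in place."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "<img\\s-[^>]*src=\"\\([^\"]+\\)\"[^>]*>" nil t)
      (let ((start (copy-marker (match-beginning 0)))
            (end (copy-marker (match-end 0)))
            (url (match-string 1)))
        (url-retrieve
         url
         (lambda (_status)
           ;; We're in the retrieval buffer now; skip the HTTP headers
           ;; and treat the rest as image data.  No error handling here.
           (goto-char (point-min))
           (when (re-search-forward "\r?\n\r?\n" nil t)
             (let ((data (buffer-substring (point) (point-max))))
               (with-current-buffer (marker-buffer start)
                 (put-text-property start end 'display
                                    ;; :max-width needs a newish Emacs.
                                    (create-image data nil t :max-width 500))))))
         nil t)))))

ewp presumably does a lot more than that (caching, error handling, dealing with the <a> wrapper), but the display property is the core trick.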

So let’s hide them like this:


And TAB can be used to cycle through the three different forms:

I think that looks kinda pleasant to work with…
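(I don’t actually know which three forms ewp cycles between, but if you want to play with the idea outside ewp, a self-contained toy version of that kind of cycling could look like this:)

;; Toy three-state cycling over <img> markup, a guess at the mechanism:
;;   0: the raw HTML,
;;   1: the <img> attributes hidden,
;;   2: the whole tag collapsed to a placeholder.

(defvar-local my-image-display-state 0
  "Current display state for image markup in this buffer.")

(defun my-cycle-image-markup ()
  "Cycle how <img ...> tags are displayed in the current buffer."
  (interactive)
  (setq my-image-display-state (mod (1+ my-image-display-state) 3))
  (unless (and (listp buffer-invisibility-spec)
               (memq 'my-img-cruft buffer-invisibility-spec))
    (add-to-invisibility-spec 'my-img-cruft))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "<img\\(\\s-[^>]*\\)>" nil t)
      (let ((tag-start (match-beginning 0))
            (tag-end (match-end 0))
            (attr-start (match-beginning 1))
            (attr-end (match-end 1)))
        ;; Wipe whatever the previous state put there, then decorate.
        (remove-text-properties tag-start tag-end '(invisible nil display nil))
        (pcase my-image-display-state
          (1 (put-text-property attr-start attr-end 'invisible 'my-img-cruft))
          (2 (put-text-property tag-start tag-end 'display "[image]"))))))
  (message "Image markup display state: %d" my-image-display-state))

Bind that to TAB in the editing mode and you get the cycling; ewp no doubt does this per tag and rather more cleverly.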

Anyway, I think that’s as far as I want to go with hiding the HTML-ness of things. I mean, the temptation here is to start going in a more WYSIWYG direction, and translating <b>…</b> into bold text and all that sort of stuff, but… I’m more comfortable just looking at the tags?

So there you go: In the “just write HTML/no don’t write HTML” wars, I’m on the “just write HTML but have the editor hide some of the worst of the cruft” tip.

Book Club 2025: Station Eternity by Mur Lafferty

“Despite this solve…”

Oh my gerd! This is awful! On a sentence by sentence basis, this is gruelling. “The kettle screamed its achievement of boiling water and Adrian jerked it off the element, wincing.” This is torture.

I got to page 30 before giving up, and only got that far because the concept here sounds like it could be fun. So I checked Goodreads:

There are bad books. There are books that just aren’t for me. But this is the biggest train wreck of a book in recent memory.

Right:

Almost immediately, what we might have imagined as the main story thread on board is mostly thrust aside in favour of long, dull flashbacks filling us in on the characters on the shuttle back on Earth and their relationships to Mal. None of these characters are very interesting and it all feels like a massive distraction. Except that, as it turns out, the murder itself gets pretty much forgotten.

So… I’m ditching this. I mean, I like reading trash, but the trash has to be somewhat well written, at least.

Station Eternity (2021) by Mur Lafferty (buy new, buy used, 3.7 on Goodreads)

Book Club 2025: In the Company of Cheerful Ladies by Alexander McCall Smith

I’ve got a strange kind of cold this week — it doesn’t seem to get better or get worse, but just remains at a stage of me feeling slightly cruddy, so I’m picking books to read that go down easily. I’m not up for reading anything challenging.

I bought this book in 2005 and then didn’t read it. I’ve read the previous books in the series, but I guess I just decided that I’d read enough of these books? And then bought one more anyway, but never read it. But now I’ve brewed a cup of bush tea, so let’s go.

(I think I started drinking that stuff after I read the first book in this series?)

I think that people should write stories featuring whatever people they want to dream up. But it’s sometimes hard to completely disregard the paternalistic tone the white author here takes with his African characters. It’s often like… “eeeh?” (Such erudite critique.)

But this book sure goes down easy. We’re presented with a number of low-stakes mysteries, and most of them are solved. Though the author does seem to have a tendency to forget some of the plot strands?

As is alluded to in this review of the next book in the series.

I did indeed enjoy reading this book while hacking slightly, but I don’t feel compelled to buy any further entries in this series. I see that there are now 24 of these books? Geezes. Perhaps reading one of these books every 20 years is enough?

In the Company of Cheerful Ladies (2004) by Alexander McCall Smith (buy used, 4.09 on Goodreads)

Spam, Spam, WordPress and Spam

I was puttering around looking at WordPress spam, and I wondered just how much I get. So I altered the WordPress Statistics for Emacs package to grab spam comments, too, so that I could see what I have, and…

It’s about 160 spam comments per day across the blogs. These are all caught by Akismet Anti-Spam, so they don’t really bother me, but I was just curious.

(No, really!)
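(I won’t pretend this is what the actual change to the package looks like, but the WordPress side of it is basically the wp.getComments XML-RPC method, which takes a filter where “spam” is a valid status. Assuming the blog still has xmlrpc.php enabled and xml-rpc.el is installed, a rough sketch would be something like this:)

;; Rough sketch: pull the most recent spam comments over XML-RPC.
;; The filter fields below (status/number) are the ones I believe
;; wp.getComments documents; double-check against your WordPress.

(require 'xml-rpc)

(defun my-fetch-spam-comments (url user password &optional limit)
  "Return up to LIMIT spam comments from the WordPress blog at URL."
  (xml-rpc-method-call
   url "wp.getComments"
   1 user password                     ; blog id and credentials
   ;; xml-rpc.el sends an alist like this as an XML-RPC <struct>.
   `(("status" . "spam")               ; only the stuff Akismet caught
     ("number" . ,(or limit 100)))))

;; Example call (placeholder URL and credentials):
;; (length (my-fetch-spam-comments
;;          "https://blog.example/xmlrpc.php" "user" "secret" 200))

The URL and credentials there are placeholders, obviously, and counting per day is then just a matter of grouping the returned comments on their dates.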

From what I can see, the spam seems to be coming in waves — a few dozen Russian-language spams, and then a few dozen from Hairstyles VIP, and then several dozen that seem to be from crypto scammers, and so on.

People are presumably paying for these bot spam campaigns, but since they’re all caught by Akismet, it doesn’t look like they’re getting their money’s worth? Possibly?

Anyway, the moral here is: It’s impossible to run a WordPress site (with comments enabled) without using Akismet.

Comment Spam Is Annoying Part XIV

I was looking at the WordPress statistics just now, and I saw that an old, obscure post had suddenly gotten popular.

Looking at the details, all the hits are from different IP addresses, and all the visitors come via Google! Over a 15-minute period!

So that’s obviously not real — it’s a botnet of some kind. The botnet is using a wide variety of User-Agent strings, all mapping to real browsers and not automated systems. And these are all using a (headless) browser, because the stats are triggered from Javascript, so just loading the pages doesn’t lead to a “view”.

And sure enough, looking at the “spam” tab in the comments overview, there’s a whole bunch of spam comments in this period. But oddly enough, there’s about 500 spam comments (on this article alone) over a two-day period, and the other 464 comments did not trigger the stats counter. So… one gang of spammers is using a full (headless) browser, while the other spammers are being more efficient? I dunno.

(And also, all the IP addresses are different — but presumably they’re using a proxy that VPNs to random client IP addresses, so that doesn’t tell us anything.)

Anyway. Just another annoyance, and I guess there’s not much I can do to filter out traffic like this. (Looking at the Jetpack Stats, they also fail to identify this as bot traffic.) I just thought this was extremely mildly interesting? But whatchagonnado.

OK, back to reading books.