I just got jabbed again today, so I thought it was time to start going out to catch some concerts again. It’s been just a … year … since the last time, and meanwhile my Concerts in Oslo concert listing web scraper aggregation service hasn’t received a lot of love.
I mean — everything’s been shut down, so it’s been too depressing to contemplate.
So: CSID is based on web scraping, and naturally, when people change the HTML of their sites, things need tweaking. (This happens less often than you’d think.) However, many venues have ditched their own web sites and just list stuff on Facebook…
And therein lies my tale.
Look at this event page:
It’s from my logged-in Firefox — it lists upcoming events as you’d expect. But my web scraper isn’t logged in, and wasn’t getting any data any more. Here’s why:
That snap is taken from an un-logged-in Firefox, and it only lists past events. It doesn’t even mention that there are any upcoming events. This apparently changed over the last few months?
Facebook wants to silo All The Information.
Well, I took that as a personal insult, so I wrote up a Selenium script that logs in, navigates to each of the event pages, clicks “show more” until all the events have loaded, and then saves the DOM. The resulting script is on Microsoft GitHub. (Note: I don’t actually know Python, so I typed away randomly until I had something that vaguely worked.)
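For the curious, the click-until-done part can be sketched roughly like this. To be clear, this is not the actual script, just a minimal illustration of the idea: the function names, the XPath for the “show more” button and the fixed sleep are all assumptions of mine, the real Facebook markup is different (and changes), and the login step is left out entirely.

```python
import time


def save_expanded_dom(driver, url, out_path, max_clicks=50, pause=2.0):
    """Load url, click "show more" until it is gone, then save the DOM.

    `driver` is anything that behaves like a Selenium WebDriver.
    Returns the number of clicks performed.
    """
    driver.get(url)
    # Hypothetical locator: an assumption for illustration, not Facebook's markup.
    more_xpath = "//*[@role='button'][contains(., 'See more')]"
    clicks = 0
    while clicks < max_clicks:
        # "xpath" is the string value of Selenium's By.XPATH.
        buttons = driver.find_elements("xpath", more_xpath)
        if not buttons:
            break  # nothing left to expand
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of events to load
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(driver.page_source)
    return clicks


def run_with_firefox(url, out_path):
    """Drive a real Firefox; assumes Selenium and geckodriver are installed."""
    from selenium import webdriver  # imported lazily so the helper above stays testable

    driver = webdriver.Firefox()
    try:
        # Logging in would go here; it is omitted from this sketch.
        return save_expanded_dom(driver, url, out_path)
    finally:
        driver.quit()
```

The `max_clicks` cap is there so a button that never goes away can’t keep the loop spinning forever, and saving `page_source` at the end means the existing HTML-parsing code can chew on the file as if it had been fetched the old way.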
CSID also harvests little snippets to describe the events, and that bit of the script worked fine… if I ran it from Norway. The server it’s really running on is in Germany, and from there you can’t look at individual event pages at all without logging in.
Well, for all I know, perhaps German law requires you to log in to Facebook before you can look at information about a concert? It’s possible!
And those snippets aren’t that useful, anyway, but… sure is annoying. Well, I could rewrite those bits to also use Selenium to log in, but… gah…
Anyway, CSID went from this sad state:
To this glory:
I didn’t know that apparently absolutely everything had opened again! PARTY!
Well, OK, I should probably wait a week until my 5G reception has gotten better.