
Twiddling Youtube; or, I mean, Innovations in Machine Learning

I mean, we’ve all been annoyed when we set up our USB monitor in our hallway that displays weather data, and then decided to show videos from Youtube that somehow relate to the music that’s playing in our apartment; we’ve dreamed of having something like the following catch our eyes when passing by on the way to the kitchen.

Oh, what a marvellous dream we all had, but then most of the videos that vaguely matched the song titles turned out to be still videos.

So many still photo videos. So very many.

I mean, this is a common problem, right? Something we all have?

Right?

Finally I’m writing about something we all can relate to!

So only about five years after first getting annoyed by this huge problem, I sat down this weekend and implemented something.

First I thought about using the bandwidth of the streaming video as a proxy for how much liveliness there is in a video. But that seems error-prone (somebody may be uploading still videos in very HD and with only I-frames, and I don’t know how good Youtube is at optimising that stuff), and why not go overboard if we’re dipping our feet into the water, to keep the metaphor moist.

So I thought: play five seconds of a video, taking a screenshot every second, and then compare the snapshots with ImageMagick’s “compare” to get a more solid metric; that also lets me check whether bandwidth is a good proxy after all.
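
Something along these lines, perhaps (a sketch, not the actual code; I’m assuming youtube-dl for resolving the stream URL, and the variable names are made up):

# Sketch only: resolve the video stream URL, then grab one frame
# per second for five seconds.
url=$(youtube-dl -g "https://www.youtube.com/watch?v=$id" | head -n 1)
ffmpeg -y -i "$url" -vf fps=1 -frames:v 5 "$tmp/flutter%d.jpg"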

The “compare” incantation I’m using is:

compare -metric NCC "$tmp/flutter1.jpg" "$tmp/flutter2.jpg" null:

I have no idea what all the different metrics mean, but one’s perhaps as good as another when all I want to do is detect still images?
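
One practical note in case anybody wants to replicate this mess: “compare” prints the metric on stderr, not stdout, so you have to grab it from there. (And NCC is apparently “normalised cross correlation”, where 1.0 means the two images are identical.) Something like:

# The metric arrives on stderr; 1.0 = identical frames, lower = livelier.
metric=$(compare -metric NCC "$tmp/flutter1.jpg" "$tmp/flutter2.jpg" null: 2>&1)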

So after hacking for a bit in Perl and Bash and making a complete mess of things (asynchronous handling of all the various error conditions and loops and stuff is hard, boo hoo, and I want to rewrite the thing in Lisp and use a state machine instead, but whatevs), I now have a result.

Behold! Below I’m playing a song by Oneohtrix Point Never, who has a ton of mad Youtube uploaders, and watch it cycle through the various hits until it finds something that’s alive.

Err… What a magnificent success! Such relevance!

Oh, shut up!

*mumble*

But let’s have a look at the data (I’m storing it using sqlite3 for convenience) and see whether videos are classified correctly.
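
Conceptually it’s just a single table; the following is a guess at what that looks like (the database, table and column names here are invented for illustration):

# Hypothetical schema: one row per video with the "compare" metric
# and the observed bitrate.
sqlite3 videos.db "create table if not exists stillness
                   (metric real, video text, bitrate integer)"
sqlite3 videos.db "insert into stillness values ($metric, '$id', $bitrate)"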

I’m saying that everything “compare” gives a rating of more than 0.95 is a “still image video”. So first of all we have a buttload of videos with a metric of 0.9999, which is very still indeed. (The columns in these listings are the “compare” metric, the Youtube video ID, and the bitrate in bits per second.)

0.9999 yAZrDkz_7aY 36170
0.9999 yCNZVvP7cAE 150241
0.9999 yai4bier1oM 128630
0.9999 yt1qj-ja5yA 476736
0.9999 yxWzoYQb5gU 244076
0.9999 z1YKfu5sD24 723392
0.9999 z28HTTtJJEE 372014
0.9999 zOirMAHQ20g 574614
0.9999 zWxiVHOJVGU 70909
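
(Listings like that one pop straight out of sqlite3 with a query along these lines, still assuming the invented schema above:)

sqlite3 videos.db "select metric, video, bitrate from stillness
                   order by metric desc"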

But the bitrates vary from 36kbps to 723kbps, which is a wide range. So let’s look at the ones with very low metrics:

0.067 slzSNsE7CKw 1359008
0.1068 m_jA8-Gf1M0 2027565
0.1208 7PCkvCPvDXk 1702924
0.1292 zuDtACzKGRs 3969219
0.1336 VHKqn0Ld8zs 1607430
0.1603 Tgbi3E316aU 1877994
0.2153 ltNGaVp8PHI 506771
0.2192 j14r_0qotns 683650
0.2224 dhf3X6rBT-I 1715754
0.2391 WV4CQFD5eY0 416458
0.2444 NdUZI4snzk8 2073374

Very lively!

These definitely have higher mean bitrates, but a third of them have lower bitrates than the highest-bitrated (that’s a word) still videos, so my guess was right, I guess. I guess? I mean, my hypothesis has proven to be scientifically sound: Bitrates aren’t a good metric for stillness.
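
(The means, for the record, can be eyeballed with a couple of queries like these, schema caveats as before:)

# Mean bitrate of the stills vs. the lively ones.
sqlite3 videos.db "select avg(bitrate) from stillness where metric > 0.95"
sqlite3 videos.db "select avg(bitrate) from stillness where metric <= 0.95"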

And finally, let’s have a peek at the videos that are around my cutoff point of 0.95 (which is a number I just pulled out of, er, the air, yeah, that’s the expression):

0.9384 t5jw3T3Jy70 802643
0.9454 5Neh0fRZBU4 1227196
0.9475 ygnn_PTPQI0 1907749
0.949 XYa2ye4GPY8 84848
0.9501 myxZM9cCtiE 1202315
0.9503 lkA9BRDWKco 297490
0.9507 mz91Z2aRJfs 203855
0.9512 IDMuu6DnXN8 358156
0.9513 bsFRMTbhOn0 198332
0.9513 v6CKHqhbos8 1686790
0.9514 3Y1yda0YfQs 1012911

Yeah, perhaps I could lower the cutoff to 0.90 or something to skip the semi-static videos, too, but then I’d also lose videos that just have large black areas on the screen.

Hm… and there’s also a bunch of videos that it wasn’t able to get a metric on… I wonder what’s up with those.

1 pIBEwmyIwLA 349057
1 pzSz8ks1rPA 108422
1 qmlJveN9IkI 83383
1 srBhVq3i2Zs 1651041
1 tPgf_btTFlc 111953
1 uxpDa-c-4Mc 691684
1 uyI3MBpWLuQ 45383

And some it wasn’t able to play at all?

0 3zJkTILvayA 0
0 5sR2sCIjptY 0
0 E44bbh32LTY 4774360
0 FDjJpmt-wzg 0
0 U1GDpOyCXcQ 0
0 XorPyqPYOl4

Might just be bugs from when I was testing the code, though, and those are still in the database. Well, no biggie.

You can find the code on Microsoft Github, but avert your eyes: This is bad, bad code.

Anyway, the fun thing (for me) is that the video monitor will get better over time. Since it stores these ratings in the sqlite3 database and skips all videos with high metrics, I’ll wind up with all action all the time on the monitor, and the player doesn’t have to cycle through all the still-video guesses first.
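
The skipping bit is just a filter on the stored metric, something like the following (hypothetical schema as before; 0 and 1 being the “couldn’t play” and “no metric” oddities from above):

# Only hand the player videos that we know aren't still images.
sqlite3 videos.db "select video from stillness
                   where metric > 0 and metric < 0.95"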

See? The machine learns, so this is definitely a machine learning breakthrough.
