Random Thoughts

Search Ranking Tweaks on kwakk.info

The search engine on kwakk.info (the comics research site) uses a very simple ranking algorithm. For instance, if you want to see if anybody has talked about Batman in conjunction with (Dirty) Harry, you might lazily type batman harry… but then the first four hits don’t really talk about that.

That’s because the search engine first finds all pages that have batman and harry, and then ranks them by how many instances of both these words there are. And that’s it.
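To make that concrete, here's a minimal sketch of that kind of count-based ranking. This is not the actual kwakk.info code; the `pages` dict, the `tokenize` helper and the `rank_by_count` name are all made up for illustration, and I'm assuming "how many instances" means the summed count of the query words on each page.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_count(pages, query):
    """Return ids of pages containing every query word, most occurrences first.

    `pages` is a hypothetical {page_id: text} mapping standing in for the index.
    """
    terms = tokenize(query)
    hits = []
    for page_id, text in pages.items():
        counts = Counter(tokenize(text))
        # AND semantics: every query word has to appear at least once.
        if all(counts[term] > 0 for term in terms):
            score = sum(counts[term] for term in terms)
            hits.append((score, page_id))
    # Highest score first; ties are broken by page id so the order is stable.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [page_id for _, page_id in hits]


pages = {
    "a": "batman batman batman harry potter",
    "b": "batman meets dirty harry",
    "c": "batman alone",
}
print(rank_by_count(pages, "batman harry"))
```

Page "a" wins here just because it says batman a lot, even though page "b" is the one that actually puts the two words next to each other. That's the problem in a nutshell.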

But the search engine does allow using operators like ADJ and NEAR to say “just give me results where these words come one after the other (possibly with some words in between)” or “where they’re just near each other in general”.

No users can be expected to know that, so I wondered whether I could just do three queries and then smush the results together. Tada:

Now the search engine first does ADJ, then NEAR, and finally (as before) AND, and uses that to give a better ranking. The number of results is the same as before — the only thing that’s affected is how the results are ordered.

Now hopefully with more relevant stuff towards the top.
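The smushing itself can be sketched roughly like this. Again, this is an illustration and not the real implementation: `merge_tiers` is a hypothetical name, and I'm assuming that each of the three queries returns an already-ranked list of page ids, and that everything the ADJ and NEAR queries find also shows up in the AND results.

```python
def merge_tiers(*tiers):
    """Concatenate ranked result lists, keeping each page's first appearance.

    Tiers are passed strictest first (ADJ, then NEAR, then AND), so a page
    found by a stricter query lands above pages found only by a looser one.
    """
    seen = set()
    merged = []
    for tier in tiers:
        for page_id in tier:
            if page_id not in seen:
                seen.add(page_id)
                merged.append(page_id)
    return merged


# Hypothetical results from the three queries for the same two words.
adj_hits = ["p3"]
near_hits = ["p3", "p1"]
and_hits = ["p1", "p2", "p3"]

print(merge_tiers(adj_hits, near_hits, and_hits))
```

Under those assumptions the merged list has exactly as many entries as the plain AND query, which matches the "same number of results, different order" behaviour described above.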

This makes searches a bit slower… and I may have screwed something up, because the rewrite wasn’t altogether trivial, so let me know if you see any oddities.

Oh, and:

So close to 11K!!!
