A lot of websites today track your clicks within their pages and then come up with recommendations for you. Amazon does it, Netflix does it, several music websites do it (Spotify, Pandora, last.fm), even the San Diego Public Library does it. The idea is to get you to stay on the website longer, buy something more, or bring in more advertising revenue or, in the last case, just to get you to use the library more, and perhaps read more.

The recommendation algorithm that I see in action most often is actually YouTube’s, and it gives me a perverse sort of pleasure to constantly notice that, in fact, it sucks!

If you think for a second about what a recommendation algorithm really does, it is trying to label your preferences. Maybe you use Amazon to buy cookbooks and cooking equipment, and someone else uses it mostly for children’s books and toys. So it makes sense to tell you about related cookbooks, and not children’s books, and the other way around for the other person. Maybe on last.fm you listen to Indian fusion groups, so it makes sense to recommend Prem Joshua to you, and not, say, Paul Simon.

So far so good. The problem arises when someone likes a mixed bag of things. A good fraction of my time on YouTube is spent listening to choral music. In December of last year, I went overboard listening to a particular segment of the Hallelujah chorus — it was the culmination of the rising trumpet line at “king of kings” that I was looking for. I found several recordings of the complete Messiah oratorio on YouTube and diligently found the phrase within them and compared across recordings and generally had a ball. Net conclusion? YouTube is completely convinced that Messiah is the only thing I want to hear, ever. Not just choral music, but Messiah, specifically.

Come January, I was looking for a Marathi song, called “bagalyanchi maal phule”, and when I found it and started listening to it, what does YouTube put in the sidebar? John Rutter’s Candlelight Carol and, of course, Messiah. So I started to track how soon it would change that. If I kept listening to Indian music, eventually it would stop recommending choral music, correct? Well, perhaps. But that is unlikely. What is far more likely is that I’ll keep switching across genres and subgenres of music and maybe even throw in some movies, sport videos, or funny videos. And this is when the fun/headache begins.

The point is that my YouTube tastes (and everybody else’s too, I imagine) can probably be grouped into not one, but several different categories and subcategories. What the recommendation algorithm has to do is first determine these categories, perhaps over many months of tracking user behaviour (i.e. by accumulating a massive number of clicks) and possibly by running some kind of clustering algorithm on them. And then, when it comes time to recommend, it has to determine what the user is in the mood for right now, which would be based on the clicks over, say, the last few minutes. Those clicks can help decide which of the several categories is of interest currently, and recommendations would be based on that.

Thus the problem requires at least two different time-scales of operation. The long-term memory of clicks is needed to build an accurate user profile, and the short-term filter is needed so you are not still recommending Messiah from the overdose two months ago. This wouldn’t be perfect, because the user may in fact be looking for something altogether new, in which case the history isn’t useful anyway, and if that is detected, maybe the recommendation algorithm should be temporarily turned off, so as not to utterly annoy the user!
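
To make the two time-scales concrete, here is a rough Python sketch of the idea. It is only an illustration under my own assumptions, not how YouTube or anyone else actually does it. The class name `TwoTimescaleRecommender`, the parameters `window` and `min_share`, and the sample profile are all made up for the example. It also skips the hard part: it assumes the long-term categories have already been worked out (by clustering or otherwise) and simply takes them as a dictionary of click counts.

```python
from collections import Counter, deque


class TwoTimescaleRecommender:
    """Toy two-time-scale recommender (hypothetical, for illustration only)."""

    def __init__(self, profile, window=5, min_share=0.5):
        # Long time-scale: category -> Counter of item click counts,
        # assumed to have been built from months of history.
        self.profile = profile
        # Short time-scale: only the last `window` clicks are remembered.
        self.recent = deque(maxlen=window)
        # Fraction of recent clicks one category needs to count as the mood.
        self.min_share = min_share

    def click(self, item, category):
        """Record a click in the short-term memory."""
        self.recent.append((item, category))

    def current_mood(self):
        """Return the dominant category among recent clicks, or None."""
        if not self.recent:
            return None
        counts = Counter(category for _, category in self.recent)
        category, hits = counts.most_common(1)[0]
        if hits / len(self.recent) < self.min_share:
            return None  # recent clicks are too mixed to call
        return category

    def recommend(self, k=3):
        """Recommend up to k items from the category matching the mood."""
        mood = self.current_mood()
        # No clear mood, or a category the profile has never seen:
        # stay quiet rather than fall back on stale history.
        if mood is None or mood not in self.profile:
            return []
        just_played = {item for item, _ in self.recent}
        ranked = [item for item, _ in self.profile[mood].most_common()
                  if item not in just_played]
        return ranked[:k]


# Made-up long-term profile: heavy on Messiah, with some Marathi songs.
profile = {
    "choral": Counter({"Messiah": 40, "Candlelight Carol": 5}),
    "marathi": Counter({"bagalyanchi maal phule": 3, "another Marathi song": 2}),
}

recommender = TwoTimescaleRecommender(profile)
recommender.click("bagalyanchi maal phule", "marathi")
print(recommender.recommend())
```

With this profile, a single click on the Marathi song is enough to steer the suggestions away from Messiah, despite its 40 plays in the long-term history. A click in a category the profile has never seen yields an empty list, which is the "temporarily turned off" behaviour described above.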

It seems that this is a machine-learning problem that is getting a lot of attention lately, not just for YouTube but for a lot of other websites too. But as with other machine-learning problems that are getting a lot of attention lately (e.g. finding cats!), it feels like an improvement over the status quo should be more readily available. Or perhaps there is something about the difficulty of the problem that I am missing altogether. In any case, something worth looking into.

See below for what YouTube is currently recommending when I play “bagalyanchi maal phule.” Christopher Hitchens and The King’s Singers. Oh well.