Comments on: Netflix Weirdness

By: Kaleberg

Kaleberg — Fri, 28 Nov 2008 03:17:23 +0000

My impression is that SVD is just a starting point. There is lots of interesting stuff you can do once you have the basic decomposition. For example, you can do optimal rotations with things like the varimax algorithm. You can also do subspace selection, which, when combined with rotation, can help pick out the signal from the noise. Of course, I’ve only used SVD for remote sensing analysis, so what the hell do I know other than I’ve yet to buy something Amazon has recommended for me.
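
For readers who want to see what subspace selection and rotation look like in practice, here is a minimal sketch in Python with NumPy. The synthetic data, the choice of `k=2`, and the function names `truncate_svd` and `varimax` are assumptions made for illustration; the rotation follows the usual SVD-based varimax iteration and is not taken from any particular remote sensing package.

```python
import numpy as np

def truncate_svd(X, k):
    """Subspace selection: keep only the k largest singular components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation. Returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    R = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion with respect to the rotation.
        grad = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        new_total = s.sum()
        if total != 0.0 and new_total / total < 1.0 + tol:
            break
        total = new_total
    return loadings @ R, R

# Hypothetical data: a rank-2 signal buried in noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
noisy = signal + 0.1 * rng.standard_normal((50, 40))

U, s, Vt = truncate_svd(noisy, k=2)
denoised = (U * s) @ Vt            # rank-2 reconstruction
loadings = Vt.T * s                # one column per retained component
rotated, R = varimax(loadings)     # same subspace, rotated axes
```

The rotation does not change the subspace or the reconstruction; it only picks axes within it that tend to be easier to interpret.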

By: andrew cooke

andrew cooke — Fri, 28 Nov 2008 02:37:48 +0000

You can get an intuitive feel for how missing data are a problem by considering how you might implement a naive calculation: calculate the average point; discard that dimension; repeat. Both the calculation of the average and the reduction in dimension become non-linear processes when you have missing data (the missing data have to be ignored; they cannot be simply treated as zero). That would strongly suggest that traditional efficient solutions based on linear matrix operations will not work.

Which is what various people have said above, really.
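
A tiny illustration of the averaging step, as a sketch in Python with NumPy. The ratings table is made up, and `np.nan` is assumed to mark a missing rating. Treating the gaps as zero gives a plain column mean; ignoring them means dividing each column by its own count of observed values, which depends on where the gaps are.

```python
import numpy as np

# Hypothetical ratings: rows are viewers, columns are movies, nan = not rated.
ratings = np.array([
    [5.0, np.nan, 1.0],
    [4.0, 2.0, np.nan],
    [np.nan, 3.0, 2.0],
])

# Wrong: pretend a missing rating is a rating of zero.
mean_as_zero = np.nan_to_num(ratings, nan=0.0).mean(axis=0)

# Right: average only the ratings that exist.
observed = ~np.isnan(ratings)
mean_ignoring = np.where(observed, ratings, 0.0).sum(axis=0) / observed.sum(axis=0)

print(mean_as_zero)
print(mean_ignoring)
```

The second calculation is the same thing `np.nanmean(ratings, axis=0)` computes. The per-column divisor is what stops it from being a single fixed matrix product.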

By: Scott Martens

Scott Martens — Thu, 27 Nov 2008 16:47:38 +0000

I haven’t looked at any of the research, but I’ll bet the next step was to use Latent Dirichlet Allocation, which is very good at topic clustering in documents and would, therefore, be a pretty obvious candidate for the task. Then, just iterate over the standard set of co-clustering techniques. I saw one using a simulated annealing algorithm that did pretty well with topic clustering in newspapers.

Look: stats dorks are pretty thin on the ground and machine learning junkies are freaks right up there with, say, people who understand the proof of Fermat’s Last Theorem. These are not easy algorithms. It takes either talent, hard work, or both to acquire them, and you can spend your whole life gainfully employed in statistics and not really ever run into SVD or its applications to general learning problems.

By: Abby

Abby — Tue, 25 Nov 2008 15:54:55 +0000

TheDeadlyShoe wrote: “And if there’s just one person managing movies for the whole family, I don’t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings. That one person would probably just offer their own thoughts in terms of ratings.”

But that is exactly what I did with our family’s Netflix account. I rated the movies based on how the primary viewer perceived them so that the recommendation system would suggest more movies that each individual might like. Putting in my own rating of movies I hadn’t even wanted to watch in the first place would have made it harder for me to find more movies for the rest of my family.

By: Thomas

Thomas — Mon, 24 Nov 2008 19:56:52 +0000

It seems there are two problems:

– computationally, you need an SVD for a large matrix. This isn’t trivial (the power algorithm (as in PageRank) gets the eigenvector for the largest eigenvalue, but it gets harder for the other eigenvalues). Note that the R packages that Kieran mentioned don’t do SVD, they solve (penalized) linear systems.

– you also have missing data, which is less standard. With PageRank there is no missing data: every page either links or does not link to every other page. With Netflix you don’t even have a training sample where everyone has rated every movie. The matrix isn’t sparse in the usual linear algebra sense of being mostly zero, it is 99% missing. He didn’t really do this with any explicit models, he just minimized the approximation error at the observed values. This isn’t completely novel – multidimensional scaling with missing distances has been done that way – but I haven’t seen it done on this scale.
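
To make "minimized the approximation error at the observed values" concrete, here is a minimal sketch in Python with NumPy. It fits a low-rank product to the observed entries by plain gradient descent. The function name `factorize_observed`, the learning rate, the rank, and the synthetic data are all assumptions for illustration; this is not Simon Funk's actual code, which reportedly trains incrementally and adds its own heuristics.

```python
import numpy as np

def factorize_observed(R, mask, rank=2, lr=0.005, epochs=4000, reg=0.0, seed=0):
    """Fit R ~ U @ V.T using squared error on observed entries only."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    R0 = np.where(mask, R, 0.0)  # placeholder values; masked out below
    for _ in range(epochs):
        # Residual is zero wherever the rating is missing, so gaps
        # contribute nothing to the gradient.
        E = mask * (R0 - U @ V.T)
        U_next = U + lr * (E @ V - reg * U)
        V = V + lr * (E.T @ U - reg * V)
        U = U_next
    return U, V

# Hypothetical data: a rank-2 table with about 40% of entries missing.
rng = np.random.default_rng(1)
truth = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = rng.random((20, 15)) < 0.6
R = np.where(mask, truth, np.nan)

U, V = factorize_observed(R, mask, rank=2)

def observed_rmse(pred):
    return np.sqrt(np.mean((truth - pred)[mask] ** 2))

print("baseline RMSE:", observed_rmse(np.zeros_like(truth)))
print("fitted RMSE:  ", observed_rmse(U @ V.T))
```

Filling the gaps with zeros and running an ordinary SVD would instead fit those zeros as if they were real ratings.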

By: Zamfir

Zamfir — Mon, 24 Nov 2008 16:43:47 +0000

Kieran, I do not think those standard methods would work on matrices with missing data (which is not the same as having zeros at those places), but I might be wrong. Matlab at least does not seem to have a built-in method for gappy SVD.

From googling around a bit I get the impression that the algorithm of Simon Funk, or at least its application to these kinds of problems, was in fact new.

By: Martin James

Martin James — Mon, 24 Nov 2008 15:05:32 +0000

Kieran,

In the real business world almost nobody looks at the literature and even fewer know any math. The difference between a good answer and the best answer is very rarely important enough to justify the expense of hiring people that know math.

For example, with the financial crisis, those with sophisticated models and those with no models ended up pretty much the same place.

By: Kieran Healy

Kieran Healy — Mon, 24 Nov 2008 14:38:34 +0000

Zamfir - Interesting. I guess I am still amazed that Netflix didn't do this themselves. Not only are the methods in the existing literature, the software tools are, too. R, for instance, has several packages (like this one and especially this one) for handling operations on large, sparse matrices.

By: Ginger Yellow

Ginger Yellow — Mon, 24 Nov 2008 13:23:45 +0000

"You can split up queues between accounts on the fly, so there’s no reason for different people to use the same account." Maybe not, but they do. People are stubborn like that.

By: Zamfir

Zamfir — Mon, 24 Nov 2008 10:36:44 +0000

The guy has a blog where he explains his methods: http://sifter.org/~simon/journal/index.html

I read only a few small bits, but based on those it seems “using SVD” is just a starting point, not the trick itself. Trick one is to use a method that can deal with those large, sparse matrices, by some iterative algorithm that approximates the SVD. This is apparently a straightforward application of existing literature.
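
As one example of an iterative scheme that approximates the leading part of an SVD, here is a sketch of power iteration with deflation in Python with NumPy. It is a textbook technique for fully observed matrices, chosen only for illustration, and is not claimed to be the method on the blog. The function name `top_singular_triplets` and the test matrix are assumptions.

```python
import numpy as np

def top_singular_triplets(A, k, iters=200, seed=0):
    """Approximate the k largest singular triplets by power iteration."""
    rng = np.random.default_rng(seed)
    A = np.array(A, dtype=float)  # work on a copy
    triplets = []
    for _ in range(k):
        v = rng.standard_normal(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            u = A @ v
            u /= np.linalg.norm(u)
            v = A.T @ u
            sigma = np.linalg.norm(v)
            v /= sigma
        triplets.append((sigma, u, v))
        # Deflation: remove the component just found, then repeat.
        A -= sigma * np.outer(u, v)
    return triplets

# Hypothetical test matrix with known singular values 5, 3, 1, 0.5.
rng = np.random.default_rng(2)
Q1, _ = np.linalg.qr(rng.standard_normal((8, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((6, 4)))
A = Q1 @ np.diag([5.0, 3.0, 1.0, 0.5]) @ Q2.T

triplets = top_singular_triplets(A, k=2)
sigmas = [t[0] for t in triplets]
print(sigmas)
```

Each pass needs only matrix-vector products, which is why this style of algorithm suits large sparse matrices. It still assumes every entry is known.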

The other part is determining what to do with the empty spots, movies that are not reviewed by all people. This is probably where the algorithm is really original, with a lot of heuristic ideas about assumed probability distributions, where a movie’s score is somehow averaged between the actually observed scores and some a priori assumption about the score. I haven’t looked into the details here.
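
One simple way such an averaging could work is a weighted blend of the observed ratings with a prior guess, sketched below in Python. The function name `blended_mean`, the prior of 3.6 stars, and the weight of 25 are hypothetical values for illustration, not details confirmed from the blog.

```python
def blended_mean(observed, prior_mean, prior_weight):
    """Average that treats the prior as `prior_weight` extra ratings."""
    n = len(observed)
    return (prior_weight * prior_mean + sum(observed)) / (prior_weight + n)

# Hypothetical numbers: a global average of 3.6 stars counted as 25 ratings.
few = blended_mean([5, 5], prior_mean=3.6, prior_weight=25)
many = blended_mean([5] * 1000, prior_mean=3.6, prior_weight=25)
none = blended_mean([], prior_mean=3.6, prior_weight=25)
print(few, many, none)
```

A movie with two five-star ratings stays close to the prior, while one with a thousand of them ends up close to five.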

By: john holbo

john holbo — Mon, 24 Nov 2008 04:43:30 +0000

Ah, sorry I missed that, JSE. Yes, DeadlyShoe, what you say makes sense, too.

By: TheDeadlyShoe

TheDeadlyShoe — Mon, 24 Nov 2008 03:53:07 +0000

I really don’t think family accounts are a problem. You can split up queues between accounts on the fly, so there’s no reason for different people to use the same account. And if there’s just one person managing movies for the whole family, I don’t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings. That one person would probably just offer their own thoughts in terms of ratings.

By: nitpicking

nitpicking — Mon, 24 Nov 2008 03:27:26 +0000

Not to split hairs, but a common factor model (using something like maximum likelihood estimation) would be more appropriate here, because we can imagine that there are unique factors at play in addition to the common ones (i.e., there is variance that is unique to particular movies, beyond the common factors in the solution). PAF or ML would deal with this better.

By: JSE

JSE — Mon, 24 Nov 2008 02:31:09 +0000

See my #10 above.

By: John Holbo

John Holbo — Mon, 24 Nov 2008 01:53:29 +0000

The article didn’t discuss one thing I’ve wondered about – and seen discussed in other pieces on the prize. Many accounts have several users. So mom is watching “Sex and the City” and dad is watching “Hellboy 2” and the little girls are watching “Barbie Swan Lake” and the account gives them all 5 stars, and now you’ve got this bogus ‘why do Hellboy and Barbie go together?’ non-problem. Has Netflix done anything to correct for this? By, say, subdividing members of an account household?