<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Netflix Weirdness</title>
	<atom:link href="http://crookedtimber.org/2008/11/23/netflix-weirdness/feed/" rel="self" type="application/rss+xml" />
	<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/</link>
	<description>Out of the crooked timber of humanity, no straight thing was ever made</description>
	<lastBuildDate>Sun, 27 May 2012 07:27:35 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Kaleberg</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259622</link>
		<dc:creator>Kaleberg</dc:creator>
		<pubDate>Fri, 28 Nov 2008 03:17:23 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259622</guid>
		<description>My impression is that SVD is just a starting point. There is a lot of interesting stuff you can do once you have the basic decomposition. For example, you can do optimal rotations with things like the varimax algorithm. You can also do subspace selection, which, when combined with rotation, can help pick out the signal from the noise. Of course, I&#039;ve only used SVD for remote sensing analysis, so what the hell do I know, other than that I&#039;ve yet to buy anything Amazon has recommended for me.</description>
		<content:encoded><![CDATA[	<p>My impression is that <span class="caps">SVD</span> is just a starting point. There is a lot of interesting stuff you can do once you have the basic decomposition. For example, you can do optimal rotations with things like the varimax algorithm. You can also do subspace selection, which, when combined with rotation, can help pick out the signal from the noise. Of course, I&#8217;ve only used <span class="caps">SVD</span> for remote sensing analysis, so what the hell do I know, other than that I&#8217;ve yet to buy anything Amazon has recommended for me.</p>
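	<p>Varimax rotates the retained factors so that each factor ends up with a few large loadings and many near-zero ones. It does this by maximising the variance of the squared loadings in each column. The sketch below is a minimal two-factor version. It uses a brute-force search over rotation angles instead of the usual iterative update, and the toy loadings are made up. It illustrates the criterion, not any particular package&#8217;s implementation.</p>

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: for each factor (column), the variance of the
    squared loadings, summed over factors. Larger means simpler structure."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((s - mean) ** 2 for s in sq) / p
    return total


def rotate(loadings, phi):
    """Rotate each two-factor loading row (x, y) by angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[x * c - y * s, x * s + y * c] for x, y in loadings]


def varimax_2d(loadings, steps=3600):
    """Brute-force varimax for two factors: try angles in [0, pi/2) and keep
    the one that maximises the criterion. It repeats every 90 degrees, since a
    quarter turn only swaps the columns and flips a sign."""
    angles = (k * (math.pi / 2) / steps for k in range(steps))
    best = max(angles, key=lambda phi: varimax_criterion(rotate(loadings, phi)))
    return rotate(loadings, best), best


# Toy loadings with clean simple structure, then blurred by a 30-degree rotation.
simple = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.9]]
mixed = rotate(simple, math.radians(30))
rotated, angle = varimax_2d(mixed)
print(round(math.degrees(angle), 1))
print([[round(x, 3) for x in row] for row in rotated])
```

	<p>Real implementations iterate over pairs of factors and usually normalise rows first. With only two factors, a grid search is enough to show what is being optimised.</p>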
 ]]></content:encoded>
	</item>
	<item>
		<title>By: andrew cooke</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259621</link>
		<dc:creator>andrew cooke</dc:creator>
		<pubDate>Fri, 28 Nov 2008 02:37:48 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259621</guid>
		<description>You can get an intuitive feel for how &lt;em&gt;missing&lt;/em&gt; data are a problem by considering how you might implement a naive calculation: calculate the average point; discard that dimension; repeat.  Both the calculation of the average and the reduction in dimension become non-linear processes when you have missing data (the missing data have to be ignored; they cannot be simply treated as zero).  That would strongly suggest that traditional efficient solutions based on linear matrix operations will not work.

Which is what various people have said above, really.</description>
		<content:encoded><![CDATA[	<p>You can get an intuitive feel for how <em>missing</em> data are a problem by considering how you might implement a naive calculation: calculate the average point; discard that dimension; repeat.  Both the calculation of the average and the reduction in dimension become non-linear processes when you have missing data (the missing data have to be ignored; they cannot be simply treated as zero).  That would strongly suggest that traditional efficient solutions based on linear matrix operations will not work.</p>

	<p>Which is what various people have said above, really.</p>
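	<p>Here is a small illustration of the point about missing data. If missing ratings are filled in as zeros, every average gets dragged down by ratings nobody gave. The sketch assumes a toy ratings matrix in which <code>None</code> marks a missing entry; the numbers are invented.</p>

```python
# Toy ratings: rows are users, columns are movies, None marks "not rated".
# The numbers are invented for illustration.
ratings = [
    [5, None, 4],
    [None, 2, None],
    [4, None, 5],
]


def column_means_zero_filled(m):
    """Naive: treat every missing rating as 0, then average over all users."""
    n_rows = len(m)
    return [sum(0 if row[j] is None else row[j] for row in m) / n_rows
            for j in range(len(m[0]))]


def column_means_observed(m):
    """Average each movie's column over the users who actually rated it."""
    means = []
    for j in range(len(m[0])):
        seen = [row[j] for row in m if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else None)
    return means


print(column_means_zero_filled(ratings))  # every mean is pulled toward 0
print(column_means_observed(ratings))     # [4.5, 2.0, 4.5]
```

	<p>The same problem carries into the dimension-reduction step. An SVD of the zero-filled matrix would try to reproduce those zeros as if they were real ratings.</p>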
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Martens</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259590</link>
		<dc:creator>Scott Martens</dc:creator>
		<pubDate>Thu, 27 Nov 2008 16:47:38 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259590</guid>
		<description>I haven&#039;t looked at any of the research, but I&#039;ll bet the next step was to use Latent Dirichlet Allocation, which is very good at topic clustering in documents and would, therefore, be a pretty obvious candidate for the task. Then, just iterate over the standard set of co-clustering techniques. I saw one using a simulated annealing algorithm that did pretty well with topic clustering in newspapers.

Look: stats dorks are pretty thin on the ground and machine learning junkies are freaks right up there with, say, people who understand the proof of Fermat&#039;s Last Theorem. These are not easy algorithms. It takes talent, hard work, or both to acquire them, and you can spend your whole life gainfully employed in statistics and never really run into SVD or its applications to general learning problems.</description>
		<content:encoded><![CDATA[	<p>I haven&#8217;t looked at any of the research, but I&#8217;ll bet the next step was to use Latent Dirichlet Allocation, which is very good at topic clustering in documents and would, therefore, be a pretty obvious candidate for the task. Then, just iterate over the standard set of co-clustering techniques. I saw one using a simulated annealing algorithm that did pretty well with topic clustering in newspapers.</p>

	<p>Look: stats dorks are pretty thin on the ground and machine learning junkies are freaks right up there with, say, people who understand the proof of Fermat&#8217;s Last Theorem. These are not easy algorithms. It takes talent, hard work, or both to acquire them, and you can spend your whole life gainfully employed in statistics and never really run into <span class="caps">SVD</span> or its applications to general learning problems.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Abby</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259443</link>
		<dc:creator>Abby</dc:creator>
		<pubDate>Tue, 25 Nov 2008 15:54:55 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259443</guid>
		<description>TheDeadlyShoe wrote:
&lt;i&gt;And if there’s just one person managing movies for the whole family, I don’t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings. That one person would probably just offer their own thoughts in terms of ratings.&lt;/i&gt;
But that is exactly what I did with our family&#039;s Netflix account. I rated the movies based on how the primary viewer perceived them so that the recommendation system would suggest more movies that each individual might like. Putting in my own rating of movies I hadn&#039;t even wanted to watch in the first place would have made it harder for me to find more movies for the rest of my family.</description>
		<content:encoded><![CDATA[	<p>TheDeadlyShoe wrote:<br />
<i>And if there&#8217;s just one person managing movies for the whole family, I don&#8217;t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings. That one person would probably just offer their own thoughts in terms of ratings.</i><br />
But that is exactly what I did with our family&#8217;s Netflix account. I rated the movies based on how the primary viewer perceived them so that the recommendation system would suggest more movies that each individual might like. Putting in my own rating of movies I hadn&#8217;t even wanted to watch in the first place would have made it harder for me to find more movies for the rest of my family.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Thomas</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259364</link>
		<dc:creator>Thomas</dc:creator>
		<pubDate>Mon, 24 Nov 2008 19:56:52 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259364</guid>
		<description>It seems there are two problems:

- Computationally, you need an SVD for a large matrix. This isn&#039;t trivial: the power method (as used in PageRank) finds the eigenvector for the largest eigenvalue, but the remaining eigenvectors are harder to get. Note that the R packages Kieran mentioned don&#039;t do SVD; they solve (penalized) linear systems.

- You also have missing data, which is less standard. With PageRank there is no missing data: every page either links or does not link to every other page. With Netflix you don&#039;t even have a training sample where everyone has rated every movie. The matrix isn&#039;t sparse in the usual linear algebra sense of being mostly zero; it is 99% missing. He didn&#039;t really do this with any explicit models; he just minimized the approximation error at the observed values. This isn&#039;t completely novel (multidimensional scaling with missing distances has been done that way), but I haven&#039;t seen it done on this scale.</description>
		<content:encoded><![CDATA[	<p>It seems there are two problems:</p>
	<ul>
	<li>Computationally, you need an <span class="caps">SVD</span> for a large matrix. This isn&#8217;t trivial: the power method (as used in PageRank) finds the eigenvector for the largest eigenvalue, but the remaining eigenvectors are harder to get. Note that the R packages Kieran mentioned don&#8217;t do <span class="caps">SVD</span>; they solve (penalized) linear systems.</li>
	<li>You also have missing data, which is less standard. With PageRank there is no missing data: every page either links or does not link to every other page. With Netflix you don&#8217;t even have a training sample where everyone has rated every movie. The matrix isn&#8217;t sparse in the usual linear algebra sense of being mostly zero; it is 99% missing. He didn&#8217;t really do this with any explicit models; he just minimized the approximation error at the observed values. This isn&#8217;t completely novel (multidimensional scaling with missing distances has been done that way), but I haven&#8217;t seen it done on this scale.</li>
	</ul>
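	<p>The power method mentioned above fits in a few lines: repeatedly multiply a start vector by the matrix and renormalise. Applied to R<sup>T</sup>R, it yields the leading right singular vector of R, and the square root of the eigenvalue is the top singular value. This is a minimal pure-Python sketch on a toy complete matrix with no missing data. It assumes the top eigenvalue is clearly larger than the rest.</p>

```python
import math
import random


def gram(r):
    """Return R^T R for a matrix R given as a list of rows."""
    n_cols = len(r[0])
    return [[sum(row[i] * row[j] for row in r) for j in range(n_cols)]
            for i in range(n_cols)]


def power_iteration(a, iters=200, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix `a`,
    found by repeatedly multiplying a start vector by `a` and renormalising.
    Assumes the top eigenvalue is strictly larger in magnitude than the rest."""
    n = len(a)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * avi for vi, avi in zip(v, av)), v  # Rayleigh quotient


# Toy matrix: its largest singular value is 2, right singular vector ~ (0.707, 0.707).
R = [[1.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]]
eigval, v = power_iteration(gram(R))
sigma = math.sqrt(eigval)
print(round(sigma, 6), [round(x, 6) for x in v])
```

	<p>Getting the next component means deflating the matrix (for example, subtracting <code>eigval * v v^T</code>) and iterating again. That extra work is why the other eigenvalues are harder.</p>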
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Zamfir</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259352</link>
		<dc:creator>Zamfir</dc:creator>
		<pubDate>Mon, 24 Nov 2008 16:43:47 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259352</guid>
		<description>Kieran, I do not think those standard methods would work on matrices with missing data (which is not the same as having zeros at those places), but I might be wrong. Matlab, at least, does not seem to have a built-in method for gappy SVD.

From googling around a bit I get the impression that the algorithm of Simon Funk, or at least its application to these kinds of problems, was in fact new.</description>
		<content:encoded><![CDATA[	<p>Kieran, I do not think those standard methods would work on matrices with missing data (which is not the same as having zeros at those places), but I might be wrong. Matlab, at least, does not seem to have a built-in method for gappy <span class="caps">SVD</span>.</p>

	<p>From googling around a bit I get the impression that the algorithm of Simon Funk, or at least its application to these kinds of problems, was in fact new.</p>
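	<p>The idea described in this thread can be sketched as stochastic gradient descent on a rank-1 model: fit the factors by minimising the error only over the ratings that exist. This is not Simon Funk&#8217;s code. The learning rate, regularisation strength, number of epochs and the toy data are all assumptions chosen for illustration.</p>

```python
import random


def fit_rank1(observed, n_users, n_movies, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit rating[u][m] ~ user[u] * movie[m] by stochastic gradient descent,
    looping only over observed (user, movie, rating) triples, so missing cells
    never contribute to the loss. `reg` is a small L2 penalty on the factors."""
    rng = random.Random(seed)
    user = [0.1 + 0.01 * rng.random() for _ in range(n_users)]
    movie = [0.1 + 0.01 * rng.random() for _ in range(n_movies)]
    data = list(observed)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, m, r in data:
            uu, mm = user[u], movie[m]
            err = r - uu * mm
            user[u] += lr * (err * mm - reg * uu)
            movie[m] += lr * (err * uu - reg * mm)
    return user, movie


# Toy ratings built from user factors (1, 2, 3) and movie factors (1, 2).
# The cell (user 2, movie 1), whose true value is 6, is left out as missing.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
user, movie = fit_rank1(observed, n_users=3, n_movies=2)
print(round(user[2] * movie[1], 2))  # prediction for the missing cell; should be near 6
```

	<p>A practical version would fit more than one factor. A single factor is enough to show that the missing cell never enters the loss yet still gets a sensible prediction.</p>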
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Martin James</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259343</link>
		<dc:creator>Martin James</dc:creator>
		<pubDate>Mon, 24 Nov 2008 15:05:32 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259343</guid>
		<description>Kieran,

In the real business world, almost nobody looks at the literature, and even fewer know any math. The difference between a good answer and the best answer is very rarely important enough to justify the expense of hiring people who know math.

For example, with the financial crisis, those with sophisticated models and those with no models ended up in pretty much the same place.</description>
		<content:encoded><![CDATA[	<p>Kieran,</p>

	<p>In the real business world, almost nobody looks at the literature, and even fewer know any math. The difference between a good answer and the best answer is very rarely important enough to justify the expense of hiring people who know math.</p>

	<p>For example, with the financial crisis, those with sophisticated models and those with no models ended up in pretty much the same place.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Kieran Healy</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259338</link>
		<dc:creator>Kieran Healy</dc:creator>
		<pubDate>Mon, 24 Nov 2008 14:38:34 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259338</guid>
		<description>Zamfir - Interesting. I guess I am still amazed that Netflix didn&#039;t do this themselves. Not only are the methods in the existing literature, but so are the software tools. R, for instance, has several packages (like &lt;a href=&quot;http://cran.r-project.org/web/packages/elasticnet/index.html&quot; rel=&quot;nofollow&quot;&gt;this one&lt;/a&gt; and especially &lt;a href=&quot;http://cran.r-project.org/web/packages/Matrix/index.html&quot; rel=&quot;nofollow&quot;&gt;this one&lt;/a&gt;) for handling operations on large, sparse matrices.</description>
		<content:encoded><![CDATA[	<p>Zamfir &#8211; Interesting. I guess I am still amazed that Netflix didn&#8217;t do this themselves. Not only are the methods in the existing literature, but so are the software tools. R, for instance, has several packages (like <a href="http://cran.r-project.org/web/packages/elasticnet/index.html" rel="nofollow">this one</a> and especially <a href="http://cran.r-project.org/web/packages/Matrix/index.html" rel="nofollow">this one</a>) for handling operations on large, sparse matrices.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Ginger Yellow</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259335</link>
		<dc:creator>Ginger Yellow</dc:creator>
		<pubDate>Mon, 24 Nov 2008 13:23:45 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259335</guid>
		<description>&quot;You can split up queues between accounts on the fly, so there’s no reason for different people to use the same account.&quot;

Maybe not, but they do. People are stubborn like that.</description>
		<content:encoded><![CDATA[	<p>&#8220;You can split up queues between accounts on the fly, so there&#8217;s no reason for different people to use the same account.&#8221;</p>

	<p>Maybe not, but they do. People are stubborn like that.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Zamfir</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259331</link>
		<dc:creator>Zamfir</dc:creator>
		<pubDate>Mon, 24 Nov 2008 10:36:44 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259331</guid>
		<description>The guy has a blog where he explains his methods: http://sifter.org/~simon/journal/index.html

I read only a few small bits, but based on those it seems that &quot;using SVD&quot; is just a starting point, not the trick itself. Trick one is to use a method that can deal with those large, sparse matrices: some iterative algorithm that approximates the SVD. This is apparently a straightforward application of existing literature.

The other part is determining what to do with the empty spots, the movies that not everyone has reviewed. This is probably where the algorithm is really original, with a lot of heuristic ideas about assumed probability distributions, where a movie&#039;s score is somehow averaged between the actually observed scores and some a priori assumption about the score. I haven&#039;t looked into the details here.</description>
		<content:encoded><![CDATA[	<p>The guy has a blog where he explains his methods: <a href="http://sifter.org/~simon/journal/index.html" rel="nofollow">http://sifter.org/~simon/journal/index.html</a></p>

	<p>I read only a few small bits, but based on those it seems that &#8220;using <span class="caps">SVD</span>&#8221; is just a starting point, not the trick itself. Trick one is to use a method that can deal with those large, sparse matrices: some iterative algorithm that approximates the <span class="caps">SVD</span>. This is apparently a straightforward application of existing literature.</p>

	<p>The other part is determining what to do with the empty spots, the movies that not everyone has reviewed. This is probably where the algorithm is really original, with a lot of heuristic ideas about assumed probability distributions, where a movie&#8217;s score is somehow averaged between the actually observed scores and some a priori assumption about the score. I haven&#8217;t looked into the details here.</p>
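	<p>One simple way to average observed scores with an a priori assumption is to shrink each movie&#8217;s mean toward the global mean. This treats the movie as if it had <code>k</code> extra pseudo-ratings at the global average. It is a sketch of that general idea, not necessarily what Funk&#8217;s algorithm does. Both <code>k</code> and the global mean of 3.5 are assumed values.</p>

```python
def shrunk_mean(ratings, global_mean, k=10.0):
    """Blend a movie's observed ratings with a prior belief (the global mean).

    Equivalent to adding k pseudo-ratings equal to global_mean: a movie with few
    ratings stays near the prior, and one with many ratings follows its own data.
    k is an assumed tuning constant, not a value taken from the source.
    """
    return (global_mean * k + sum(ratings)) / (k + len(ratings))


GLOBAL_MEAN = 3.5  # assumed average rating across all movies (illustrative)
print(round(shrunk_mean([5.0], GLOBAL_MEAN), 3))        # one 5-star rating -> 3.636
print(round(shrunk_mean([5.0] * 100, GLOBAL_MEAN), 3))  # a hundred of them -> 4.864
print(shrunk_mean([], GLOBAL_MEAN))                     # no ratings -> 3.5
```

	<p>Larger values of <code>k</code> trust the prior more. A movie with a single glowing review then no longer outranks one with hundreds of good ones.</p>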
 ]]></content:encoded>
	</item>
	<item>
		<title>By: john holbo</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259329</link>
		<dc:creator>john holbo</dc:creator>
		<pubDate>Mon, 24 Nov 2008 04:43:30 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259329</guid>
		<description>Ah, sorry I missed that, JSE. Yes, DeadlyShoe, what you say makes sense, too.</description>
		<content:encoded><![CDATA[	<p>Ah, sorry I missed that, <span class="caps">JSE</span>. Yes, DeadlyShoe, what you say makes sense, too.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: TheDeadlyShoe</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259328</link>
		<dc:creator>TheDeadlyShoe</dc:creator>
		<pubDate>Mon, 24 Nov 2008 03:53:07 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259328</guid>
		<description>I really don&#039;t think family accounts are a problem.  You can split up queues between accounts on the fly, so there&#039;s no reason for different people to use the same account.  And if there&#039;s just one person managing movies for the whole family, I don&#039;t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings.  That one person would probably just offer their own thoughts in terms of ratings.</description>
		<content:encoded><![CDATA[	<p>I really don&#8217;t think family accounts are a problem.  You can split up queues between accounts on the fly, so there&#8217;s no reason for different people to use the same account.  And if there&#8217;s just one person managing movies for the whole family, I don&#8217;t think that person would ask other people what they thought of the movie they watched and then offer up appropriate ratings.  That one person would probably just offer their own thoughts in terms of ratings.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: nitpicking</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259327</link>
		<dc:creator>nitpicking</dc:creator>
		<pubDate>Mon, 24 Nov 2008 03:27:26 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259327</guid>
		<description>Not to split hairs, but a common factor model (using something like maximum likelihood estimation) would be more appropriate here, because we can imagine that there are unique factors at play in addition to the common ones (i.e., there is variance that is unique to particular movies, beyond the common factors in the solution). PAF (principal axis factoring) or ML would deal with this better.</description>
		<content:encoded><![CDATA[	<p>Not to split hairs, but a common factor model (using something like maximum likelihood estimation) would be more appropriate here, because we can imagine that there are unique factors at play in addition to the common ones (i.e., there is variance that is unique to particular movies, beyond the common factors in the solution). <span class="caps">PAF</span> (principal axis factoring) or ML would deal with this better.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: JSE</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259325</link>
		<dc:creator>JSE</dc:creator>
		<pubDate>Mon, 24 Nov 2008 02:31:09 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259325</guid>
		<description>See my #10 above.</description>
		<content:encoded><![CDATA[	<p>See my #10 above.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: John Holbo</title>
		<link>http://crookedtimber.org/2008/11/23/netflix-weirdness/comment-page-1/#comment-259324</link>
		<dc:creator>John Holbo</dc:creator>
		<pubDate>Mon, 24 Nov 2008 01:53:29 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/?p=8607#comment-259324</guid>
		<description>The article didn&#039;t discuss one thing I&#039;ve wondered about - and seen discussed in other pieces on the prize. Many accounts have several users. So mom is watching &quot;Sex and the City&quot; and dad is watching &quot;Hellboy 2&quot; and the little girls are watching &quot;Barbie of Swan Lake&quot; and the account gives them all 5 stars, and now you&#039;ve got this bogus &#039;why do Hellboy and Barbie go together?&#039; non-problem. Has Netflix done anything to correct for this? By, say, subdividing members of an account household?</description>
		<content:encoded><![CDATA[	<p>The article didn&#8217;t discuss one thing I&#8217;ve wondered about &#8211; and seen discussed in other pieces on the prize. Many accounts have several users. So mom is watching &#8220;Sex and the City&#8221; and dad is watching &#8220;Hellboy 2&#8221; and the little girls are watching &#8220;Barbie of Swan Lake&#8221; and the account gives them all 5 stars, and now you&#8217;ve got this bogus &#8216;why do Hellboy and Barbie go together?&#8217; non-problem. Has Netflix done anything to correct for this? By, say, subdividing members of an account household?</p>
 ]]></content:encoded>
	</item>
</channel>
</rss>

