Or “I thought Science was a serious peer-reviewed publication…”
A study published today in Science by Facebook researchers using Facebook data claims to examine whether adult U.S. Facebook users engage with ideologically cross-cutting material on the site. My friend Christian Sandvig does an excellent job highlighting many of the problems of the piece and I encourage you to read his astute and well-referenced commentary. I want to highlight just one point here, a point that in and of itself should have stood out to reviewers at Science and should have been addressed before publication. It concerns the study's problematic sampling frame and how little prominence it gets in the publication (i.e., none; it is relegated entirely to the supplementary materials).
Sampling is crucial to social science questions, since biased samples can have serious implications for a study's findings. In particular, it is extremely important that the sampling methodology be decoupled from the substantive questions of interest in the study. In this case, if you are examining engagement with political content, sampling must not be based on anything related to users' engagement with politics. However, that is precisely how sampling was done here. I elaborate below, but in sum: although the study boasts over 10 million observations, the supplementary materials reveal that only a tiny percentage (single digits) of Facebook users were eligible to make it into the sample in the first place. These are people who explicitly identify their political affiliation on the site, i.e., people who probably have a different relationship to politics than the average user. They are also relatively active users, thanks to another sampling decision that is, again, confounded with the outcome of interest: engagement with political materials.
Not in the piece published in Science proper, but in the supplementary materials we find the following:
All Facebook users can self-report their political affiliation; 9% of U.S. users over 18 do. We mapped the top 500 political designations on a five-point, -2 (Very Liberal) to +2 (Very Conservative) ideological scale; those with no response or with responses such as “other” or “I don’t care” were not included. 46% of those who entered their political affiliation on their profiles had a response that could be mapped to this scale.
To recap: only 9% of FB users report their political affiliation in a form relevant to sampling here, and 54% of those do so in a way that cannot be mapped to a political affiliation. This means that only about 4% of FB users were eligible for the study. But it's even less than that, because users also had to log in at least “4/7 days per week”, which “removes approximately 30% of users”.
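The arithmetic is simple enough to check directly. Here is a quick back-of-the-envelope sketch using only the percentages quoted from the supplementary materials (the exact counts behind them are not reported, so treat the result as approximate):

```python
# Rough share of Facebook users eligible for the study's sample,
# using the percentages quoted in the supplementary materials.
reports_affiliation = 0.09   # 9% of U.S. adult users self-report a political affiliation
mappable = 0.46              # 46% of those map onto the five-point ideology scale
retained_after_login = 0.70  # requiring login on at least 4/7 days removes ~30%

eligible = reports_affiliation * mappable
print(f"Mappable affiliation: {eligible:.1%}")                # ~4.1% of users

eligible_active = eligible * retained_after_login
print(f"Plus the activity requirement: {eligible_active:.1%}")  # ~2.9% of users
```

So even before any further filtering, fewer than 3 in 100 Facebook users could possibly have entered the sample.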
Of course, every study has limitations. But sampling is too important here to be buried in supplementary materials. And the limitations of the sampling are too serious to warrant the following comment in the final paragraph of the paper:
we conclusively establish that on average in the context of Facebook, individual choices (2, 13, 15, 17) more than algorithms (3, 9) limit exposure to attitude-challenging content.
How can a sample that has not been established to be representative of Facebook users result in such a conclusive statement? And why does Science publish papers that make such claims without the necessary empirical evidence to back up the claims?
Can publications and researchers please stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously? In related news, I recently published a paper asking “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites” that I recommend to folks working through and with big data in the social sciences.*
Full disclosure: some of my work has been funded by Facebook as well as Google, other corporations, and foundations; details are available on my CV. Also, I'm friends with one of the authors of the study and very much value many of the contributions she has made to research.
[*] Regarding the piece on which I comment here, FB users not being nationally representative is not an issue, since the paper and its claims concern Facebook use only.