It may be the age of big data, but since big data tend to come from those who are already using digital media, such data sets tend to lack information about non-users and those who don’t engage in certain activities online. I make this case in detail about data derived from social network sites in my paper “Is Bigger Always Better?”, published in the ANNALS of the American Academy of Political and Social Science.* That paper mainly focuses on those who are already connected, but even among Internet users, I find that data derived from social media tend to be biased against the less privileged and the less skilled, as such folks are less likely to be on those sites. This is a problem when more and more studies of social behavior, and potentially policy decisions, are based on information that automatically excludes certain populations.

Today (3/31/16) the US Federal Communications Commission votes on broadband subsidies for low-income households. Yes, making home broadband more affordable is likely a necessary condition for getting more Americans online. However, it is not sufficient. My colleague Ashley Walker and I analyzed data from an FCC study administered in 2009 on both users and non-users, finding that people who are more concerned about their personal data being stolen are more likely to be non-users. These results hold true when controlling for other potentially related factors such as age and education. The issue here is not price; it’s privacy concerns. Other research that I and others have conducted (some of it reviewed here*) shows that a lack of Internet skills is often an impediment to using digital media, and to using them in ways from which people may benefit. Again, it’s not simply core infrastructural access that’s a problem.

Why are Ashley and I using data from 2009? Because, shockingly, no federal agency has collected nationally representative data about Americans’ Internet uses since then. The Census used to be in the business of gathering such data, but at this point it asks only very basic connectivity questions. The approach seems penny-wise and pound-foolish. Sure, gathering such data is expensive, but it is a drop in the bucket compared to spending over $2 billion on broadband subsidies without sound evidence on whether that will actually help a more diverse group of Americans use the Internet in beneficial ways.

I also have a piece on Huffington Post about all this.

[*] If you can’t access it, feel free to send me a note for a preprint copy.