We need more data on people’s – as in everybody’s – Internet uses

by Eszter Hargittai on March 31, 2016

It may be the age of big data, but since big data tend to come from those who are already using digital media, such data sets tend to lack information about non-users and those who don’t engage in certain activities online. I make this case in detail about data derived from social network sites in my paper called Is Bigger Always Better? published in the ANNALS of the American Academy of Political and Social Science.* That paper mainly focuses on those who are already connected, but even among Internet users, I find that data derived from social media tend to bias against the less privileged and the less skilled as such folks are less likely to be on those sites. This is a problem when more and more studies of social behavior, and potentially policy decisions, are based on information that automatically excludes certain populations.

Today (3/31/16) the US Federal Communications Commission votes on broadband subsidies for low-income households. Yes, making home broadband more affordable is likely a necessary condition for getting more Americans online. However, it is not sufficient. My colleague Ashley Walker and I analyzed data from an FCC study administered in 2009 on both users and non-users, finding that people who are more concerned about their personal data being stolen are more likely to be non-users, results that hold true when controlling for other potentially related factors such as age and education. The issue here is not price; it is privacy concerns. Other research I and others have conducted (some of it reviewed here*) shows that lack of Internet skills is often an impediment to using digital media and using it in ways from which people may benefit. Again, it’s not simply core infrastructural access that’s a problem.

Why are Ashley and I using data from 2009? Because shockingly no federal agency has collected nationally-representative data about Americans’ Internet uses since then. The Census used to be in the business of gathering such data, but at this point it only does so about very basic connectivity questions. The approach seems penny-wise and pound-foolish. Sure, gathering such data is expensive, but it is a drop in the bucket compared to spending over $2 billion on broadband subsidies without having sound evidence on how that will actually lead a more diverse group of Americans to use the Internet in helpful ways.

I also have a piece on Huffington Post about all this.

[*] If you can’t access it, feel free to send me a note for a preprint copy.



krippendorf 03.31.16 at 2:31 pm

The faddishness of social-media based studies has led to enormous wasted effort and money. As one example, take the recent study showing that Facebook users are more likely to have the same occupation as a parent than they are to have a different occupation from a parent. See, e.g., here.

There are already dozens of sociological studies (even if “big data scientists” and most economists choose to ignore them, and certainly choose not to cite them) that have shown that occupations are “inherited” across generations. These studies use nationally representative data and pay close attention to things like sample weighting and selection biases, as well as conceptual issues such as whether or not to parse out the effects of changes in the occupational structure between one generation and the next in estimating levels and patterns of occupational mobility or immobility.

The question, then, is, what do we really learn about social mobility from analyses of the small fraction of parent-child pairs who are on Facebook, and who list both occupations on their Facebook profiles?

The answer: Not much. If the estimates of intergenerational inheritance from Facebook are the same as those obtained with nationally representative data, we have learned nothing new about social mobility that we didn’t know before. If the estimates differ, we have also learned nothing useful about social mobility, because the Facebook-based estimates are polluted with selection biases and can’t be trusted.

Fans of these sorts of studies say that they are revolutionizing social science. More often than not, they are reinventing the wheel, and their updated version is square.

But studies using Facebook or Twitter are very fashionable, it’s comparatively easy to get research funding to study them, and it’s comparatively easy to publish them as long as they are sent to reviewers who have also drunk the Kool-Aid. Administrators get to build new research centers, which is much better for their careers than investing the same amount of resources in existing ones. And the (mostly male) data scientists who do this kind of work get to hang out with the higher status people in the computer science department, rather than those lowly social scientists. So it goes.


Dean C. Rowan 03.31.16 at 4:19 pm

Questions re: the 2009 FCC study: Did it rely on a “big data”-like corpus, or was it a more traditional survey of randomly sampled individuals? How does Pew’s work fit into this? Don’t they collect “nationally-representative data about Americans’ Internet uses”?


Eszter Hargittai 03.31.16 at 7:07 pm

Dean, details about the FCC study’s methods are at the end of this report:

If you click through to the Huffington Post piece you’ll see me linking to several Pew reports. If you look at my Is Bigger Always Better? paper, you’ll see me analyzing Pew data and referencing their findings. Pew has been incredibly helpful with their data.

That said, Pew has very limited resources and their surveys only collect so much background information about their respondents (as little beyond age, gender, race/ethnicity and SES is usually of interest to them or within reach of their budgets) and only have so many Internet use questions in any one study. In my paper referenced above, I also use another data set – my own – to address some of the shortcomings in theirs. So yes, Pew is a fantastic resource, but they have limits and we also can’t just rely on a private organization like that to continue offering relevant data.


Witt 04.01.16 at 2:15 am

Eszter, I’m very grateful that you continue to lift up this issue in a rigorous and sustained way. There are so many facets to it, and so many implications for fields ranging from public health to college access to early childhood care and much more.

A few links that may be of interest to other commenters:

The Census Project is a dogged group of advocates who keep trying to keep the Census Bureau alive through insane Congressional hearing after insane Congressional hearing. If you value the kind of information the Census Bureau collects, it is well worth supporting them (or even just subscribing to their occasionally-sent, wonderfully tart newsletter).

And there is some really interesting data on computer skills in the PIAAC (Program for International Assessment of Adult Competencies), a cross-national comparison of skills among people age 16-65 conducted by the OECD.

The PIAAC measures basic skills like reading and math, but also a third category that they refer to as “Problem-Solving in Technology-Rich Environments” which is a cumbersome way to say “Can this person find basic information on a web page?” (See sample test questions.)

US PIAAC results are summarized in Time for the US to Reskill?, and substantially more information is available for researchers at PIAACgateway.com. There you can also find detailed information on what the dataset does and does not include (e.g., who isn’t included in the “Problem-Solving in Technology-Rich Environments” because they didn’t even have enough computer skills to take the test).


John Quiggin 04.01.16 at 8:51 am

@krippendorf. Economists are guilty of imperialism, to be sure. But the authors of this piece are physicists, I think.


cassander 04.02.16 at 10:41 pm

>over $2 billion dollars on broadband subsidies without having sound evidence on how that will actually improve a more diverse group of Americans using the Internet in helpful ways

Of course, the idea of simply not subsidizing everything under the sun never seems to occur to anyone (except, of course, evil, hard-right reactionary extremists like myself) despite the fact that where there is good evidence, it usually shows that these subsidies achieve something between “nothing” and “opposite result claimed.” See, for example, the many forms of farm subsidy intended to help family farms that almost invariably end up subsidizing agribusiness.

>But, studies using Facebook or Twitter are very fashionable

I’d say that they’re fashionable precisely because they’re easy to get data for, often data that’s already conveniently organized. It’s much easier, to say nothing of cheaper and less boring, to send Facebook a few emails and get a spreadsheet than it is to mail people surveys and then painstakingly hand-process them.
