Appropriate empirical evidence?

by Eszter Hargittai on July 3, 2006

An image of a man who is definitely not a college student (certainly not traditionally aged) accompanies an article called “Men Assume Sexual Interest When There May Be None” in a recent piece by a HealthDay reporter, a piece that’s been published on various Web sites. (In case of link rot, I’ve placed a screen shot here.)

In the sixth paragraph of the piece we find out that the study is based on 43 male and 43 female college students aged 18-22. That is the only part of the article where the participants are referred to as college students. Otherwise, the entire piece is about the behavior of men and women generally speaking.

There are several fields that base a good chunk of their empirical research on studies of students.* This is usually done due to convenience. And perhaps regarding some questions, age and educational level do not matter. But the issue is rarely addressed directly. In many instances it seems problematic to assume that a bunch of 20-year-olds in college are representative of the entire rest of the population. So why write it up that way then? At best, in the conclusion of a paper the authors may mention that future studies should/will (?) expand the study to a more representative sample, but these studies rarely seem to materialize.

This is one of my biggest pet peeves when it comes to certain types of scholarship. And I do mean scholarship. Because it is not just the journalistic reports that make the leap. The academic articles themselves use that kind of language. It is part of a larger question that’s been of interest to me for a while now: Historically, how have various fields settled on what is acceptable empirical evidence in their domain and what are the appropriate modes of analysis? Papers that get into top journals in one field wouldn’t even make it off the editor’s desk for review in another field due to the data and methods used. But then when it comes to reporting findings to the public, it all becomes one big general pool of work where the methods and the validity of the findings don’t seem to matter anymore.

[*] Note that recently I have been doing studies on college students myself. First, I have a concrete substantive reason for doing so (they are the most highly network-connected age group, which helps to control for regular use). Second, when I write up the work, I never draw huge generalizations about all users. I always report on “college students” or “study participants”. I do not simply conclude that whatever I find about college students is representative of all Internet users. It would be wrong to do so.

{ 28 comments }

1 Brendan 07.03.06 at 11:38 am: As you know, or should, if ‘tight’ rules about who is used in experiments or studies were used, then almost all of psychology (as an empirical science) would disintegrate. Psychologists try desperately hard to find universal rules of behaviour, common to all human beings, but almost all of their studies are carried out on college students in Europe, Australia/New Zealand, or the States/Canada. Psychologists almost invariably assume (unless they are specifically cross-cultural psychologists) that this is a genuinely random cross-section of the community, and that conclusions drawn from this tiny subpopulation can be unproblematically applied to everyone who has ever lived, in any culture or period of history.
2 etat 07.03.06 at 11:52 am: Eszter, is it the heat? Your link to the article is corrupt. You want us to go here, which now has a different set of stories, but instead, I get a 404.
Second, FURL works really well for storing the article – sans images. The Viagra ad wouldn’t show up in a FURL, but I’m not sure it’s relevant anyway.

That said, I agree with your assessment, that some scholarly articles have similar ways of presenting findings. I would add that the story as presented is indistinguishable from the pop psychology that pervades magazines aimed at gullible women.
3 Eszter 07.03.06 at 11:55 am: I’m well aware that Psychology is one of those fields where a lot of the work is done this way. There are other fields as well. Historically, they may have grown out of Psych and perhaps that’s why the methods are acceptable.

By the way, I should have linked to this in the post, here is the abstract of the piece from which the article I mention drew its material. The abstract talks about men and women only, not once is there a mention of college students in particular.
4 Randolph Fritz 07.03.06 at 12:25 pm: I think the social sciences are early in their development and it is important to make small, rather than huge, claims for the work. My peeve is that much of the mathematics used in the social sciences is not sufficiently rigorous. This is especially so in economics, where often demonstrably false assumptions of continuity and differentiability hold sway. There are equally troublesome problems in psychometrics.
5 jakeb 07.03.06 at 12:30 pm: I suspect the winner in a most-suspect-domain competition would have to be certain subfields of linguistics. After the birth of modern syntax, the notion that one could make arguments based on the grammaticality of sentences pulled out of one’s head was accepted. While a more empirical approach has started developing in syntax, one can still see at conferences debates of the type “Since X is grammatical and Y is not, therefore A.” “But I find Y grammatical!” “Yes, so do I!” “Well, then, in a dialect (like mine) where X is grammatical and Y is not . . .” It can be despair-inducing. (Particularly when you take into account that judgment fatigue sets in very quickly: repeat something enough times, and it has a good chance of starting to sound acceptable.)

I speculate that it has the same origin as do all those hellish grammar prescriptions: since everyone can speak a language with great skill, all it takes is a dose of extra self-confidence to begin pontificating on the dos and don’ts (although romanticizing the old days while ignoring the actual facts certainly helps). For some linguists, since they’re trained in linguistics, well of course they can make generalizations about language. Any differences such as difference in senses of grammaticality are just noise.

This isn’t of course to say that the whole discipline is afflicted with this view. A sociolinguist, for instance, would probably be put up against the wall if he were to present a paper based on intuitions.
6 bob mcmanus 07.03.06 at 12:44 pm: Kinsey set the model: spent what, twenty years collecting bugs until he had a million; and then a lifetime, with assistants several lifetimes, collecting histories.

That’s social science. When your sample set is a well-distributed million, you may tentatively hypothesize about human behavior.
7 Eric 07.03.06 at 1:02 pm: On target here:

Sears, David O. 1986. “College sophomores in the laboratory: Influence of a narrow data base on social psychology’s view of human nature.” Journal of Personality and Social Psychology 51:515-30.

Michael E. Gordon, L. Allen Slade, and Neal Schmitt. 1986. “The ‘science of the sophomore’ revisited: From conjecture to empiricism.” The Academy of Management Review 11(1): 191-207.

Jerald Greenberg. 1987. “The college sophomore as guinea pig: Setting the record straight.” The Academy of Management Review 12 (1): 157-159.

Michael E. Gordon, L. Allen Slade, and Neal Schmitt. 1987. “Student guinea pigs: Porcine predictors and particularistic phenomena,” The Academy of Management Review 12(1): 160-63.
8 Jon H 07.03.06 at 1:59 pm: “Men Assume Sexual Interest When There May Be None”

Heh, funny, I’m exactly the opposite. I assume no interest, and a woman pretty much has to spell it out on a big sign and hit me over the head with it before I catch on.
9 SusanC 07.03.06 at 2:32 pm: In many instances it seems problematic to assume that a bunch of 20-year-olds in college are representative of the entire rest of the population.

This is also one of my peeves. In some areas, there are good reasons to believe that students are not typical. For example, suppose you want to test whether a particular computer program can be successfully used by a target user population. Usually, what you do is ask some volunteers to perform a specific task with the program and measure how many of them can do it successfully. It is a very, very bad idea to use a sample consisting entirely of computer science undergraduates, unless the program is intended only to be used by CS undergraduates.

The perils of “Generalising to hypothetical populations from the population actually sampled” usually feature prominently in introductory statistics courses for psychologists.
10 Fabio Rojas 07.03.06 at 3:07 pm: Bob McManus said: “Kinsey set the model: spent what, twenty years collecting bugs until he had a million; and then a lifetime, with assistants several lifetimes, collecting histories.

That’s social science. When your sample set is a well-distributed million, you may tentatively hypothesize about human behavior.”

Bob – learn about the central limit theorem!! You don’t need huge samples to draw valid inferences. The trick – and this is the point motivating the whole thread – is you need a *random* sample. College students are extremely not-random, which is why we are suspicious of studies using only college students.

Kinsey is actually a bad model. Kinsey operated in an era before people understood sampling techniques. A modern researcher would say that collecting a million data points is not needed unless you are studying extremely rare events.

Also, Kinsey over-sampled certain populations and didn’t create sampling weights. This is a cheap shot, since he worked before we understood the importance of sampling weights, and many results are robust in the face of sample selection bias.

Kinsey may have been a pioneer, a solid scientist and a brave researcher, but when it comes to sampling and data collection, he is not a model and can be criticized on many legitimate grounds.
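The distinction drawn in this comment, between how large a sample is and how it was selected, can be illustrated with a small simulation. This is only a sketch and is not taken from the post or from the study it discusses: the population, the age range, and the assumed linear relation between age and the outcome score are all invented for illustration.

```python
import random
import statistics


def build_population(rng, size=100_000):
    """Hypothetical population of (age, score) pairs where score falls with age."""
    people = []
    for _ in range(size):
        age = rng.randint(18, 80)
        # Assumed relationship, chosen only to make age matter for the outcome.
        score = 100 - age + rng.gauss(0, 10)
        people.append((age, score))
    return people


def mean_score(people):
    """Average outcome score of a list of (age, score) pairs."""
    return statistics.fmean([score for _, score in people])


rng = random.Random(42)
population = build_population(rng)
true_mean = mean_score(population)

# A small simple random sample drawn from the whole population.
random_sample = rng.sample(population, 500)

# A much larger sample, but drawn only from 18-22 year-olds.
students = [person for person in population if person[0] <= 22]
student_sample = rng.sample(students, 5_000)

random_error = abs(mean_score(random_sample) - true_mean)
student_error = abs(mean_score(student_sample) - true_mean)
print(f"error of random sample (n=500):   {random_error:.2f}")
print(f"error of student sample (n=5000): {student_error:.2f}")
```

Under these assumptions the student-only sample is ten times larger and still misses the population mean by a wide margin, because the error comes from who was sampled, not from how many. If the outcome did not depend on age at all, the two samples would agree, which is the case the original post says is rarely checked.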
11 Eszter 07.03.06 at 3:19 pm: Since Trackback doesn’t seem to be working, I thought I’d point to the discussion at Political Animal on this. Some of those comments make me glad we don’t have a wider readership.

Eric – Thanks for those references, very interesting. For those who have access to these articles, I recommend taking a look at them (although you can probably guess from their titles what they address).

Susan C – That is very close to my research area (differences in people’s Internet user skills) and I agree that it can be really problematic to generalize from college students to others. That was precisely my point. By picking a piece about sexual matters, I was again trying to emphasize this. After all, what are the chances that a 20-year-old college male is representative of all men when it comes to this arena?
12 ft32 07.03.06 at 4:30 pm: “Since Trackback doesn’t seem to be working, I thought I’d point to the discussion at Political Animal on this. Some of those comments make me glad we don’t have a wider readership.”

I miss Calpundit! The comment section was so damn good. I’d also like to register an off-topic and an ungrateful complaint: what the hell is going on with Matt Yglesias? His posts are getting to be as short as Glenn Reynolds’s.
13 bob mcmanus 07.03.06 at 4:47 pm: “You don’t need huge samples to draw valid inferences.”

Well, I did work quality control for a while, and do understand a little about sampling. Very little. But human beings are not widgets.

If Eszter is doing her study at Northwestern, with what degree of confidence can she say her results will be true for similar students at CalTech, Brown, U of Florida? Oxford, Hong Kong, Rio?

If sampling techniques had really good predictive or descriptive results with human behavior Wall Street, K Street, and Madison Avenue could shut down. And those predictors are dealing with millions of samples.
14 Steph 07.03.06 at 5:45 pm: This is a really interesting topic and one that I’ve been thinking about lately too. Given my field, I tend to think about nationally representative samples when confronted with any research question, though of course they are not always possible. But there are some fields where it is common practice to assume that all humans are interchangeable, because what is being studied is a biological phenomenon. Psychology and medicine are the fields that come to mind first.

The problem is that the researchers cannot evaluate their assumption that all people everywhere work the same, because they just do not have the data to do that evaluation, i.e., they have only data from college first years, or people under treatment at major research hospitals.
15 Adam 07.03.06 at 8:02 pm: I have two problems with this study:
1. Who was doing the post-study questionnaire? Not to put too fine a point on it, but men are socialized to be far more comfortable talking about sex and sexuality than women are. Having the woman be debriefed by some grandpa figure will surely depress the woman’s response rate.

2. Was the woman providing encouragement? Please bear with me, I don’t mean in some obvious crass way. Previous experiments have shown (see The Person and the Situation, by Ross & Nisbett) that a man talking to a woman he cannot see but believes is attractive will speak to her differently than he would were he to believe she is not attractive. Furthermore, and this is the important point – observers listening only to her end of the conversation rated her response as more engaging when the man believed she was attractive. There is unconscious feedback.

Overall this study seems flawed: they are looking for dispositional factors to explain bad behavior when situational factors are far more important. They would do far better to study the environments in which harassment has occurred and try to find common elements there.

ps. It kills me whenever someone 40+ yaps about women vs. men. Gender roles have changed dramatically in the past 40 years and continue to do so. There is no state of nature we can make our measurements with respect to. Behavior on the part of women that would once have been considered pathological is now relatively normal. We may have started as chimps but we may end up as bonobos.
16 Randolph Fritz 07.03.06 at 8:04 pm: “learn about the central limit theorem!”

The assumption that the central limit is Gaussian, or approached in any reasonable amount of time only applies when the distribution has moderate variance. Information systems (like human minds and the internet) can routinely and easily produce non-Gaussian central limits because, IIRC, among other things, they have memory and ordered behavior, and are also capable of communication. Which doesn’t mean the Gaussian distribution is wrong in all such cases, but does mean that it is unreasonable to assume it. See Mandelbrot, passim.
17 Walt 07.03.06 at 9:17 pm: Randolph: That makes the Gaussian problematic in lots of situations, but _not_ in random sampling.
18 Megami 07.03.06 at 10:54 pm: “We may have started as chimps but we may end up as bonobos.”

That is beautiful – i am putting that on my door at work.
19 Jon H 07.03.06 at 11:11 pm: ft32 writes: “what the hell is going on with Matt Yglesias? ”

Well, he posts on three blogs – his own, TAPPED, and at TPM Cafe.

And possibly others I am not aware of.
20 Randolph Fritz 07.04.06 at 1:37 am: Walt, a sampling of information gleaned from randomly chosen people is a sampling of data from multiple points in a communicating network. That can produce, as I remember from my computing days, some serious surprises. And sometimes, I think, the assumption of a Gaussian central limit is a reasonable one, but I think it’s often made far too carelessly.
21 SusanC 07.04.06 at 2:05 am: This reminds me of a joke. A mathematician, a physicist and an engineer are on a train going through Wales. Looking out of the window, they see a black sheep.

Engineer: “That’s really cool – all the sheep in Wales are black”

Physicist: “But you’ve only seen one! The best you can conclude is that at that there is at least one black sheep in Wales.”

Mathematician: “You’re both wrong. At least one sheep in Wales is black on at least one side.”
22 yoyo 07.04.06 at 3:38 am: adam, i’ve had phone sex with physically unattractive girls, so are you sure about no.2?
23 Fabio Rojas 07.04.06 at 1:23 pm: Randolph Fritz said: “The assumption that the central limit is Gaussian, or approached in any reasonable amount of time only applies when the distribution has moderate variance.”

True, but how many times does a social scientist work in a situation where the variance is not moderate? For example, if we wanted to study voting, how badly would the CLT be violated? Furthermore, as Walt said, it’s not a problem if you have truly random sampling.

My perspective on a lot of statistical issues is that social scientists often analyze data that behave fairly well, even though they don’t perfectly satisfy the assumptions of many models. The goal is not accuracy (which is an issue in engineering) but measuring general tendencies among variables. E.g., we don’t care *exactly* how much income increases with education, but we’re happy as long as we get a reasonable ball park figure. Most samples of modest size are good enough to answer this type of question.

The problem Eszter Hargittai identified has to do with an extreme violation of random sampling principles. No amount of statistical technique will help you if your sample is massively biased, say, by selecting only 18-22 year-old college students.

Bob McManus said: “‘You don’t need huge samples to draw valid inferences.’

Well, I did work quality control for a while, and do understand a little about sampling. Very little. But human beings are not widgets.”

I never claimed people were widgets, but the amazing thing about a well constructed sample is that you can obtain pretty good answers with very little data. It’s an amazing statistical fact that a small random sample can tell you a lot about a huge population.

Furthermore, it’s not the relative size of the sample – it’s the raw size of the sample. In other words, a random sample of 1,000 tells you about as much about ten thousand people as about a million. It’s hard to believe, but it’s true. It blew my mind when I first understood this.
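The claim about raw versus relative sample size can be checked with the textbook formula for the standard error of a sample proportion under simple random sampling without replacement, including the finite population correction. The sketch below is an illustration added here, not part of the comment; the proportion of 0.5 and the function name are assumptions chosen for the example.

```python
import math


def standard_error(p, n, population_size):
    """Standard error of a sample proportion under simple random sampling
    without replacement, with the finite population correction applied."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * correction


# Same sample size (1,000), very different population sizes.
se_small_population = standard_error(0.5, 1_000, 10_000)
se_large_population = standard_error(0.5, 1_000, 1_000_000)
print(f"population of 10,000:    {se_small_population:.4f}")
print(f"population of 1,000,000: {se_large_population:.4f}")
```

Both values come out near 0.015 to 0.016, that is, a margin of error of roughly three percentage points either way. The sample of 1,000 is in fact slightly more precise for the smaller population, since it covers a tenth of it, but the difference is small compared with the effect of changing the sample size itself.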
24 Ozma 07.04.06 at 1:35 pm: I wonder, actually, if this stuff really is intended as “scholarship” — there is a “researcher” at my university who made quite a big splash reporting that (1) parents are less attentive to “unattractive” children and that (2) “unattractive” parents are less attentive to their children generally.

the study was based on “investigators” going out and taking notes on parent/child interactions in supermarkets (and rating their attractiveness while they were at it). It seems to me that these kinds of “studies” — and the college student one you describe — are DESIGNED not with the serious research paper but with the titillating press release in mind.

I don’t work in psychology nor sociology, but I simply cannot believe there exists an entire discipline in which this kind of thing is taken seriously (evolutionary psychology is more an ideological disposition than a discipline, so that doesn’t count as a counter-example).
25 pdf23ds 07.04.06 at 2:05 pm: “Not to put to fine a point on it, but men are socialized to be far more comfortable talking about sex and sexuality than women are.”

Funny, a few people hold the opposite opinion.
26 SusanC 07.05.06 at 7:10 am: “True, but how many times does a social scientist work in a situation where the variance is not moderate?”

For example: suppose you’re asking your study participants how many sexual partners they have had. (For this discussion, I will leave aside the obvious problem that they may not respond truthfully). Most of the sampled population will give a low numbered response, but a few are sex industry workers who will give a very high response. To get an accurate estimate of the mean, you need quite a large sample.

But yes, in many situations the underlying distribution doesn’t have a very long tail, we’re not dealing with rare events, and a sample of 50 or so is quite adequate.
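The long-tail point can be sketched in a few lines of simulation. Everything numeric here is hypothetical and invented for illustration: the 2% share of long-tail respondents, the ranges of reported counts, and the function names are assumptions, not survey data.

```python
import random
import statistics


def reported_partners(rng):
    """One hypothetical survey answer: mostly small counts, rarely a very large one."""
    if rng.random() < 0.02:  # assumed 2% of respondents sit in the long tail
        return rng.randint(200, 2000)
    return rng.randint(0, 10)


def spread_of_sample_means(n, trials=200, seed=0):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([reported_partners(rng) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


small = spread_of_sample_means(50)
large = spread_of_sample_means(2_000)
print(f"spread of the mean, n=50:   {small:.1f}")
print(f"spread of the mean, n=2000: {large:.1f}")
```

With a sample of 50, many samples contain no long-tail respondent at all and others contain two or three, so the estimated mean swings widely from one sample to the next. The larger sample is much steadier, which is the sense in which a long-tailed distribution calls for a larger sample than a well-behaved one.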
27 dfd 07.06.06 at 11:29 am: College students are typically used to study everything from blood types, to allergic reactions, to brain structure and function, to sexuality. This is only a problem if you think humans are infinitely variable and unpredictable, but we know that they aren’t.
Some famous findings (e.g., Maslow’s hierarchy) have been found to vary somewhat from culture to culture. But the very fact that people have gone out and looked into it shows that the system is not as broken as some of you think.
28 Fabio Rojas 07.06.06 at 3:37 pm: Susan said: “For example: suppose you’re asking your study participants how many sexual partners they have had. (For this discussion, I will leave aside the obvious problem that they may not respond truthfully). Most of the sampled population will give a low numbered response, but a few are sex industry workers who will give a very high response. To get an accurate estimate of the mean, you need quite a large sample.”

This is a nice example showing how a little empiricism can really help you sort through some sticky statistical issues.

Comments on this entry are closed.
