If I’d only known…

by Eszter Hargittai on June 7, 2007

I am working on the Introduction to an edited volume on the nitty-gritty behind-the-scenes work involved in empirical social science research (to be published by The University of Michigan Press in 2008). While each chapter in the book gets into considerable detail about how to approach various types of projects (from sampling online populations to interviewing hard-to-access groups, from collecting biomarkers to compiling cross-national quantitative data sets), I want to address more general issues in the introductory chapter.

One of the topics I would like to discuss concerns larger-level lessons learned after conducting such projects. The motivation behind the entire volume is that unprecedented things happen no matter the quality and detail of preparation, but even issues that can be anticipated are rarely passed along to researchers new to a type of method. The volume tries to rectify this.

I am curious, what are your biggest lessons learned? If you had to pick one or two (or three or four) things you really wish you had known before you had embarked on a project, what are they? I am happy to hear about any type of issue from learning more about a collaborator’s qualifications or interests, to leaving more time for cleaning data, from type of back-up method to unprecedented issues with respondents. If you don’t feel comfortable posting here, please email me off-blog. Thanks!



eudoxis 06.07.07 at 8:30 pm

This is, undoubtedly, common knowledge, but it is one of those lessons that seem difficult to learn. Consult a statistician during project design, not after data collection, even if you know some statistics.


SamChevre 06.07.07 at 8:31 pm

This may be overly simple, but the key lesson I learned from my (undergrad, economics) major paper was, “Pay very close attention to your data.” Does it mean what you think it means?

All the tricky stuff is trivial if you don’t get the basics right–a representative sample and questions that are understood and answered accurately are the most basic basics.


Eszter 06.07.07 at 8:41 pm

Nothing is too trivial here, please feel free to share whatever observations you have! Thanks to both of you.

I completely agree that the methods have to be very sound up front. This cannot be emphasized often enough!


dsquared 06.07.07 at 8:48 pm

The very obvious one is draw lots of graphs.


M. Gordon 06.07.07 at 9:07 pm

As long as we’re passing along platitudes, I’ll pass along one that I share with grad students frequently, one that I learned personally, through much hard work. This is somewhat peculiar to experimental science, but has obvious corollaries. It is this:

There are two ways to do anything: the right way, and the half-assed way. The half-assed way never saves you time. Here’s why: if you do something the half-assed way, and it doesn’t work, you haven’t learned anything. You’ll have to go back and do it the right way to see if that fixes it. 99% of what you do doesn’t work the first time anyway, even if you do it right, so doing it the half-assed way will almost always be a wasted attempt. Do it the right way, the first time. Even if that means starting over halfway through because you fucked it up.


Eszter 06.07.07 at 9:31 pm

D-squared, can you say more?

Matt, agreed! Relates to the points above: if you didn’t do it well you might as well not have done it at all.


dsquared 06.07.07 at 9:48 pm

Just basically draw loads of charts of the data in time series and scatterplots before you start transforming it or doing any statistical work on it at all, to get a feel for the shape of it, see any anomalies, get an idea of how much shape there is etc. I said “obvious” because all of the books tell you to do that, but not everyone bothers.
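As a rough illustration of that first look, here is a minimal Python sketch that prints a crude text histogram of a variable before any transformation. The values and the `text_histogram` helper are made up for the example; a real project would more likely use a plotting package.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return the lines of a crude text histogram, one line per non-empty bin."""
    # Map each value to the lower edge of its bin and count members per bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [f"{start:>6} | {'#' * bins[start]}" for start in sorted(bins)]

# Hypothetical raw values; the 41 is the kind of anomaly a quick look surfaces.
values = [3, 4, 5, 5, 6, 7, 8, 41]
lines = text_histogram(values, bin_width=5)
print("\n".join(lines))
```

Even this much shows where the bulk of the data sits and that one observation lies far from the rest, which is worth knowing before any modelling.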


Eszter 06.07.07 at 9:55 pm

Got it, thanks, so various diagnostics and basic explorations. Yes, this would be relevant to those working with quan data.


Daniel Zaccariello 06.07.07 at 9:58 pm

I’m a first-time researcher with the first project of my very own (undergrad research internship). I’ll contribute something probably obvious, but it really chaps me right now, big time:

Estimate the time for data collection and double or treble it (at least). I’ve run into so many problems with unreliable and/or invalid measures in other datasets which seem to contain good info I could use. I’ve run into conflicting values for the same variables from different db’s from the *same* organization with no obvious differences in operational definitions or collection procedures (OECD data on ODA, for example…data on terrorist incidents for another example). Things like this take extra time to work out.

This is in Political Science. Quantitative analysis (if I ever get my data together…which seems like a doomed endeavor after 3 weeks of manually coding data all by my lonesome…)

In a meeting on a Friday I told my adviser “Sure, I think I can do that by Wednesday” HAHAHA. 2 weeks later….still not done.

Good luck w/ your intro


sarapen 06.07.07 at 11:02 pm

This lesson is almost always learned too late, but it can never be emphasized enough: Always check your equipment before recording an interview!

God, that 2 hour interview I lost because I didn’t check my tape that one time still haunts me. If my reminder prevents just one future cock-up then I will consider my loss to not have been in vain. Obviously this is an addendum to #5 above.

Also, always carry writing materials with you, or at least when you’re in the midst of research. Ideas pop up at some odd times and you’ll be glad you managed to jot down that flash of inspiration you had when you were in the canned goods aisle.

I can’t think of any more advice offhand. Try not to work at home? Definitely a bad idea for certain personality types.


John Quiggin 06.08.07 at 12:35 am

Good collaborators are crucial, and bad ones can lead to lots of time spent on an effort that goes nowhere. But as far as I can tell, trial and error is the only way to tell the two apart – neither reputation, nor publication record nor first impressions seem to be reliable.

I guess this means that goodness is at least in part a matter of personal fit. OTOH, after some failed collaborations I’ve found myself a member of a large group of exes.


vivian 06.08.07 at 1:03 am

(Don’t forget to plug the book, and maybe point us to your intro online when it’s done!)

Periodically during the project, look into your soul. What are you glossing over, handwaving about, deferring to think about later? Now face the hard fact that by not addressing these now, you’re setting them up to be show-stoppers.

To expand on dsq’s excellent advice, look for patterns in any way you can think of – if not graphs, then look for differences between subgroups, try unlikely combinations just in case they’re interesting. “Marinate yourself in the data” a prof used to say. It’s not just quant advice, it’s not just visualization, it’s about knowing the material, the data, the background.

Leave time for the unexpected. Sometimes it’s a disaster, but sometimes it’s an unexpected insight leading to fruitful but different kinds of analysis/conclusions.

And of course: until you have the data imported into your analytical software on your machine (or whatever), cleaned and ready to crunch, you don’t actually have the data yet.


greensmile 06.08.07 at 3:11 am

Eszter: IQSS has 3 blogs, one of which is dormant, but all of which discussed quantitative social science topics from a researcher’s point of view. No time to fetch the URLs just now, but I can get them.


Mike Maltz 06.08.07 at 3:13 am

There seems to be a “round up the usual variables” attitude in my field, criminology, and I imagine in others as well. So we “know” that delinquency is strongly associated with poverty, low educational achievement, single-parent households, etc., but there is less willingness to look at the micro-level variables, at what causes this person to become delinquent while another person with the same characteristics does well. Different people face different challenges and opportunities in their lives, but these differences are often not picked up in very many studies.


Witt 06.08.07 at 3:38 am

I’d say don’t delegate everything. Data entry people are fine, grad students to do qualitative interviews are fine, somebody else to code the interview data is fine. But if you don’t actually do some of ALL of the work yourself, nobody will have the vertically integrated *experience* (not just the view) of the project. And that can make all the difference.

Also, if there are layers of personnel between you and the raw data collection, go and observe. Make sure they know your face, your name, and that you care about the quality of the project. It’s much easier to persuade yourself that the P.I. doesn’t care about ethics or even quirks in the data collection process if you’ve never met him/her.


The Happy Revolutionary 06.08.07 at 5:53 am

Strength and quality of a research methodology is almost always inversely proportional to the complexity of the statistics involved.


Ciarán 06.08.07 at 6:47 am

This may be even too trivial for a ‘nothing too trivial here’ thread, but my discovery as a philosopher entering into the joys of content analysis was: gathering data takes a lot more time than you’d think, so don’t commit to a project with deadlines so tight that the work is well nigh on impossible to even complete, never mind giving yourself time to reflect on the data.


SG 06.08.07 at 7:44 am

This comes from health, so I don’t know if it’s relevant to “empirical social science”, but it got me once … never believe on face value the claims from collaborating service centres on things like the number of subjects they might see a day, how likely they are to interview subjects as part of their routine service provision, how likely uninvolved staff are to cooperate, and what factors may hinder the data collection process in their service. Particularly if there is grant money or publication credit involved, or the subjects are … complicated … or the topic of the research is … challenging.

(And if one is doing what witt suggests, one will see early on that this is an issue, and can find a way to get around it – but a research assistant may not tell you that recruitment is impossible if they think it will reflect poorly on them).


lees 06.08.07 at 9:28 am

One that has come back to bite me numerous times because I haven’t observed it: Document, or even over-document, everything in a data set. Sometimes this seems unnecessary, but five years later you won’t remember exactly how you defined a variable with a funny name like PCTGRRX that seemed so very obvious at the time.
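One possible way to make this habit hard to skip is to keep a small codebook next to the data and check it automatically. The sketch below is only an illustration in Python: the codebook entries, the column list, and the `undocumented` helper are all hypothetical.

```python
# Hypothetical codebook: one entry per variable, written when the variable is created.
codebook = {
    "resp_id": {"label": "Respondent identifier", "type": "int"},
    "age": {"label": "Age in years at interview", "type": "int"},
}

def undocumented(columns, codebook):
    """Return the column names that have no codebook entry or an empty label."""
    return sorted(
        c for c in columns
        if c not in codebook or not codebook[c].get("label")
    )

# Columns as they might appear in the data file (assumed for the example).
columns = ["resp_id", "age", "PCTGRRX"]
missing = undocumented(columns, codebook)
print(missing)  # the variable nobody wrote down
```

Running a check like this each time the data set changes means the gap is noticed while someone still remembers the definition.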


Doug 06.08.07 at 11:00 am

Aw dang, I thought the elaboration of #4 was going to be “more pies and shorter hours and bollocks to almost everything else”…


Stuart 06.08.07 at 12:06 pm

Not really my area, but something I have always found helpful is prototyping a process – in this case I guess this would mean running through the entire process you plan on doing (from designing data collection, getting it, cleaning it, and then a first stage of analysing it maybe), which will allow you to find a large proportion of systemic issues you will encounter during a project, and reduce the risk of running into an issue late on that requires a redesign and restarting of the project (if that is even a possibility given time/cost constraints).


Eszter 06.08.07 at 1:00 pm

Lots of additional great points here. Time is always an issue. I don’t think I have ever seen someone NOT underestimate how long something will take. Equipment, so true (I’d add the need to have extra batteries always, and also checking storage before each case.) Backing up ASAP and in multiple ways is related. And staying close to the process is very important, I agree. I have found it extremely important to (1) keep regular contact with personnel; and (2) get involved in every stage of the process as Witt suggests. It’s not possible to be realistic about expectations and to know the possible pitfalls without doing this.

The bit about not believing claims by others is quite tricky. So the idea is to revise the figures/dates they give without relying on their judgement too much?


Kieran Healy 06.08.07 at 1:31 pm

The very obvious one is draw lots of graphs.

Yes. Look at your data.


Barry 06.08.07 at 1:48 pm

“This is, undoubtedly, common knowledge, but it is one of those lessons that seem difficult to learn. Consult a statistician during project design, not after data collection, even if you know some statistics.”

As a statistician, I really, really support this. There’s an old saying, ‘post hoc is post-mortem’, or ‘when the statistician is consulted after an experiment is run, frequently the only thing that he can do is to confirm the cause of failure’.

In addition, good data management will save a lot of time, and might make the difference between success and failure. Having an idea of what variables are to be collected, how data is to be linked between/within subjects, times, locations, facilities, interviewers, etc., might save a person-year or so of re-do work.

Last: if you’re scaling up (e.g., from a single researcher/facility to multiple researchers/facilities), don’t assume that you can do things in the manner which worked for smaller, simpler studies. You wouldn’t approach building a large house in the same manner as building a garden shed; it works similarly in research.


joe 06.08.07 at 2:25 pm

Greetings from a GWU colleague of Henry’s. Two points when working with quantitative data.

1. When entering data from surveys one has administered on one’s own, don’t worry excessively about how to code responses to questions. Settle on a consistent coding scheme that captures the relevant variation in the data, and THEN when you are ready to analyze the data, let your software do the work of creating appropriate variables (e.g. dummy variables) by using logical statements to create new variables.

2. Don’t “hardwire” your data when you enter it. Subject to memory capacity and similar constraints, keep the data in as flexible a form as possible, because as one begins to do the analysis, one may often find a need to create new variables, and this is easiest to do when data are not entered into a pre-existing format. This can save time back-tracking later on.
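The two points above might look something like the following in practice. This is a minimal Python sketch with invented survey fields and category names; the idea is simply that the raw codes are entered once and the dummy variables are derived later by logical statements.

```python
# Hypothetical survey rows, entered with one consistent raw coding scheme.
raw_rows = [
    {"id": 1, "employment": "full_time", "age": 34},
    {"id": 2, "employment": "student", "age": 21},
    {"id": 3, "employment": "unemployed", "age": 45},
]

def add_derived(row):
    """Return a copy of the row with analysis variables added; raw fields stay untouched."""
    derived = dict(row)
    # Dummy variables built from logical statements at analysis time.
    derived["is_employed"] = int(row["employment"] in {"full_time", "part_time"})
    derived["is_student"] = int(row["employment"] == "student")
    derived["age_40_plus"] = int(row["age"] >= 40)
    return derived

analysis_rows = [add_derived(r) for r in raw_rows]
print(analysis_rows[0])
```

Because the raw rows are never overwritten, a different cut-off or grouping only requires changing `add_derived` and rerunning it, not re-entering data.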


SamChevre 06.08.07 at 3:46 pm

Another in the “obvious, but often missed” category: if your study could be applicable to a political controversy, be aware of the critiques made of studies by partisans on both sides. Yes, some of them are ridiculous nit-picks–but often some very good, but non-obvious, points are well-known to partisans and ignored elsewhere. (For example, the wedge between “wages” and “compensation” has grown very fast in the last 20 years.)


RSA 06.08.07 at 4:50 pm

This may be either too obvious or too detailed, but one of my favorite papers from the 1990s is by Julian Faraway, on the cost of data analysis.

A regression analysis usually consists of several stages such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences.

The technical content is about regression analysis, but there are good general lessons as well.


Witt 06.08.07 at 7:13 pm

Another one I learned the hard way: Before you start the project, practice writing your final report. Seriously. I have found more gaps in my data collection by realizing that I wanted to be able to write a sentence that said “The average participant got a loan of $3000, although the loan program is nationwide and the impact of that $3000 varied greatly….” ooops. Maybe I ought to think about cost-of-living differences around the country and whether those need to be factored in.


Craig 06.09.07 at 12:49 am

Your second (or third or …) language skills are never as good as you think they are.


Chris 06.09.07 at 3:36 am

I suppose both of these are trivial too, but they’re probably the most important lessons I’ve learned over the years.

1.) When designing an experiment, keep in mind that it’s probably not going to be the only one you need to run. Make sure you leave yourself directions to go next.

2.) Related to 1), when designing a study, try to imagine the criticisms you’ll get. I usually think to myself, “What would reviewers say is missing, or needs to be controlled, what follow-up experiments would they want to see, etc.” Thinking about the flaws that other people will see is a good way of taking a step back and looking at a study with fresh eyes.


tom 06.09.07 at 9:54 am

Automate your data analysis. If you use SPSS this means learning syntax. You should be able to redo your whole analysis at the press of a button – invaluable when you realise you want to include/exclude/preprocess the data differently, or at review stage you just want to work out exactly how you got from the raw data to the graph on page 4
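The same idea carries over to any scripting language, not just SPSS syntax. Below is a minimal Python sketch, assuming a tiny made-up data set and an invented exclusion rule (`max_score`), of an analysis that runs from raw data to result through a single entry point.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real project this would be read from a file.
RAW_CSV = """id,score
1,10
2,12
3,
4,999
"""

def load(text):
    """Parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows, max_score=100):
    """Drop rows with a missing score or a score above max_score."""
    kept = []
    for r in rows:
        if r["score"] == "":
            continue
        score = float(r["score"])
        if score > max_score:
            continue
        kept.append({"id": int(r["id"]), "score": score})
    return kept

def analyse(rows):
    """Compute the summary statistics reported in the write-up."""
    scores = [r["score"] for r in rows]
    return {"n": len(scores), "mean": statistics.mean(scores)}

def run(text, max_score=100):
    """Single entry point: raw data in, results out."""
    return analyse(clean(load(text), max_score=max_score))

result = run(RAW_CSV)
print(result)
```

Changing an exclusion rule then means changing one argument, for example `run(RAW_CSV, max_score=1000)`, and every downstream number is regenerated rather than patched by hand.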


leederick 06.09.07 at 12:15 pm

The most important thing is to try to arrange your study so you get a result which other scholars will find interesting no matter how the data or evidence comes out.

That can be tricky in some circumstances. If you are doing routine tests of drugs and you find your drug cures cancer, that’s a real result – but if you find your drug doesn’t cure cancer, people are going to be less interested. The outcome you get, and whether you end up on the front page of Nature or page 4568 of Annals of the Properties of Obscure Compounds: Series H, basically depends upon luck.

Social scientists have an easier time of it because the theories they look at are usually more complicated than whether something has a treatment effect or not. If you think hard enough about the problem, you can often arrange things so your data will say something new or interesting about some aspect of the theory you’re trying to address however it comes out. And you get to say something worthwhile to your peers either way.

So my advice would be to be very wary of standard research design from the experimental sciences and what the statisticians you consult will say. Standard NP-hypothesis testing and study design will set you up for a situation where you are basically playing Russian Roulette for an interesting result. You’re making a gamble, and if you lose and the results say accept the null or don’t have a large enough effect size you’re not saying anything interesting.

The other thing I would say is to be prepared to be methodologically eclectic when it comes to statistics. Statisticians can’t agree on a theory of inference amongst themselves, so there’s no reason you should feel obliged to stick to one. By all means set up a hypothesis test before you collect the data, but be prepared to throw that out the window and focus your study around exploratory data analysis, or post trial statistics, or Bayesian analysis if this will give you a more interesting result to publish.


Peter 06.09.07 at 1:52 pm

Make sure that your software licenses are up to date and in order. Don’t wait for the last minute like so many folks do.

This one was one that my group dodged during one class, but every other group in the course was burned by it. During an IC design class, we completed the design work early in the semester (the other 2 wanted to take time off to visit family in the middle of the semester, so we had to start early). The software license for the design software expired about halfway through the semester, so that we were the only group that managed to complete the design work that semester. Because the other groups waited until about the last month of the semester to start, the expired license wasn’t rectified until after the semester was over (most serious CAE software expires annually). I’m sure the EE dept was ticked off that most of the course (all but 3) got an Incomplete that semester.

Like Sarapen, I also carry a notebook of some kind all the time. And like Daniel, I’ve had some embarrassing incidents horribly underestimating how long it takes to perform some tasks.


Peter 06.09.07 at 5:06 pm

Robert Chambers, in his book “Rural Development: Putting the Last First” (Longman, 1983), has a nice, intemperate discussion of the practical issues which impact the collection and analysis of social science research data, and which are usually ignored in textbooks. His focus is on research in developing countries but the lessons apply more generally, particularly since the (sub-)culture of the researcher is almost never that of the research subjects.


trane 06.10.07 at 7:27 am

I second #29.


trane 06.10.07 at 7:29 am

And the (second or third) language skills you think are sufficient seldom are.


lees 06.10.07 at 9:44 am

In case you’re not familiar with it, I’ll mention the existence of a book, edited by Richard A. Seltzer, titled Mistakes That Social Scientists Make: Error and Redemption in the Research Process (New York: St. Martin’s Press, 1995). The chapters are retellings by various social scientists of some mistake they’ve made while conducting research.
