Here’s some information for fans of “the capability approach”:http://www.capabilityapproach.com/Briefings.php: the “Dutch Environmental Assessment Agency”:http://www.mnp.nl/en/index.html released “a report”:http://www.mnp.nl/en/publications/2007/Sustainablequalityoflife.html that I co-wrote on how to conceptualise the quality of life for national policy purposes in affluent countries – we argue for a capability metric and are rather critical of the happiness metrics. I should add, though, that the proof is in the eating of the pudding, and we don’t have any funding to collect the necessary data that a capabilities-based index of the quality of life would require; our work remains at the conceptual level only. It may well turn out that we would need a very long questionnaire in order to collect all data, which in turn might jeopardize the viability of a capability-index of quality of life (since the non-response-rate would be higher). And there are more problems to solve before we would arrive at a capability index, certainly one as (relatively) easy to measure as either GDP per capita or happiness indicators. Anyway, if anyone has more money and more time and thinks this is a fun project to pursue, let me know what comes out of it.
The “Financial Times”:http://www.ft.com/cms/s/0/0d758a1e-8af8-11dc-95f7-0000779fd2ac.html, for reasons best known to itself, serves up this “steaming heap of buffalo dung”:http://www.ft.com/cms/s/0/0d758a1e-8af8-11dc-95f7-0000779fd2ac.html from Heritage Foundation Vice-President for Governmental Affairs, Michael Franc, on its op-ed page.
A legislative proposal that was once on the fast track is suddenly dead. The Senate will not consider a plan to extract billions in extra taxes from mega-millionaire hedge fund managers. … Far from embarrassing, this episode may reflect a dawning Democratic awareness of whom they really represent. For the demographic reality is that, in America, the Democratic party is the new “party of the rich”. More and more Democrats represent areas with a high concentration of wealthy households. Using Internal Revenue Service data, the Heritage Foundation identified two categories of taxpayers – single filers with incomes of more than $100,000 and married filers with incomes of more than $200,000 – and combined them to discern where the wealthiest Americans live and who represents them. …Democrats now control the majority of the nation’s wealthiest congressional jurisdictions. More than half of the wealthiest households are concentrated in the 18 states where Democrats control both Senate seats. …
Soon this new political demographic may give traditional purveyors of class warfare the yips. To comply with new budget rules, liberal Democrats on Capitol Hill are readying a tax increase of at least $1,000bn over the next decade. Ms Pelosi says she wants to extract all of this from “the wealthy”. When has a party ever championed a policy that would inflict so much pain on its own constituency? At what point will affluent Democrats crack and mount a Blue State tax rebellion?
A hint to the people at the _Financial Times_ (a group whom I usually hold in considerable esteem): when an op-ed’s argument rests on a statistical fallacy so howlingly awful that it’s obvious to someone like me, it’s not a good idea at all to publish it. More generally, when a Heritage Foundation vice-president proposes a piece to you, et numeri ferentes, _especial_ attention to the basis of those figures is not only warranted but necessary. If it is true that Democrats tend to represent richer districts, both basic logic and an elementary grasp of statistics should tell you that this _does not imply_ that they represent richer voters. Indeed, not only does it not imply this, we _know_ that it isn’t true. See further “Andrew Gelman et al.”:http://www.stat.columbia.edu/~gelman/research/published/red_state_blue_state_revised.pdf
Do richer voters still support Republicans? If so, how can we understand the pattern that the Democrats do best in the richer “blue states” of the Northeast and West, while the Republicans dominate in the poorer “red states” in the South and between the coasts? … The Republicans have the support of the richer voters within any given state but have more overall support in the poorer states. Thus, the identification of rich states with rich voters, or more generally, the “personification” of so-called red and blue states, is misleading. For example, in the context of the Brooks quotes above, within an “upscale” area that supports the Democrats, the more “upscale” voters are still likely to vote Republican. … The pattern that richer states support the Democrats is _not_ a simple aggregation of rich voters supporting the Democrats.
And so on. I have a lot of respect for the _Financial Times_ – they do a better job in my estimation than any other newspaper of preserving a high quality of debate and are usually quite scrupulous about factual detail. This makes it all the more odd that a piece like this, which is better suited for publication in an introductory statistical textbook as a particularly egregious cautionary example of the fallacy of aggregation, could have made it in. Better filters please.
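The fallacy is easy to reproduce with made-up numbers. The sketch below uses purely hypothetical figures (two invented states with invented vote shares, none of them taken from Gelman et al. or from the IRS data) in which rich voters lean more Republican than poor voters inside each state, yet the state with the higher concentration of rich voters is the one that votes Democratic.

```python
# Hypothetical electorates: (number of voters, Republican share) by income group.
# All figures are invented for illustration only.
STATES = {
    "Richland": {"rich": (400, 0.45), "poor": (600, 0.30)},
    "Poorland": {"rich": (100, 0.75), "poor": (900, 0.60)},
}


def republican_share(groups):
    """Overall Republican share of a state, pooling its income groups."""
    voters = sum(n for n, _ in groups.values())
    republican_votes = sum(n * share for n, share in groups.values())
    return republican_votes / voters


def rich_fraction(groups):
    """Fraction of a state's voters who are in the rich group."""
    voters = sum(n for n, _ in groups.values())
    return groups["rich"][0] / voters


for name, groups in STATES.items():
    print(
        f"{name}: {rich_fraction(groups):.0%} rich, "
        f"{republican_share(groups):.1%} Republican overall, "
        f"rich {groups['rich'][1]:.0%} vs poor {groups['poor'][1]:.0%}"
    )
```

In this toy setup the richer state is the Democratic one, but knowing that tells you nothing about how its rich voters behave: they are still the more Republican group within it.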
Republican Internet consultant Patrick Ruffini “points”:http://www.techpresident.com/blog/entry/11033/radiohead_republicans to this “fascinating resource”:http://www.facebook.com/flyers/create.php for figuring out the raw numbers of liberal, moderate and conservative Facebook users interested in a specific issue. Don’t try to create a flyer or whatever – just go to the “targetting” section, type the topic that you are interested in into the keywords section, and see how the numbers change as you click Liberal, Moderate or Conservative (there’s further microtargeting of cities etc. available too). For example, about 2,520 self-declared liberal Facebook users declare blogging as one of their interests, as opposed to 1,320 moderates and 1,100 conservatives. 5,180 liberals show the good taste to declare My Bloody Valentine as one of their favourite bands, as opposed to 1,120 moderates, and only 340 conservatives. Less obviously, the number of liberals (7,300) and conservatives (7,580) who like bluegrass music is about the same[1]. Obviously, treat these numbers with extreme caution; there is _no way_ that Facebook users are a random sample of the population[2], but still, this promises much idle entertainment.
[1] It occurs to me on re-reading this post that I’ve phrased this in a misleading way – obviously, if you wanted to make a serious point about this, you’d weight the absolute numbers or provide the odds ratios or whatever.
[2] For one, the liberal-conservative ratio is skewed to liberals among Facebook users as compared to the ratio in the general population – there are just over 2.8 million self-identified liberal Facebook users and 2.18 million conservatives. Most survey evidence that I am aware of suggests that there are considerably more self-identified conservative Americans than liberal Americans (although the number of self-identified conservatives is dropping).
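For anyone who does want to weight the absolute numbers, here is a minimal sketch of the odds-ratio calculation. It assumes the group totals quoted in the footnote (with “just over 2.8 million” rounded to 2.8 million) and leaves moderates out, since no total is given for them; the function names are mine.

```python
# Group sizes from the footnote; "just over 2.8 million" is rounded to 2.8 million.
TOTALS = {"liberal": 2_800_000, "conservative": 2_180_000}

# Raw counts quoted in the post (moderates omitted: no total is given for them).
COUNTS = {
    "blogging": {"liberal": 2_520, "conservative": 1_100},
    "My Bloody Valentine": {"liberal": 5_180, "conservative": 340},
    "bluegrass": {"liberal": 7_300, "conservative": 7_580},
}


def rate(interest, group):
    """Share of a group's users who list the interest."""
    return COUNTS[interest][group] / TOTALS[group]


def odds_ratio(interest):
    """Odds of listing the interest for liberals relative to conservatives."""
    def odds(group):
        p = rate(interest, group)
        return p / (1 - p)
    return odds("liberal") / odds("conservative")


for interest in COUNTS:
    print(f"{interest}: liberal/conservative odds ratio = {odds_ratio(interest):.2f}")
```

On these assumptions the near-identical raw bluegrass counts turn into an odds ratio below 1: because there are fewer conservatives on Facebook, conservatives are proportionally more likely to list it.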
Ow, ow, ow. Comment 2 is also pretty funny. Actually, the whole thread is hilarious.
“Megan”:http://fromthearchives.blogspot.com/2007/08/sampling-bias.html of _From the archives_ won’t be surprised that “this _NYT_ article”:http://www.nytimes.com/2007/08/12/weekinreview/12kolata.html?_r=1&oref=slogin, claiming that:
One survey, recently reported by the federal government, concluded that men had a median of seven female sex partners. Women had a median of four male sex partners. Another study, by British researchers, stated that men had 12.7 heterosexual partners in their lifetimes and women had 6.5. But there is just one problem, mathematicians say. It is logically impossible for heterosexual men to have more partners on average than heterosexual women. Those survey results cannot be correct.
is already “getting”:http://ezraklein.typepad.com/blog/2007/08/i-caught-a-fish.html “play”:http://www.chrishayes.org/blog/2007/aug/13/im-back/ in the blogosphere. The only thing is that it _isn’t_ logically impossible, at least as the author presents it. Ask “Andrew Gelman”:http://www.stat.columbia.edu/~cook/movabletype/archives/2007/08/medians.html
Jeff’s response: MEDIANS??!! Indeed, there’s no reason the two distributions should have the same median. I gotta say, it’s disappointing that the reporter talked to mathematicians rather than statisticians. (Next time, I’d recommend asking David Dunson for a quote on this sort of thing.) I’m also surprised that they considered that respondents might be lying but not that they might be using different definitions of sex partner. Finally, it’s amusing that the Brits report more sex partners than Americans, contrary to stereotypes.
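A toy example makes the point concrete. The sketch below uses an invented population of five men and five women and a hypothetical list of partnerships; with equal group sizes the two means are forced to be equal, because every partnership counts once on each side, but nothing forces the medians to agree.

```python
from collections import Counter
from statistics import mean, median

MEN = ["m1", "m2", "m3", "m4", "m5"]
WOMEN = ["w1", "w2", "w3", "w4", "w5"]

# Hypothetical partnerships (man, woman); every pairing counts once for each side.
PARTNERSHIPS = [
    ("m1", "w1"), ("m2", "w1"), ("m3", "w1"), ("m4", "w1"), ("m5", "w1"),
    ("m1", "w2"), ("m2", "w2"), ("m3", "w3"),
]


def partner_counts(people, position):
    """Number of partners per person, including people with none."""
    tally = Counter(pair[position] for pair in PARTNERSHIPS)
    return [tally[person] for person in people]


men_counts = partner_counts(MEN, 0)      # [2, 2, 2, 1, 1]
women_counts = partner_counts(WOMEN, 1)  # [5, 2, 1, 0, 0]

print("men:   mean", mean(men_counts), "median", median(men_counts))
print("women: mean", mean(women_counts), "median", median(women_counts))
```

Both groups average 1.6 partners, but the men’s median is 2 and the women’s is 1, because the women’s distribution is more skewed.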
Via “John Gruber”:http://www.dashes.com/anil/2007/07/pixels-are-the-new-pies.html I see “Anil Dash”:http://www.dashes.com/anil/2007/07/pixels-are-the-new-pies.html wondering about the trend toward “square blocks of color … being used to represent percentage-based statistics instead of the traditional pie chart.” Like this.
I’d seen the one on the left — from a “New York Times story”:http://www.nytimes.com/2007/07/29/magazine/29wwln-lede-t.html?ref=magazine about beliefs in the afterlife, and wondered about it, too. The white block in the middle of the Times graphic presumably represents “Don’t Knows” but it is not labeled. This is especially odd in the context of belief in the afterlife, as agnosticism is a recognized point of view and so not equivalent to “Don’t know” answers on other survey questions.
The main problem with this style of presentation is that it uses two dimensions to display unidimensional data. As the graphic on the right, especially, makes clear, the layout of the subcomponents of the graph is arbitrary. Maybe laying out responses on a line is impractical in a newspaper column. This is one reason pie charts are popular, but their problems are well known. (Word to the wise: don’t use them.)
“Mosaic plots”:http://rosuda.org/~unwin/Japan2003/UnwinISMTokyoNov03mosaic.pdf superficially resemble the ones pictured here, and they are sometimes used to very good effect. But the whole point of a mosaic plot is that it visually represents several categorical variables at once. It’s a picture of an n x n table, in other words, where the sizes of the blocks reflect the cell values in the table. “Here’s an example.”:http://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf Even here you have to be careful interpreting the results. The boxes above, by contrast, take this kind of picture and use it with only one variable, which doesn’t make any sense at all.
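To see why a mosaic plot needs at least two variables, it helps to look at how the tiles are sized. The sketch below shows one common construction, not necessarily the one used in the linked slides: it takes an invented two-way table (the categories and counts are hypothetical) and computes each tile’s width and height on a unit square.

```python
# Hypothetical two-way table of counts: rows are one variable, columns another.
TABLE = {
    "under 40": {"yes": 30, "no": 50, "don't know": 20},
    "40 and over": {"yes": 90, "no": 45, "don't know": 15},
}


def mosaic_tiles(table):
    """Return {(row, column): (width, height)} on a unit square.

    Each row category gets a vertical strip whose width is its share of the
    grand total; the strip is then split into tiles whose heights are the
    column shares within that row. Tile area equals the cell's share of the
    grand total.
    """
    grand_total = sum(sum(cells.values()) for cells in table.values())
    tiles = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        width = row_total / grand_total
        for column, count in cells.items():
            tiles[(row, column)] = (width, count / row_total)
    return tiles


for (row, column), (width, height) in mosaic_tiles(TABLE).items():
    print(f"{row:12s} {column:11s} width={width:.2f} height={height:.2f}")
```

Width carries one variable and height carries the other. With a single variable there is nothing for the second dimension to encode, so the layout of the blocks becomes arbitrary.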
(Initial bad temper warning: I am a little bit cross as I write this, because I think that the distribution of the paper on the Michelle Malkin website was both silly (because the paper has huge flaws that a mass audience can’t possibly be expected to understand) and rude (because at the time when he gave permission for it to be distributed, David was soliciting comments, seemingly in good faith, from the Deltoid community, aimed at improving it before distribution). The Malkin link has meant that this paper has metastasised and I will therefore presumably be dealing with cargo-cult versions of it by people who don’t understand what they’re talking about from now to the end of time. I see that Shannon Love of the Chicago Boyz website is claiming to have been “sweetly vindicated”, FFS. Ah well, the truth has now got its boots on, and big clumpy steel toe-capped boots they are too. C’mon boots, let’s get walking.)
It’s unclear why exactly :), but Michael Froomkin asks the question:
What would be the most unattractive job in the regular economy? I’m not talking about the objectively least-well paid or statistically most dangerous, or most unpopular (car salesman?). I mean, what job would you least like to have. No fair saying subsistence farmer in Darfur either — I mean in the US (or other developed economy).
His response: toll booth attendant.
I was going to pass over this, but I am a shallow person. Fresh from “schooling me”:https://crookedtimber.org/2007/07/15/dept-of-being-savaged-by-a-dead-sheep/ on the treatment of outliers, Megan McArdle has expanded her ambition and now “takes Cosma Shalizi to task”:http://www.janegalt.net/archives/009901.html for his “bizarrely beside the point” “views”:http://cscs.umich.edu/~crshalizi/weblog/495.html on the heritability of IQ, the statistical estimation and interpretation of _g_, and his failure to understand the analytical methods of “the serious IQ guys.” Megan may not be aware that I “taught”:http://www.stat.cmu.edu/~cshalizi/754/ “Cosma”:http://www.santafe.edu/profiles/?pid=236 “what little”:http://www.cscs.umich.edu/~crshalizi/prob-notes/ he “knows”:http://www.cscs.umich.edu/~crshalizi/research/ about statistics. He’s also much nicer than me. So she’ll have no trouble disposing of him.
_Update_: Yeah, on second thoughts I should have just passed over this.
Someone I believe to be Megan McArdle weighs in at the Economist blog on the laughable graphic run by the WSJ the other day. Brad DeLong is not impressed, nor is Mark Thoma (in part because comments are misattributed to him in the post), and nor am I. She singles me out for membership in “a special category of wrong,” I think mostly because my Ph.D is in sociology and not economics.
By now you’ve probably all seen this ridiculous graphic from today’s WSJ, which purports to show that the Laffer curve is somehow related to the data points on the figure. “Brad DeLong”:http://delong.typepad.com/sdj/2007/07/most-dishonest-.html, “Kevin Drum”:http://www.washingtonmonthly.com/archives/individual/2007_07/011682.php, “Matt Yglesias”:http://matthewyglesias.theatlantic.com/archives/2007/07/worst_editorial_ever.php, “Mark Thoma”:http://economistsview.typepad.com/economistsview/2007/07/yet-again-tax-c.html and “Max Sawicky”:http://maxspeak.org/mt/archives/003184.html have all rightly had a good old laugh at it, because it’s spectacularly dishonest and stupid. I just want to make a point about so-called outlying cases, like Norway.
I am working on the Introduction to an edited volume on the nitty-gritty behind-the-scenes work involved in empirical social science research (to be published by The University of Michigan Press in 2008). While each chapter in the book gets into considerable detail about how to approach various types of projects (from sampling online populations to interviewing hard-to-access groups, from collecting biomarkers to compiling cross-national quantitative data sets), I want to address more general issues in the introductory chapter.
One of the topics I would like to discuss concerns larger-level lessons learned after conducting such projects. The motivation behind the entire volume is that unanticipated things happen no matter the quality and detail of preparation, and that even issues that can be anticipated are rarely passed along to researchers new to a type of method. The volume tries to rectify this.
I am curious: what are your biggest lessons learned? If you had to pick one or two (or three or four) things you really wish you had known before you had embarked on a project, what are they? I am happy to hear about any type of issue, from learning more about a collaborator’s qualifications or interests to leaving more time for cleaning data, from type of back-up method to unexpected issues with respondents. If you don’t feel comfortable posting here, please email me off-blog. Thanks!
Really rather shameful. Riyadh Lafta, one of the co-authors of the Johns Hopkins/Lancet studies on excess deaths in Iraq, has been refused a transit visa for his flight to Vancouver to make a presentation on alarming increases in child cancer. He was apparently meant to be passing on some documentation to some other medical researchers who are going to write a paper with him on the subject; the presentation was happening in Vancouver because Dr. Lafta had already been refused a visa to visit the USA.
What on earth can be in this data? Presumably the UK and US authorities have reasoned that Dr Lafta is an ex Ba’ath Party member (as he would have had to have been to hold a position in the Iraqi Health Ministry), and thus the data he is carrying is not really about child cancer at all. Perhaps he is involved in some sort of “Boys from Brazil” type plot to clone an army of super-soldiers from Saddam Hussein’s DNA, and for this reason the UK cannot be exposed to this deadly information for even four hours in the Heathrow transit lounge.
The alternative – that Dr Lafta is being intentionally prevented from travelling in order to hush up his research on post-war deaths (research which even the Foreign Office have now more or less given up on trying to pretend isn’t broadly accurate), or to hush up the news about paediatric cancer for political convenience – is too horrible to contemplate. I’d note that there isn’t an election on in the USA at present, so the denialist crowd can shove that little slur up their backsides this time too.
(thanks to Tim Lambert as always)
In semi-related news, and with apologies to the person who gave me the tip for taking so long to post it, it appears that Professor Michael Spagat, the author of the “main street bias” critique, has a bit of previous form when it comes to making poorly substantiated and highly inflammatory statements about other people’s research. His involvement with the general issue came about because he’d been using some of the IBC data in support of a power law hypothesis[1] about the scaling of violent deaths. This carried on from previous work he’d done on Colombia, where he had also defended his own somewhat tendentious interpretation of the data by slagging off Human Rights Watch. I sense something of a pattern here; I noted in a previous post that although the “main street bias” critique appeared in the Lancet colloquium on the Burnham et al paper, Prof. Spagat himself did not, and I thought at the time it might be because of this habit.
[1] And one of Prof Spagat’s co-authors on the main street bias paper, and a few others in the power law of violence series was Neil Johnson of Oxford University, who was also a co-author of that paper about the Eurovision Song Contest that we had a go at a while ago, and so the circle of minor irritation is complete.
“Jeff Han”:http://cs.nyu.edu/~jhan/ftirtouch/ works on multi-touch interfaces: touch screens that can recognize more than one point of input, and thus combinations of gestures and so on. Here’s a “cool video”:http://www.macrumors.com/2007/02/12/more-multitouch-from-jeff-han/ showing some of the interface methods his company is developing. (Warning: cheesy music.)
You can see some cool possibilities for educational and presentational bells and whistles, such as the taxonomic tree one of the operators is shown navigating. The possibilities for high-dimensional dynamic data visualization are also obvious. We see a scatterplot being manipulated with some scaled data points on it, for instance. Something like “Ggobi”:http://www.ggobi.org/ would be fun to use on a system like this. In the near future, the phrase “touchy-feely” may well apply to the quantitative rather than the qualitative crowd.
LOL, best headline so far this year. The story describes two related but not identical issues. First, that the Home Office statistics function is doing a piss-poor job of managing the ‘data’ it uses to back up its policies, namely the unreliability of data-sets used to report on ASBOs, and other crime, prisons and immigration data. Secondly, the story brings in some more recent HO blunders on tracking crimes committed by Britons abroad, which would seem to have more to do with international data-sharing on criminal records than the HO’s statistical function.
There is a related but unmentioned issue: recent and long overdue moves to make the UK’s Office for National Statistics an autonomous agency that is completely independent of government. Now, as far as I understand it, the ONS does not have responsibility for statistics related to criminal justice, and perhaps it never will. But the current shambolic state of affairs at the HO shows that the only policy numbers worth having are those prepared independently of the advocates of that policy. As we all know, the incentive to cook the books or ignore data that doesn’t support the minister’s/civil servants’ desired policy is just too strong.