Two-point scales

by John Q on August 9, 2006

I’ve been reading Steven Poole’s Unspeak and he observes that having introduced a five-level color coded terror alert, the government has never used the top level (red) or the bottom two levels (blue and green). The obvious reason is that a red alert would require some specific action, while a move to a blue or green level would imply that there was some prospect of the War on Terror actually ending.

I’ve noticed much the same phenomenon with 5-point grading scales for worker performance, such as those used in the Australian Public Service for a while. A top score suggests a requirement for some kind of substantial reward, so these are rare, while a score of 4 or 5 implies a need for counselling and a possibility of dismissal. So just about everyone gets a 2 or a 3, yielding, in effect, a two-point scale.

I imagine someone in psychometrics must have studied this kind of thing in general. Any pointers?

Update James Joyner at Outside the Beltway made the same point a couple of years ago. BTW, I saw a fun movie clip with an earnest PR type talking about the creation of the color code, maybe posted by Eszter. I couldn’t find it on a quick search. Can anyone remember this?

Yet further update One day after I posted this, the Red Alert level has finally been used, but apparently only for commercial flights from Britain to the US, in response to the announcement by British authorities that they have detected a terrorist threat to blow up planes.

{ 2 trackbacks }

Outside The Beltway | OTB: 08.09.06 at 1:11 pm
Kieran Healy’s Weblog » Blog Archive » Queueing for Terror: 08.11.06 at 5:34 am

{ 43 comments }

1 Matthew 08.09.06 at 2:24 am: Can’t help with the links (though I remember reading that the existence of 3 sizes of soft drink is to make people choose the ‘regular’ in the middle, which in fact is a large size).

I thought I’d share the news that my Dad purchased a cheap fan for his office, which proudly declares itself a ‘Three-Speed Fan’, where the speeds are 2, 1 and Off.
2 nick s 08.09.06 at 2:27 am: I can’t think of any citations, but I wouldn’t be surprised if there’s been analysis of, say, the ‘stars’ awarded in Rolling Stone’s album reviews. Or any kind of subjective review using either five stars, a 1-10 scale, or a percentage value.
3 engels 08.09.06 at 3:19 am: Their One-Speed Fans must really suck…
4 Simon 08.09.06 at 3:25 am: If he’s referring to the UK, the top level has been used, for about a month after 7/7.
5 bad Jim 08.09.06 at 4:04 am: I’m damaged.

Around 1971 I was paid ten bucks to participate in a psychology experiment at UC Berkeley, where I was a student. It was basically the usual I shock you, you shock me sort of set-up. The shocks hurt.

We were each judging the other’s performance, and my ploy was to turn down the gain, to reduce both his and my penalty. (I didn’t know then that I was already Rapoport’s acolyte). In other words, I tried shocking the other guy less than he was shocking me. (There was no other guy, of course. Only my neurons were harmed in the making of this experiment.)

Apart from that, the point might be that, whereas Mom & Dad might have been the best that ever was, what room is left over for Beethoven or Picasso?

So: on a scale of 1 to 5, the people you know are most likely middling. If you’re shocking someone else, try to keep it under 3, at least.
6 Steven Poole 08.09.06 at 4:29 am: Simon, the passage John Q is referring to is discussing the US threat-alert system. The British government has only just made its own system public. It will be interesting to see how it is used.
7 Timothy Scriven 08.09.06 at 5:14 am: That’s why we should use ten point scales.
8 Reinder 08.09.06 at 5:29 am: Star ratings in music magazines use the entire spectrum, although the extremes are naturally rarer than the middling ratings. Music reviewers also like to occasionally declare that something they’ve been listening to*) sucks to high heaven, so they have an incentive to give 1- or zero-star reviews. But then, they have no disincentives either way; they can’t fire an album that sucks, and don’t have to do followup interviews with someone who has just made a brilliant album.

*) probably a very charitable assumption on my part, that.
9 Stuart 08.09.06 at 5:46 am: The question that always springs to mind when discussion of the US alert system comes up is what rating would likely have been in use on the morning of 11/9/2001 (or 9/11/2001 for the US people). I presume it would likely be a 3 (or even a 2 maybe given the relatively lower priority of ‘terror’ at that point in time over there). There seemed to be no indication from the government or media that there was a particularly high level of threat considered at the time.

If the warning level would have been at 2, 3 or at most 4 on that day, it makes you wonder what possible use the system is – as 3 and 4 are used so commonly now it would make little or no change on any of the public’s actions, and the only time 5 is likely to be used is just after an attack has either succeeded or publicly been broken up. So 1-4 is more or less meaningless in effect, and 5 is a reminder that an attack just happened.
10 Dave F 08.09.06 at 6:24 am: Maybe it’s because the threat hasn’t been reduced. This seems to be yet another paranoid conspiracy theory. There’s a lot of it about, both left and right.
11 Steven Poole 08.09.06 at 6:31 am: It’s not a “paranoid conspiracy theory” if you are aware that the first Secretary of Homeland Security, Tom Ridge, publicly complained about political pressure on his department to raise the threat level when he didn’t think the intelligence warranted it.

Obviously it’s impossible ever to set the level at 1 or 2, in case an attack happens out of the blue and then you look foolish and worse.
12 harry b 08.09.06 at 6:40 am: At UW Madison we grade students on an A, AB, B, BC, C and D scale. My understanding is that A is widely used, and D rarely. I don’t understand why there is no CD option, by the way.
13 Alex Gregory 08.09.06 at 7:00 am: Another example – although one of which interpretations can vary wildly – is the degree classification system in the UK, in which it seems about 90% of people doing an arts subject get a 2:1. (and I also know that my university is hoping to change the grading system for the sciences so that it becomes similar)
14 Matt McIrvin 08.09.06 at 7:05 am: The PG-13 rating was added to the MPAA’s movie content rating system in part because there was a perception that the system had effectively shrunk to two useful ratings, PG and R: after a period in the 1970s when G and X ratings were actually given to well-respected movies for adults, usage had shifted so that G was only given to obvious kiddie movies and X to obvious porn.

Since the introduction of PG-13, PG has been gradually turning into a second kiddie rating, though it seems as if the system hasn’t yet completely degenerated to two ratings again. I suppose having two grades of kids’ movies does impart some information.
15 Steve 08.09.06 at 7:28 am: Deleted. As previously advised, anything else from you will get the same treatment. JQ
16 JRoth 08.09.06 at 7:33 am: Two observations (and sorry none of this is of the technical nature you asked for):

On grading, I still can’t get over my wife’s contention that, at grad school, everyone gets an A except for a few Bs, and C is effectively failing (others have backed up this assertion; sorry if you all think it’s simply not true, I don’t know). She argues that only exceptional students – A students – belong in grad school, while I argue that grades are always relative – even in the Manhattan Project, there was a bell curve of talent, however far over on the scale of general human achievement.

As for stars, I can speak as a professional (free-lancer, anyway): as a restaurant reviewer, 90% fall between 2.5 and 3.5 stars on a 4 star scale. A lot of this is selection bias – why would we subject ourselves to a crummy meal? – and a lot of it is the consequences of giving 1 star (0 is not an option). 1 star seems to suggest near-inedibility, and certainly results in angry calls to the editor. Meanwhile, 4 stars are reserved for essentially perfect meals (not necessarily haute cuisine – a cafe brunch recently got 4 stars), perhaps one every 6 months. So everyone else gets shunted into this absurdly narrow range. Our predecessor actually gave everyone 3 stars as a protest against the star system.
17 tps12 08.09.06 at 7:51 am: I Somewhat Agree with this post.
18 johm 08.09.06 at 8:09 am: Bell curve
19 joel turnipseed 08.09.06 at 8:18 am: Jordan Ellenberg’s fascinating piece on grade inflation in Slate a while back is probably worth a read in this context: http://www.slate.com/default.aspx?id=2071759.
20 bi 08.09.06 at 8:55 am: steve: it’s not about future events, but what some people want to portray to US citizens as its prognosis of future events. That much should be obvious.

(But hey, this is Bush we’re talking about! Every word from the Bush administration must be interpreted in the most charitable light possible, while every word from opponents of Bush must be interpreted in the most uncharitable light possible. To do otherwise will be tantamount to high treason.)
21 Antti Nannimus 08.09.06 at 8:56 am: Hi,

The magical number is seven, plus or minus two.

http://www.well.com/~smalin/miller.html

Have a nice day,
Antti
22 Richard Bellamy 08.09.06 at 8:56 am: I remember that a friend had a “card trick” in which he held up a card reading:

“PICK A NUMBER

1 2 3 4”

I picked “3”, and he turned the card over, and it said, “I knew you would pick 3!”

I asked what the trick was, and it turned out there was no trick. I watched him show the card to several other people, and they all picked “3”, too.

I’m not sure how, if at all, that relates.
23 apthorp 08.09.06 at 8:59 am: “where all of the children are above average”, and the truly exceptional move out. Society needs to be “good”, while recognizing the existence of “bad” so it can be defended against and the “great” so it can be isolated and not make the good feel bad.
24 eweininger 08.09.06 at 9:07 am: So just about everyone gets a 2 or a 3, yielding, in effect, a two-point scale.

This is sort of the opposite of what might be called the Spinal Tap Principle of metric-ology, whereby the introduction of extreme values into an (arbitrary) conventional scale makes it more attractive: “but it goes to eleven, man!”
25 Ginger Yellow 08.09.06 at 9:35 am: Funnily enough, somebody has just posted an analysis of the review scores given by two gaming websites: http://www.metafuture.com/
26 Steven Poole 08.09.06 at 9:47 am: I’ve worked as a music and theatre critic for newspapers who use a five-star scale. I tried to reserve my 5s for really mindblowingly outstanding things (ie one or two a year), but other critics were giving 5s out regularly to stuff they clearly (by the text of their reviews) thought was “merely” very good, so I looked ungenerous giving the same things 4, which in others’ usage apparently meant just “good”. And what did 3 mean? Average or mediocre or quite good or actually not very good? Any of the above, depending on the critic. Reviews given one star, meanwhile, were the most likely to be spiked, unless they were of highly anticipated or puffed things. The whole system of awarding points out of 5 to artistic endeavours is, of course, a philistine farce.
27 derek 08.09.06 at 12:57 pm: A counterexample to the two-point population huddled in the middle of a five point field, is Daily Kos’s now defunct rating system: for those who did not have special troll-bashing privileges, the ratings ranged from 1 (“I’d give you 0 if I could”) to 4 (“Excellent”).

4 became the expected rating, such that commenters complained bitterly about getting a mere 3 (“Good”), and demanded an explanation why they didn’t deserve a 4. 2 was similarly scorned, because if you disliked a comment enough to rate it down, you disliked it enough to give it the lowest rating within your powers.

Markos eventually abandoned the system and now has a literal two-point scale of “Recommend” and “Troll”.
28 Jake 08.09.06 at 1:07 pm: while I argue that grades are always relative – even in the Manhattan Project, there was a bell curve of talent, however far over on the scale of general human achievement.

This is probably not the case. When you have a selected population like the physicists on the Manhattan Project, or the students in a top-ranked graduate program, you’re left with only the left tail of the curve. If you rate your people on a 1-5 scale, you should expect the most 1s, fewer 2s, even fewer 3s, etc.
29 Ryan Miller 08.09.06 at 2:38 pm: Sorry that I don’t have anything to add to the general stats bleg, except perhaps to note that this is one reason social scientists should stop treating ordinals as ratio variables for analysis (especially common in the education literature, at least).

But one additional data point is that in the world of competitive college debate (and to some extent in HS), a nominally 0-30 scale has been reduced to a usable range of 27.5-29, with many judges begging for the freedom to award quarter points (they are currently restricted to half-point steps), so that, presumably, the scale can be compressed yet further. I guess this means that college debate judges are stuck on the short end of 7+-2.
30 Nash 08.09.06 at 3:02 pm: richard bellamy, are you sure that’s what your friend actually said? I’m guessing you have added one thing that they didn’t say, because there is a well-known “trick” in which the tester says
“pick a number from 1 to 4”

“Three” is the response of something over 85% of the subjects tested.

The reason? The power of verbal suggestion. In saying “one to four,” we hear “one two four” and most subjects dive for the “safety” of the only number not named in the test: three

Try it on a friend sometime, works like magic. Don’t even need a card to do it, but the card is a nice touch.

Speaking of card tricks, has anyone ever had you call the “wizard”? Far and away the best card trick and the one that most baffles and amazes that I’ve ever performed.
31 John Emerson 08.09.06 at 4:13 pm: Haven’t read the comments, but my belief is that there are never more than three priorities, and the third priority is there just to make the second priority not feel like such a loser.

Prioritizing is the choice between #1 and #2. When a new time for making decisions arrives, a new set of priorities will be put together.
32 The Continental Op 08.09.06 at 5:37 pm: has anyone ever had you call the “wizard”?

That used to be my favorite too. I learned it from my uncle when I was a kid. Haven’t thought of it in years.
33 anno-nymous 08.09.06 at 6:46 pm: Fun Movie Clip: You’re thinking of http://www.zefrank.com/redalert/ . I’m pretty sure it’s been featured on CT before.

Ze Frank, by the way, is the host of (in my opinion) the only Video Blog worth watching regularly, http://www.zefrank.com/theshow .
34 Simon 08.10.06 at 2:19 am: The British government has only just made its own system public. It will be interesting to see how it is used.

I worked for the police for about six months last year, during which I had access to the threat classification (it was no great secret internally, being posted on the inner door as we entered the building). The threat has mostly been severe general (second highest) since 9/11, but was infamously lowered to substantial about a month before 7/7 then raised to the maximum afterwards, where it remained for a further month. They also had separate classifications for particular groups, eg dissident and indeed non-dissident Irish Republicans, for both of whom the threat was raised to substantial for a brief period after the IRA bank robbery.

Thanks to this morning’s news, the threat is back to critical.
35 Ben 08.10.06 at 4:21 am: I’d give this post 4/5.

It’s something I’ve definitely noticed in music reviews. I write for both The Oxford Student (newspaper) and http://www.uk-fusion.com and both officially use a 5* system. The latter avoid problems by using halves and even quarters, while the former regularly give 1* or zero to anything they take a dislike to.

I have also noticed the same tendency with student grades, though there it’s obviously less justifiable to give zero to anyone you take a dislike to! It’s particularly hard when grading Americans and not really knowing the scale; or, as I had this summer, ‘poor, average, good, very good, superior’. When average = 2/5 you wonder whether it’s really the expected mode, or a euphemism for mediocre…
36 James Joyner 08.10.06 at 6:51 am: I note the sad irony that we’ve been partially overtaken by events on this one as well. The Brits have actually gone to “critical,” although it’s not clear whether it’s system-wide. As you note, we’re at red for transatlantic flights and orange generally.
37 Richard Bellamy 08.10.06 at 8:56 am: richard bellamy, are you sure that’s what your friend actually said? I’m guessing you have added one thing that they didn’t say, because there is a well-known “trick” in which the tester says
“pick a number from 1 to 4”

Possibly, and could be a related mental barrier to the thought of ranking from “one to five” (with five being the highest) to avoid the numbers one, two, and five.
38 nick s 08.10.06 at 9:23 am: The iTunes five-star ratings system seems flawed for similar reasons, because (unlike reviewers) people who have music on their computers are unlikely to keep stuff around that’s at the bottom end of the scale.
39 nick s 08.10.06 at 9:26 am: Oops. Submitted too early. I think that iTunes would be better served with a two-and-a-half point scale that uses relative comparisons between the current track and the preceding one: in essence ‘better’, ‘worse’ and ‘no opinion’.
40 Nash 08.10.06 at 9:30 am: has anyone ever had you call the “wizard”?

That used to be my favorite too. I learned it from my uncle when I was a kid. Haven’t thought of it in years.

All of my family know to expect a call to the wizard at any time, day or night. Even wakened from his or her slumbers at 3 AM, the wizard has always come through.

I’m glad someone else knows the wizard.
41 Uncle Kvetch 08.10.06 at 11:26 am: The iTunes five-star ratings system seems flawed for similar reasons, because (unlike reviewers) people who have music on their computers are unlikely to keep stuff around that’s at the bottom end of the scale.

Actually, I find it quite handy. Sometimes I download lots of stuff at once, including “song of the day” podcasts by artists I’m not familiar with, and then listen to it on the iPod. If something new pops up that I don’t like and don’t want to keep, I immediately give it a single star. Then every once in a while on iTunes I search out all the one-star tracks and delete them.
42 Craig Ewert 08.14.06 at 3:38 pm: Antti wrote:

Hi,

The magical number is seven, plus or minus two.

http://www.well.com/~smalin/miller.html

Have a nice day,
Antti
This result is irrelevant. The seven +/- two rule is for remembering bits of data, not for making rankings of things.

As the Spinal Tap example hints, and as the old “on a scale of one to ten, she’s a twelve” joke also indicates, you should ignore the arbitrary boundaries often placed on these scales. Are movies rated from 1 to 4 stars (actually seven ranks, from 1 to 4 by 1/2 steps)? Then occasionally hand out a 6, or a -3. You know in your heart that some are just that bad, or that good.

Likewise with every other ranking/judging metric you have ever encountered.
43 Craig Ewert 08.14.06 at 3:39 pm: Dammit, I can’t even operate a comment box.

I give myself a -2 in blog commenting.

Comments on this entry are closed.
