My friend Jo Wolff has “a column over at the Guardian’s Comment is Free”:http://www.guardian.co.uk/commentisfree/2009/aug/03/students-university-dumbing-down section taking issue with Phil Willis MP, who chairs the Parliamentary Select Committee on Innovation, Universities, Science and Skills, which has just issued its report on “Students and Universities” (downloadable “here”:http://www.publications.parliament.uk/pa/cm/cmdius.htm ). Jo is upset with Willis and his committee for two reasons: first, because they suggest that the marked increase in the proportions of First Class and 2.1 degrees is the result of grade inflation; second, because they were sceptical about whether degrees of the same class from different institutions are necessarily of equal value.
Jo’s response to the first “dumbing down” point is to cite the benign influence of the government quango the Quality Assurance Agency on teaching standards in universities. I can’t actually work out if Jo’s QAA argument is sincere or an attempt at heavy irony. In any case, I wonder if he’s actually read what the report says about the QAA. It argues that the QAA has been more focused on processes than on what happens in the classroom and that it has been quite easy for universities to tick the quality-assurance boxes whilst actual teaching quality goes unexamined. That’s one part of the report that is surely right. At the time of “subject review” I think only about 1/19th of a department’s score was a result of classroom observation and the rest was attributable to a lengthy paper-trail which we all laboured for months to assemble. But whatever, Jo knows as well as I do that the incentive structure that government has put in place for higher education in the UK has not favoured quality of teaching but rather research. Teaching has been seriously underfunded and there has been little to no payoff in terms of career advancement open to good teachers. In the circumstances it really is hard to believe that the across-the-board improvement in student grades is the result of better teaching. (No doubt any increase in performance in the UCL philosophy department is attributable to better teaching, but Jo needs an argument for the sector as a whole!)
Jo’s response to the second point — cross-institution comparability of standards — is to say “so what?”. He may be right. But what infuriated the Select Committee was that they asked the question of university leaders and failed to get a clear answer. Rather they were faced with obfuscation and evasion. Perhaps if Jo had been interviewed he would have given the committee a clear, reasoned and satisfactory answer as to why it doesn’t really matter: the Vice Chancellors of Oxford and Oxford Brookes universities weren’t willing or able to do so.
{ 75 comments }
Jo Wolff 08.03.09 at 5:25 pm
Thanks, Chris, for your post on this. I should clarify my position, which was based only on my reading of the Sunday newspapers and not the report itself.
First, I was annoyed primarily with the supposition that the fact that a higher proportion of students are getting good degrees is sufficient evidence of grade inflation, in the sense of us as examiners lowering our standards. In my experience students are preparing for exams in a more focused way, and by provision of more teaching materials (partly prompted by the QAA) we are helping them to do that. I make no claims about students getting a better education than they used to, or the QAA being a good thing, or even improving teaching (as distinct from increasing the volume of exam-focused study-support materials).
Second, and this point was probably badly expressed in the original post, it would be better if there was inter-institutional comparability of standards. But it doesn’t matter very much that there isn’t, provided everyone knows this, especially as the cost of making it happen would be to strip out most of what is valuable in our educational system.
Substance McGravitas 08.03.09 at 6:04 pm
It may serve to quote from the report regarding the second point:
According to the idea of the European Credit Transfer and Accumulation System variance shouldn’t exist, so what are you going to say? Our first in French Literature is worth 120 credits as opposed to the 180 we say it is? I suppose the face-saver is to argue that your degrees are in reality worth more credit…
Is variance That Which Must Not Be Spoken Of and the Innovation, Universities, Science and Skills Committee unaware of the kabuki required on the European stage?
Chris Bertram 08.03.09 at 6:42 pm
#2 Substance, I _think_ your comment is based on a misapprehension.
Credits are one thing, grades are another. It is quite consistent to say that a degree at institution A and a degree at B carry the same number of transferable credits, whilst acknowledging that a first at A is more of an achievement than a first at B.
FWIW, the Wikipedia entry on the ECTS grading scale
http://en.wikipedia.org/wiki/ECTS_grading_scale
accepts that a grade at one place may not mean the same as a grade at another.
Substance McGravitas 08.03.09 at 6:50 pm
Sure. But if I get a first at institution X with the top grade across the board, and you get a first at institution Y with top grades across the board, the rational way to say “the quality is different” between the two institutions is to pretend that the amount of credit awarded should be different.
alex 08.03.09 at 6:56 pm
@4: and the retort from any university department faced with being ‘Institution Y’ would be ‘over my dead body will you devalue what we do’.
Honestly, the hypocrisy and obfuscation around this issue is astounding. How long ago was it that students at Bristol were up in arms about final-year contact-hours being slashed? How many 1994-group seminars are led by PhD students? How many University teachers are too focussed on their own research to really give a stuff about students?
And the killer – how many students still choose their university based on a ‘reputation’ that is at least half pure snobbery?
Actually, I take my second comment back: given the nature of UK HE, hypocrisy and obfuscation are exactly what one would expect on an issue like this.
Substance McGravitas 08.03.09 at 6:57 pm
That is “a first at A is more of an achievement than a first at B” carries with it an acknowledgment that the credit in course X may not, in fact, be transferable to a program worth its salt.
dsquared 08.03.09 at 7:00 pm
In my experience students are preparing for exams in a more focused way, and by provision of more teaching materials (partly prompted by the QAA) we are helping them to do that
I always suspect that the Phil Willises of this world are also likely to be keen on saying that David Beckham isn’t really all that good a footballer, compared to the old-fashioned guys who had to boot and head a waterlogged laced caseball.
Kathleen Lowrey 08.03.09 at 7:00 pm
Couldn’t this be something like the Flynn effect, where as an entire population becomes habituated to do X (take IQ tests, aspire to university, be good students), overall their performance at X gets better?
I would agree with the cranky traditionalists that upward-trending grades are not necessarily an indicator of “asymptotic approaching of That Indefinable but Absolute Entity Known as Excellence”.
But I would disagree that it necessarily means “because evaluators have lost all sight of the Absolute Entity, become lazy and indulgent and politically correct and mommy-stateish and can we not kick them to death already?”
I think it might just mean that, as a “solid university education” has become more widely agreed upon as a good, a tacit working consensus exists as to what looks like a good student performance in that educational process, and young people have become adept at conforming to that tacit working consensus.
harry b 08.03.09 at 7:01 pm
Well, that’s one rational way to say it…. I’m inclined to think that Jo is over-optimistic about the availability of knowledge of the difference in value between degrees from different universities. In particular, while I suspect that most school teachers and students from lower-income homes are aware there is a difference, I doubt they have much good information about what the difference is, and suspect, too, that the beliefs teachers have about where each university is in the status order are strongly influenced by the beliefs they had when they went to university.
Chris Bertram 08.03.09 at 7:20 pm
By the way, thanks to Jo at #1 for responding.
It occurs to me that the following in Jo’s original piece is quite telling:
bq. Well, Evans is entitled to report on her own experience, but for mine the only time I ever heard an argument that we should give more firsts because another university did was 20 years ago, and that was shot down as corrupt. I don’t remember ever hearing league tables mentioned at an exam board.
Indeed. But Jo teaches (as I do) at a Russell Group university. Academics at such institutions (or at Oxbridge) are in a better position than those at post-94 institutions (say) to resist management pressure _on matters like this_ (and they will be backed up by external examiners recruited from similar universities to their own). So it is probably the case that standards haven’t shifted all that much in the elite institutions since 1997. (Which isn’t to deny that there may have been some changes, and some pressures.) At “new universities”, by contrast, managerial pressures to award more firsts etc. can be stronger and more direct. If something like this is right (and I’m only speculating) there’s a close link between the grade inflation across the sector generally and the divergence of standards among institutions.
Philip 08.03.09 at 8:17 pm
I did my BA at Northumbria, a new university. Sometime after I left the economics department closed and quite a few of the faculty went to Durham, an older and more prestigious university. A few years later I did an MA in Applied Policy Research at Newcastle, a Russell Group university. At Newcastle there was obviously more emphasis on research than at Northumbria but I found the quality of teaching to be about as mixed.
It seems to me there is a perceived difference in value between degrees from different institutions, and it is greater than the actual difference. People tend not to know the quality of departments within universities unless it’s specifically in their field.
Chris Williams 08.03.09 at 8:42 pm
Philip at #11 has it in one. Quality is department-specific, not institution-specific. Both VCs were thus right to resist any kind of blanket vector addition to produce a ‘better’ score: I would hope that they _knew_ that they didn’t know enough to answer the question.
I would also imagine that this wasn’t their only or indeed their main motivation. “We are second-rate, admits ex-Poly VC” is not the kind of headline anyone wants to bring down upon themselves. But that’s what they’d have got. Chalk it up to the general dumbing down of British governmental discourse, and to the permanently operating Rice-Davies factor.
I am remarkably sanguine about whether or not my HEI* has dumbed down. It hasn’t. But that’s because, given that we let anyone in, we have always been used to applying the kind of standards that tend to fail a significant proportion of our entrants. Other HEIs, perhaps not so much.
*It’s the best university on the planet. You know – the one in Milton Keynes.
John Quiggin 08.03.09 at 9:27 pm
A side point: because university reputations change with glacial speed (even slower wrt teaching than wrt research, and the latter takes decades), the idea that a competitive market can produce incentives to higher quality (an idea v popular in Oz, and implicit in lots of discussion of the issue I see elsewhere) is nonsense. Competition leads to a focus on changing things that are apparent to buyers, such as the amount of work students have to do.
Anonymouse 08.03.09 at 9:57 pm
Well, as a UK academic in a former poly (and someone who has researched at a Russell Group institution) I do think standards have dropped, concomitant with students viewing a 2:1 as just about adequate and 1st as a success.
This is anecdotal of course. Just as anecdotal as the discussion I had with a colleague who teaches at another Russell group institution who has seen increasing student pressure to justify marks below 2:1 ‘especially when the University would like to push for variable fees’. Of course, Jo Wolff said ‘in my experience’ so anecdotal evidence is what we are working with here.
Also in my institution in 07/08 the exam regulations were adjusted so all students University wide had greater automatic rights to re-sits (in effect doubling the chances they had) as part of our ‘retention project’. Make of that what you will.
Tom Hurka 08.03.09 at 10:24 pm
I shouldn’t rely on my balky memory, but wasn’t one of Larry Summers’s complaints when he arrived at Harvard that the university was giving out too many A’s, and wasn’t one justification people gave for the practice that Harvard was so much better than other universities that a Harvard B was really equivalent to an A elsewhere and should therefore be called an A?
If that’s at all right, the dynamic can be the opposite of what Chris suggests at #10, i.e. it can be the elite universities where grade inflation happens. FWIW, some of my Oxford friends say the distinction between 2.1 and 2.2 has largely disappeared there. A 2.1 used to mean something just below a first; now, they say, it’s just a generic second and much more common.
As I say, just balky memory and anecdotes — but I’m not so sure the elite universities are OK, Jack, with only the newbies doing grading tricks.
Alex 08.03.09 at 11:15 pm
There is, of course, a very good reason why university degree classifications might exhibit a skewed distribution toward the high side; attendance is voluntary and entrance to finals is pre-filtered.
A lot of potential finals failures fail their second year. And in general, rather than failing finals, people drop out, transfer, or resit.
Further, it’s one of those systems where the grading scheme has a lot of influence. There can be very few marks between a 2:2 and a first, if you have high exit velocity.
armando 08.03.09 at 11:41 pm
As a UK Mathematician, I think I can fairly report that there is an almost universal consensus amongst mathematicians that standards have indeed fallen. I would not be confident at all about extending that assertion beyond mathematics itself, but at least there I think it is well accepted by academics.
Part of the problem in this kind of debate is working out how to distinguish between various claims – dumbing down versus improved learning, say. I really don’t know how to do that in any consistent fashion, although it is obviously a politically loaded area where a person’s beliefs are likely to be an excellent predictor for their conclusions, it seems to me.
Salient 08.04.09 at 12:43 am
As a UK Mathematician, I think I can fairly report that there is an almost universal consensus amongst mathematicians that standards have indeed fallen.
Thing about mathematics is, we’re extensively teaching so many out-of-department majors. So there’s inter-departmental pressure to pass, to relax rigorous standards that are allegedly irrelevant to engineers, et cetera. There’s a constant fight over the calculus sequence each semester.
I wonder: do other departments receive this kind of pressure? I’m trying to think of another subject of study which is explicitly relied upon by several other departments, who require their students to take several core classes in this subject of study.
John Quiggin 08.04.09 at 4:30 am
Economics has exactly the same problem wrt business/accounting, though of course we’re on the consumer end wrt mathematics.
At my uni, and lots of others I suspect, there’s a problem in that the econ department wants pure maths, ideally with some real analysis and topology, but the maths department is geared towards engineering/physics.
Vance Maverick 08.04.09 at 4:42 am
How about chemistry (or biology) with respect to medicine?
dsquared 08.04.09 at 5:00 am
Also in my institution in 07/08 the exam regulations were adjusted so all students University wide had greater automatic rights to re-sits
This is not a reduction in standards though, is it? It’s a measure which makes it less difficult for the students to achieve an unchanged standard.
alex 08.04.09 at 7:31 am
OTOH, at my institution, post-92 and proud of it, we are raising admissions criteria, and tightening regulations to reduce the number of ‘second chances’ for people who can’t be bothered to take their first year seriously. So throw that in the pot of anecdata…
Phil 08.04.09 at 7:58 am
Anon: students viewing a 2:1 as just about adequate and 1st as a success
Tom H: A 2.1 used to mean something just below a first
Neither of these matches my experience. When I was reading English at Cambridge 30 years ago, 2.i meant “good” and was essentially the standard you were aiming for, 2.ii meant “not so good” and First meant “really good”; the group at my college divided something like 1/9/1/1 (where the last 1 was this guy who never did any work at all and got a Pass). Now I’m teaching Criminology at a Russell Group institution, and the spread I see in final results is pretty similar. If our students (only) viewed a First as a success, there would be an awful lot of disappointed students.
I do think marks are drifting up generally, but that’s not to say that marking is getting more lenient. It’s as Jo said – “students are preparing for exams in a more focused way, and by provision of more teaching materials (partly prompted by the QAA) we are helping them to do that”
Chris Williams 08.04.09 at 8:36 am
About 20 years ago, I estimated (accurately, as it happened) that I could scrape a 2.i in Modern History at Oxford on about 10 hours work a week, whereas to have a chance of a First, I’d have to put in at least 20, and to make my chance of a First bullet-proof against the vagaries of inter-collegial politics, historiographical horn-locks, and sundry other contingencies, I’d have to put in at least 40. In that particular faculty, with that particular syllabus, at that particular time, there was a big gap between a First and any kind of Second.
alex 08.04.09 at 9:15 am
“…bullet-proof against the vagaries of inter-collegial politics, historiographical horn-locks, and sundry other contingencies,”
And there you have a very good example of how subjective – not to say flat-out political – the awarding of degrees at our ‘best’ institutions can be, and probably still is. I still get a chill when I think of the documentary a few years ago where they tried to get Oxbridge tutors to mark anonymously, and the casual way in which the tutors decided, against the ‘rules’ they had agreed to, that they just would look up the candidates’ names before they confirmed their grades, in case they were being ‘unfair’………
peter 08.04.09 at 9:18 am
Jo Wolff says:
“for mine the only time I ever heard an argument that we should give more firsts because another university did was 20 years ago, and that was shot down as corrupt. I don’t remember ever hearing league tables mentioned at an exam board.”
No doubt UCL is a Russell Group university in a class of its own. As someone lecturing at another Russell Group university, my experience is precisely the opposite of this. We have had scores of departmental discussions over the last decade about the proportions of candidates awarded different degree classes, discussions which arose because of centrally-mandated guidelines to departments to award more firsts and upper seconds, guidelines which were specifically designed, and universally known internally to have been specifically designed, to address the university’s position in the various league tables.
The standard internal argument for awarding lots of firsts and upper seconds has been as follows: We are a Russell Group University and therefore attract better-than-average applicants for our undergraduate programs. Therefore, it is to be expected that we should award higher-than-average proportions of firsts and upper seconds.
Such arguments lead in only one direction – to the degree-classification arms-race that British Universities are now engaged in!
Chris Williams 08.04.09 at 10:40 am
NB – at the best university on the planet, we do mark completely anonymously, even in History. So there. Partly this is a matter of principle, but mainly it’s because there’s only a certain degree of personal bias you can add if there are 700 people on the course.
Philip 08.04.09 at 10:44 am
Peter, wow, that is an argument that is actually used. Russell Group universities attract lots of students; they then want to select the best ones they can. This is mainly done by looking at A-level results, and they tend to select students from richer and more privileged backgrounds (don’t forget fees and future debt restrict the choices of poorer students as well). Then because they are obviously ‘better’ students they should get good grades, er… so this is what we will give them.
Anonymous 08.04.09 at 12:46 pm
I work at a Russell Group university. I am more pessimistic than Jo. It is the case that humanities students once upon a time would virtually never earn 80% or higher. Today, we are often encouraged to use the full range of a first (70-100%). Students now receive 85% and the like because of this encouragement. This higher mark today is not a result of more focussed study, but an inflation of marks for firsts. Anecdotal evidence is that this is true elsewhere, with one result being more firsts. Of course, it is a bit easier to receive a 2:1 or 1st when you can receive 85% or even higher.
dsquared 08.04.09 at 2:02 pm
Once more, isn’t this evidence of past failures to mark properly, rather than a lowering of standards now? I mean, why didn’t you give marks above 80% in the past? If what you’re saying is that in the past you were operating de facto on a “marks out of 80” scale with the mark for a first set at 87.5%, then there has been a grade inflation in the sense of lowering the mark to 70 on a “marks out of 100” scale but a) I can kind of see why people would want you to use the full range of the scale and don’t really see the point in originally only using 80% of it, and b) it is a change in the name given to particular levels of achievement, not a change in the actual standards.
Harry 08.04.09 at 2:05 pm
Committed as they are to academic rigour, no doubt the Russell Group universities have kept a large file of tests and papers across the range of marks so that we can figure out empirically whether this is inflation or improved performance.
alex 08.04.09 at 2:11 pm
If the top 30% of the mark-scale is ‘first class’, and you’re ‘encouraged’ to use all of it, the eventual result is obvious…
Academic work at University is supposed [traditionally…] to be marked on a scale from 0 = absolute incompetence, to 100 = absolute genius, leading the marker to resign his/her post immediately and offer it to the student, who clearly has more to teach than to learn.
When ‘just very good but really not exceptional’ marks start creeping up towards 80 or 85, this point, of absolute quality, is being lost. Maybe it’s being replaced by something else worthwhile, but it’s hard to tell what – after all, there is nowhere beyond 100 to go, and increasing clustering at the top end will just make the distinction such marks are supposed to afford impossible to achieve. Until we invent the ‘110%’ mark, just like they had to invent A* lower down the sector…
Phil 08.04.09 at 2:27 pm
When ‘just very good but really not exceptional’ marks start creeping up towards 80 or 85,
Haven’t seen it. We have guidelines – guidelines, I tell you! – which basically say that you should only give an essay 90 if you think it could be submitted to a journal unchanged, and your second marker agrees. 80 and 85 aren’t quite that hard to hit, but they do have to be really, really good work – and all Firsts (on a module, not the degree overall) have to be ratified by the external examiner. And when you’ve got students’ final mark being decided on two full years’ work, there would have to be quite a lot of putatively inflated 80s and 85s to push a student who “really” deserved a 2.i. over the 70% line.
Thom Brooks 08.04.09 at 2:32 pm
I’ve never quite understood a first. It is commonplace to find that a fail covers more than 10%. Thus, it is often 60-5% or worse at many US universities and 39% or lower in the UK. Fine. All other grades/classes should then be equal in size. Thus, A, B, C, and often D each have about a 10% range:
A = 90-100
B = 80-89
C = 70-79
D = 60-69, etc.
In the UK, this range of about 10% holds for all but first class:
1st class = 70-100
2:1 = 60-69
2:2 = 50-59
3rd = 40-49, etc.
One worry with using a full range of first is that it can have a disproportionate effect on the average. For example, let’s suppose someone scores a high A and two low C’s in the US. Their average would be ([97+72+72]/3=) about 80%, a low B.
Now let’s compare a high first (on the full range view) with two low marks two grades/classes down (e.g., 2:2). We get ([90+52+52]/3=) 65%, a clear 2:1. Or say someone earned a 100% first, but failed a module/course (35%) and earned a low 2:2 in another (53%). His/her average will be 63%, also a clear 2:1. By contrast, the equivalent marks on the US scale (the 100% first becomes a 100% A, the fail becomes 50%, and the low 2:2 becomes 73%) would average 74%, which is a clear C.
While I recognize that there are a number of problems with comparing the different scales, I am concerned about the disproportionate effect a full range of first may have on averages.
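The averages in the comparison above can be checked with a short script. This is only a sketch: the function names are made up for illustration, and the band boundaries are the ones listed in this comment (with thirds taken as 40-49), not any university's actual regulations.

```python
def mean_mark(marks):
    """Plain arithmetic mean of a list of percentage marks."""
    return sum(marks) / len(marks)


def us_letter(avg):
    """Map an average to a US letter grade using the 10-point bands above."""
    if avg >= 90:
        return "A"
    if avg >= 80:
        return "B"
    if avg >= 70:
        return "C"
    if avg >= 60:
        return "D"
    return "F"


def uk_class(avg):
    """Map an average to a UK class using the bands above (assumed, not official)."""
    if avg >= 70:
        return "1st"
    if avg >= 60:
        return "2:1"
    if avg >= 50:
        return "2:2"
    if avg >= 40:
        return "3rd"
    return "Fail"


# The four profiles discussed in the comment: (label, marks, classifier).
examples = [
    ("US: high A + two low Cs", [97, 72, 72], us_letter),
    ("UK: high first + two low 2:2s", [90, 52, 52], uk_class),
    ("UK: 100% first + fail + low 2:2", [100, 35, 53], uk_class),
    ("US: 100% A + fail + low C", [100, 50, 73], us_letter),
]

for label, marks, classify in examples:
    avg = mean_mark(marks)
    print(f"{label}: average {avg:.1f} -> {classify(avg)}")
```

The point it illustrates is the asymmetry: because the first-class band spans 30 points rather than 10, one very high first pulls a UK average up further than one high A pulls a US average.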
Chris Bertram 08.04.09 at 2:51 pm
#34. That isn’t true about the UK, Thom …. 2.1 and 2.2 have 10pc ranges, but 40 is a bare pass and 3rds start at 45 (at Bristol, anyway).
(In any case there is something a bit misleading in giving the numbers for a first class standard, etc. on individual papers in this context, since most universities operate systems combining averages with numbers of marks in a class to determine the final result. – so people will often get firsts with an average below 70, permissible because they have more than N papers above 70 … and so forth.)
Really, it would be much better simply to abandon the class system and move to US style transcripts.
Thom Brooks 08.04.09 at 3:06 pm
Many thanks for the swift correction, Chris! (At Newcastle, thirds are 40-49.) You are absolutely correct on the degree classification system where many factors may be taken into account. Nevertheless, I wholeheartedly endorse US transcripts. If only more people agreed!
dsquared 08.04.09 at 3:14 pm
When ‘just very good but really not exceptional’ marks start creeping up towards 80 or 85
but this wasn’t what Anonymous said was happening; what was described was a once-and-for-all change in the overall scale rather than a gradual process of inflation.
We have guidelines – guidelines, I tell you! – which basically say that you should only give an essay 90 if you think it could be submitted to a journal unchanged
Argh. Why, oh why oh why? I think there’s a whole load of deep philosophical issues here, and a massive gulf between my view of a fairly straightforward performance assessment task, and what academics believe themselves to be doing in marking a student’s paper. I mean, even Enron and GE at their most pathological didn’t think that their “rank and yank” systems would be improved by adding a special grade at the top end that nobody was allowed to have.
Philip 08.04.09 at 3:15 pm
Chris, yeah, that’s how I got my 2:1: I started to actually do some work to get my average grade up and just missed 60%. It seemed to be discretionary as to how the grade was classified. So I don’t know if I got a 2:1 because I got a good mark in my dissertation and other 3rd year work or because they wanted to have a certain number of students getting a 2:1.
This just points out how ludicrous the whole situation is when comparing degrees. Most people just look at the grade and where you got it from so if you were half a point off a first it is no different to just scraping a 2:1. So it’s difficult enough comparing the value of degrees from within the same department never mind between universities.
Chris Bertram 08.04.09 at 3:23 pm
_It seemed to be discretionary as to how the grade was classified._
I hope nothing I said led you to believe that that was the case. Usually there are tables setting out the combinations of overall average and proportion of marks in class, there’s nothing discretionary about it.
Phil 08.04.09 at 3:44 pm
Why, oh why oh why?
Why not, oh why not? I’m not saying 90 is a special mark nobody’s allowed to have; I’ve seen (and confirmed) a paper that was marked at 90 (it was very good indeed). You could argue that that’s an effective ceiling at 90, but who’s to say that some student next year or the year after won’t submit a paper even better than that?
dsquared 08.04.09 at 3:51 pm
There’s nothing intrinsically wrong with it I suppose, but it’s not very logical – my preference would be a linear scheme across the population so that your mark would correspond to your percentile rank; alternatively some sort of normal curve. Having a different criterion for marks above 89 that doesn’t really match up to the normal marking scheme just seems a bit ad hoc to me and to introduce needless complexity. This is the big philosophical gulf I’m talking about; as far as I can see it’s very important in this schema that marks of 100 are basically reserved for Allah, and that there are certain levels of recognition that are reserved for the very very best – if you look at the thing as I do, as a rough and ready means of ranking papers, then you quickly start noticing that this bit of it doesn’t seem to match up all that well to the purpose for which the scores are typically used.
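To make the percentile-rank idea concrete, here is a minimal sketch. The raw scores are invented, the function name is hypothetical, and the mid-rank treatment of ties is one assumed convention among several; nothing here describes a scheme any university actually uses.

```python
def percentile_marks(raw_scores):
    """Return each paper's percentile rank (0-100) within the cohort.

    Ties are handled with the mid-rank convention: a paper counts every
    paper strictly below it, plus half of the papers tied with it
    (itself included).
    """
    n = len(raw_scores)
    marks = []
    for score in raw_scores:
        below = sum(1 for other in raw_scores if other < score)
        tied = sum(1 for other in raw_scores if other == score)
        marks.append(100 * (below + 0.5 * tied) / n)
    return marks


# Invented cohort of five raw marks, including one tie.
cohort = [55, 62, 62, 71, 90]
print(percentile_marks(cohort))
```

Under a scheme like this the mark carries only ranking information, so there is no reserved headroom at the top: the best paper in the cohort always lands near the top of the scale, whatever its absolute quality.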
Tim Wilkinson 08.04.09 at 3:59 pm
Re: mapping grades to classifications. It’s perhaps worth pointing out explicitly that (I’m pretty sure that) the scale on which percentage grades supposedly fall is illusory, at least when discursive answers are being marked. A marker knows perfectly well that (say) 71% represents a low but definite first, 69% indicates a high but definitely-not-quite-high-enough II.1, etc. Looking at my own BA transcript certainly suggests this anyway. I’d bet that overall, such numerals are massively overrepresented compared to the distribution you would expect from allocating numbers on a continuous scale. For all I know, markers use the discrete Greek literal system, then read the percentage off from those. This is also relevant to complaints about excess ‘headroom’ at the top of the scale.
‘We are a Russell Group University and therefore attract better-than-average applicants for our undergraduate programs. Therefore, it is to be expected that we should award higher-than-average proportions of firsts and upper seconds.’ i.e. let’s drop our grading standards so as to make our degree classifications comparable to those of the lower-end universities – fair enough perhaps, as long as the previously true ‘our degrees are worth more’ line is moderated pro tanto.
There are two kinds of ‘standards’ involved. Grade inflation – a drop in exam standards – has no necessary connection with educational degradation – a drop in standards of teaching. But it does have implications for inter-generational fairness/commensuration, as well as granularity problems at the top end, hence A* grades at A-level.
If the argument is that exam standards (viewed in isolation) haven’t dropped, then grade-devaluation might lead one to conclude that resources are being put into exam-prep, especially given that the arguments for why it’s happening suggest that good exam results per se are accorded high priority. That in turn raises the possibility that those resources are being diverted away from substantive teaching – so that educational standards are being compromised.
One way for exam results to be boosted would be to focus on checklists, mnemonics, box-ticking, which might be done without taking resources away from actual education. But such approaches alone aren’t supposed to be able to get you from a second to a first, are they? Insofar as there is supposed to be a qualitative difference between firsts and seconds, rather than those just being arbitrary divisions of a continuous scale, creeping inflation is degenerative. (A deliberate and systematic recalibration is different – has anyone heard of such a thing being done?)
Another way to ‘artificially’ improve exam results without actually marking individual papers more generously would be for the syllabus to be narrowed so as to focus on questions expected to be featured in the exam (and/or tailor the exam to the questions addressed). Exams involve a kind of sampling (though in principle they could be comprehensive enough not to). The exam measures the ability to answer a representative sample of questions. Second-guessing the exam can be seen as an attempt to bias that sample, so teaching to the exam is a disguised way of dropping exam standards.
Some level of exam-prediction is obviously unavoidable, has always happened, and is not necessarily a problem in itself. But if choice of teaching topics becomes more exam-focussed (and the exam is set in such a way as to complement, or at least not frustrate, that approach), that is an attempt to increase the sampling bias in candidates’ favour. If successful, it makes it easier for the candidate to get a good result without needing to be more knowledgeable about the subject. (To take an extreme for illustrative purposes: if one candidate sneaks a look at the exam paper, they have an advantage. If everyone does so, everyone has that advantage.)
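To put rough numbers on the sampling point, here is a minimal sketch. It assumes each exam question is drawn uniformly from a pool of topics, and that everything the candidate knows lies inside that pool. The function name and the figures (20 topics, 10 predicted, 8 known) are made up purely for illustration.

```python
from fractions import Fraction


def expected_score(topics_known: int, pool_size: int) -> Fraction:
    """Expected share of questions a candidate can answer when each
    question is drawn uniformly from `pool_size` topics, of which the
    candidate knows `topics_known` (all assumed to lie inside the pool)."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return Fraction(min(topics_known, pool_size), pool_size)


# Hypothetical figures: the candidate knows the same 8 topics in both cases.
full_syllabus = 20   # exam samples from the whole course
predicted_pool = 10  # exam samples only from the topics everyone expects
topics_known = 8

broad = expected_score(topics_known, full_syllabus)
narrow = expected_score(topics_known, predicted_pool)

print(f"sampled from full syllabus:  {broad}")
print(f"sampled from predicted pool: {narrow}")
```

On these invented figures the expected score doubles although the candidate knows exactly the same amount, which is the sense in which a narrowed sample works like a lower exam standard.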
As suggested above, that’s a matter of a (disguised) drop in exam standards, not necessarily in educational ones – if only the latter is of interest, you could try and adjust for the (in effect) lowered exam standards, and look at whether the adjusted performance level has gone up or down. How you do that I dunno.
(Chris Williams @24 – there was a big gap between a First and any kind of Second – I think I know what you mean, but that’s surely not it.)
Salient 08.04.09 at 4:36 pm
I think there’s a whole load of deep philosophical issues here, and a massive gulf between my view of a fairly straightforward performance assessment task, and what academics believe themselves to be doing in marking a student’s paper.
Eh, not all academics: but maybe I’m insufficiently indoctrinated as of yet. My criterion for determining whether to feel good or queasy about awarding a student an A is: “If I were to present the student with a set of questions to investigate, which implicitly rely upon a solid & thorough & comfortable understanding of the material from this course, would I feel confident that the student could competently understand the questions and attack them?” It works out pretty well, and the fact that a handful of students get As each semester despite clearly not meeting the above standard suggests to me that grade inflation is a real phenomenon.
Salient 08.04.09 at 4:36 pm
But anyway, I think this focus on “grade inflation” is fundamentally backwards. Shouldn’t we instead be looking at whether “syllabus deflation” exists?
Substance McGravitas 08.04.09 at 4:52 pm
ECTS grades come with helpful sentences describing what they’re supposed to mean across Europe. I like that as an ideal. There are target percentiles which I like less: “A” work should be “A” work.
In practice implementation varies of course.
Tim Wilkinson 08.04.09 at 5:04 pm
Salient @43 – ‘Eh, not all academics’ – yes, you too. Dsquared (IIUC) wants grades to be a ranking of candidates. You have an independent, objective understanding of the import of grades (well ‘A’, at least).
Re syllabus deflation, #42 was about that but was (I now see) a bit confused by the time it finally got to the last para. It’s the obvious answer to the problem of how to improve exam results without increasing educational resources and without actually marking individual papers more generously. It’s also consistent with the description ‘preparing for exams in a more focused way’.
dsquared 08.04.09 at 5:30 pm
so that your mark would correspond to your percentile rank
erratum/clarification: your percentile rank in the (hypothetical) population of all the students who ever have or ever will do that course, not in any given year.
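A minimal sketch of the difference between ranking within one year and ranking against everyone who has taken the course. The two cohorts and their marks are invented, and "percentile rank" is taken here to mean the share of marks at or below the given mark, which is only one of several possible conventions.

```python
def percentile_rank(mark: float, population: list[float]) -> float:
    """Percentage of marks in `population` that are at or below `mark`."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for m in population if m <= mark)
    return 100 * at_or_below / len(population)


# Hypothetical cohorts taking the same course in two different years.
weak_year = [50, 55, 60, 65, 68]
strong_year = [60, 68, 70, 72, 75]
all_years = weak_year + strong_year

mark = 68
rank_in_weak_year = percentile_rank(mark, weak_year)
rank_in_strong_year = percentile_rank(mark, strong_year)
rank_overall = percentile_rank(mark, all_years)

print(rank_in_weak_year, rank_in_strong_year, rank_overall)
```

The same script of 68 tops one cohort and sits in the lower half of the other; only the pooled figure is independent of which year the candidate happened to sit the paper.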
dsquared 08.04.09 at 5:36 pm
I am very wary of counting “teaching to the test” as a lowering of standards; if this is the case, then surely this is a sign that the test isn’t correctly designed? Driving instructors are the ultimate teachers to the test and I don’t think this is a problem.
In general, I think we can all agree that if, say, physics students weren’t allowed cribs and had to memorise every formula that was on the test, then a) average marks would fall as more errors were made, b) the exam would have got “harder” and c) we shouldn’t count this as an increase in standards. What worries me is that a lot of the perceived decrease in standards are actually the removal of accidental and irrelevant impedimenta (which was the point of my analogy to the kind of football fan who thinks David Beckham isn’t really all that good because he doesn’t play with a sodden caseball).
Retakes are a particular bugbear of mine; I don’t see how a more generous retake policy could reasonably be called a reduction in standards.
Chris Bertram 08.04.09 at 5:41 pm
dsquared: I don’t think that _ranking_ is exactly what we’re about here. Though I have my doubts about whether degree classes capture natural kinds of performance, there’s some such notion at work. And at a certain point, we’re just certifying competence – an exercise that doesn’t necessarily imply any comparative assessment at all.
Think about driving tests: we’re assessing whether the person can perform to an objectively acceptable degree, not whether they’re ranked in the nth percentile.
Salient 08.04.09 at 5:44 pm
Dsquared (IIUC) wants grades to be a ranking of candidates.
Well, depends on what “ranking” means. I thought the point was to get some rough serial ordering or classification that allows us to judge “better” and “worse” and “equivalent” given two students from the population, with some reasonable relationship to performance on some absolute scale.
“Equivalent” is important: I don’t think D^2^ was protesting the idea that two students might get an A for grades of 98 and 94 respectively. Sometimes individual students really are fungible: for example, among the students in my class who are carrying a solid A grade into tomorrow’s final (including at least one carrying a 100%), I’d happily hire any one of them if I needed someone to solve problems related to the material taught. I’d more hesitantly hire one of the B students, and I probably wouldn’t hire one of the C students unless the problems I needed solved were uniformly routine.
So, this system is a ranking of candidates, provided one has the proper translation: A = Fully capable, can attack the questions independently, B = Mostly capable with some assistance, can attack some questions independently but will require help with some intermediate steps, C = Quite capable of handling the more routine types of questions with assistance at unusual intermediate steps, and D = E = F = whatever. I guess the numerical percentile grades correspond, very roughly but about as well as possible, to my degree of confidence that a student could attack a collection of reasonably interesting questions related to the material and come up with some accurate, satisfying answers.
For all the preoccupation with grade inflation, I would hope the above grade definitions correspond at least roughly with what pretty much everybody assigns for grades in any department at any University. It would surprise me to hear this is systematically not the case, as that does not align with my own experiences.
This implies that inflation/deflation/better/worse has a lot more to do with what is taught, and what is considered part of the course, than how the exam responses are graded.
As for syllabus deflation — indeed yes, about 3 minutes after I typed my second comment, I thought, “You know, I should have explicitly said, ‘That is to say, what Tim said in #42.’ or some such acknowledgment. Oops.” :-)
Salient 08.04.09 at 5:47 pm
And at a certain point, we’re just certifying competence – an exercise that doesn’t necessarily imply any comparative assessment at all.
True of pass/fail systems, I think, but not as true of systems that distinguish A grade work from B grade work. I think it is necessary to appeal to comparative assessment in order to justify A/B distinctions: essentially, I think some comparative assessment is necessary in order to establish what absolute standard of competence is appropriate for differentiating A from B.
Salient 08.04.09 at 5:50 pm
What worries me is that a lot of the perceived decrease in standards are actually the removal of accidental and irrelevant impediments
I perceive two components: impediment removal and syllabus deflation. The former is an unambiguous good, and the latter is problematic insofar as it exists (because it throws off the “for all time” percentile rankings you promoted above).
Phil 08.04.09 at 5:53 pm
Having a different criterion for marks above 89 that doesn’t really match up to the normal marking scheme just seems a bit ad hoc to me and to introduce needless complexity.
Perhaps I should have clarified that there are also descriptive guidelines for each decile below 70 (including 0-9). What’s different about 70+ marks is that there are no ranges (marks are given at 75, 80, 85 usw) and that there’s a guideline for each of those points.
your percentile rank in the (hypothetical) population of all the students who ever have or ever will do that course, not in any given year.
That’s exactly what the deciles-and-guidelines system aims to produce.
Driving instructors are the ultimate teachers to the test and I don’t think this is a problem.
Driving instructors’ work is assessed by a practical exercise, which it’s difficult to fake.
dsquared 08.04.09 at 6:02 pm
That’s exactly what the deciles-and-guidelines system aims to produce
… in the range 0-80, surely? My whole harrumph about the system was that marks in the range 90-100 are handed out much, much less than 10% of the time and that this doesn’t get washed out in large samples[1] (I suppose that we could be saving those grades up for the supragenius 18 year olds that the Flynn effect tells us will matriculate in 2300 CE, but suspect not)
[1] if you think I’m going to work out what happens asymptotically then you are sadly mistaken my friend.
Philip 08.04.09 at 6:16 pm
Chris at 39, thanks for the clarification. I never really understood how the borderline grades were assessed so it seemed to me that there was some level of discretion involved, but perhaps not.
Salient 08.04.09 at 7:51 pm
… in the range 0-80, surely?
OTOH, I would hazard a guess that grades 0-19 for actively participating students are about as rare as grades 81-100.
Phil 08.04.09 at 8:01 pm
I never really understood how the borderline grades were assessed so it seemed to me that there was some level of discretion involved, but perhaps not.
Where I work, at least, there used to be discretion involved but there isn’t any more.
peter 08.04.09 at 9:30 pm
dsquared @ 48:
“Retakes are a particular bugbear of mine; I don’t see how a more generous retake policy could reasonably be called a reduction in standards.”
Of course they are: Students retaking exams get additional months to prepare for them (eg, 11 instead of 5 months for 1st semester exams), and normally do not have to retake the exams they have already passed, so they have to prepare for fewer exams the second time around. Surely passing 5 exam papers in one 14-day period is much harder than passing 5 exam papers in two examination periods held 6 months apart.
dsquared 08.04.09 at 9:54 pm
Yes, but the fact that something’s harder to do doesn’t mean it’s a higher standard. It would be even harder to pass the exams if we insisted that they had to be taken consecutively without breaks for sleep, for example. The distinction between an exam being harder because it demands a higher level of knowledge, and an exam being harder because it has been set up in an inconvenient or difficult format, seems to me to be one very much worth making, and I really don’t see what the benefit to society is of randomly punishing a few score bright kids every year because they happened to have a really shitty day.
armando 08.04.09 at 10:02 pm
I think that dsquared is trying to say that if a student meets a certain standard of knowledge, understanding and so on, then the fact that she does it without having to navigate the arbitrary obstacles we want to put in her way doesn’t actually detract from her achievement.
There is a serious point here, of course, but it can’t quite be right either. If I give the same exam and simply allow the student longer to complete it, I have made the exam easier. In some sense, I would be examining the same knowledge, but the fact that I do so under easier conditions is not irrelevant to the level of understanding assessed by the exam.
dsquared 08.04.09 at 10:05 pm
If I give the same exam and simply allow the student longer to complete it, I have made the exam easier. In some sense, I would be examining the same knowledge, but the fact that I do so under easier conditions is not irrelevant to the level of understanding assessed by the exam
isn’t this just the exams vs coursework debate though? or am I wrong in assuming that this is pretty much settled as a matter of educational philosophy?
armando 08.04.09 at 10:24 pm
Well, I don’t think it is settled really, but even so you don’t set the same assessment for coursework as you do for a timed exam. It’s different. But I think that this is a way in which standards get lowered – you change some aspect of your assessment philosophy, but it is used as an excuse to make the assessment easier. Take a 2 hour exam, and set it as a coursework and suddenly you find grades improve dramatically. Is that due to better teaching methods?
I *know* this sounds like conservative bollocks, always claiming that things were better in the past, but it isn’t. I certainly do not think that students are lazier, more stupid or anything like that. But I’ve taught at enough Math departments and have seen syllabi reduced in response to lack of student knowledge. Scaling is a big thing where I am – essentially, the university decides how many of each grade classification there are. And simply, as a professional, you think you have a good feel for standards (this can be misleading, however).
I don’t think it is entirely negative, because I think it largely results from increased student numbers, which is most definitely a good thing. But just because there are some tired old arguments from boring conservatives saying that standards are falling, doesn’t mean it’s not true.
Salient 08.05.09 at 12:03 am
But I’ve taught at enough Math departments and have seen syllabi reduced in response to lack of student knowledge.
And lack of student… affinity. General analytical skill. Comfort tackling new things. These are important and rare characteristics, increasingly rare as greater portions of the population
Bloix has got me thinking about why so many of these folks, who quite possibly don’t find any enjoyment in their studies, go to college. I’ve long held this idea that any human being who wishes to take an interesting or potentially useful course at a college should have the opportunity to do so. Grading systems (as well as tuition, class meeting hours, and geographic distribution of public universities) should be set up to facilitate this: relevant to this thread, a student should be able to treat a grade as a meaningful communication from the assessor about her/his understanding as reflected in her/his work.
Take a 2 hour exam, and set it as a coursework and suddenly you find grades improve dramatically. Is that due to better teaching methods?
But… but… rephrase this. Take some coursework, and assign it instead as an hour exam. The grades will fall dramatically. Is that due to worse teaching methods?
What’s so hard about a 2 hour exam?
* No opportunity to consult resources.
* No time to think.
If there was a way to accomplish the former without the latter, I’d be all for it. Student wants to take an extra hour to finish their exam? Why not? Gives me a chance to see what they can do with a fairly peaceful head: I’m interested in what they understand, not how well they destress in real time.
Granted, there are limits to what it’s possible to do: you just can’t get an exam room for 4 hours. But it always saddens me to give an exam and see students just rushing, just rushing not thinking not processing just go go go, for the last 20 minutes of an hour exam. I see them fail to do things I know they can do, since I’ve seen them do the same tasks in class fluently. I see them hit mental blocks there’s no justifiable reason for me to want them to hit.
Neither I nor they get anything out of the responses which are written in this state of mind. So I completely agree with D^2^ on this one.
alex 08.05.09 at 7:37 am
Exam = hard, good; coursework = easy, bad, is very lazy thinking. You can’t produce an essay with footnotes in an exam, you can’t be sure you’re engaging with what a range of authors really say in an hour’s scribbling, you certainly don’t have time to reflect and redraft, which as we all know is fundamental to good writing.
In humanities subjects, at least, the only meaningful rationale for closed examination is to prevent cheating; and if stopping the students cheating were to be the determining rationale for assessment policy, then we’d be acknowledging that it isn’t really education any more, it’s just a knock-down fight for the piece of paper with the highest number on it…
Rabelais 08.05.09 at 8:03 am
I’ve set exams as ‘seen papers’ – showing students the paper beforehand, to allow them to prepare. I get some really good work back from a few students – you can see that they have really taken the opportunity to prepare seriously. Most seem to assume that because they’ve seen the paper they actually have to prepare less because they know what’s coming. And there is a group of students who, even though they have seen the paper, produce a script that looks like they didn’t know what the questions were in advance (some don’t even seem familiar with the course).
Whatever way you choose to assess you’re left with a reflection of different abilities and levels of engagement.
As for slipping standards, I have a hunch that it is getting increasingly difficult to fail a course. In some institutions I’m familiar with they go to extraordinary lengths to keep ‘bums on seats’. This means there are lots of thirds and 2.2s that probably should have failed. This pushes satisfactory/average students into the 2.1s, to distinguish them from the strugglers; while a 1st is still a 1st, except the distance between 1st class work and truly mediocre has narrowed.
As I say, just a hunch… It wouldn’t take much to convince me that standards are slipping and we’re all off to Hell in a handcart.
Chris Williams 08.05.09 at 6:37 pm
alex “In humanities subjects, at least, the only meaningful rationale for closed examination is to prevent cheating”
No. It’s a very good way of asking a number of different kinds of question about the content of a course and getting answers. Set coursework, and you usually end up with one or two extended essays. Students quickly realise that to maximise their marks in the extended essay, they need to concentrate in depth on their essay topic. An exam is a very handy (and comparatively cheap) way of setting some questions that test knowledge of the course as a whole alongside others that go into depth. I don’t think, by the way, that humanities degrees should only be assessed through exams as mine was.
Open papers are a very good way to combine the standards of coursework and the flexibility of an exam, but they only really work for residential study where you can assume that every student is equally able to shuck off all their other responsibilities for up to 12 / 24 / 48 / 72 hours.
Phil 08.05.09 at 6:53 pm
An exam is a very handy (and comparatively cheap) way of setting some questions that test knowledge of the course as a whole alongside others that go into depth.
“Mastermind”-type questions, as I explained it to a student the other day. It’s generally considered unsporting to set an entire exam in this format – students like to have an essay question so they can -trot out the essay they’ve prepared- get to grips with the subject in depth. In one module last year we set ‘cross-cutting’ essay questions, i.e. required them to think on their feet and write an essay’s worth of connected prose. We got complaints. (We also got some pretty good exam scripts.)
Salient 08.05.09 at 7:10 pm
In one module last year we set ‘cross-cutting’ essay questions, i.e. required them to think on their feet and write an essay’s worth of connected prose. We got complaints.
This sounds far less fun than oral examinations. It could be quite fun for students, and I imagine more useful for assessment purposes, if the university let you book an exam room for all Saturday and you let students bring a bag lunch and coffee and take a few hours to complete the essays, so they can afford to cross-cut at a ponderous pace.
Tim Wilkinson 08.05.09 at 7:40 pm
Continual assessment is evil. I am extremely grateful that I just had finals – with an elective presubmitted essay paper and the rest 3-essay exams (though I think presubs were possible for 1-2 of the exams). Apart from the stress/oppression aspect, it must be deeply demoralising to know that however well you do, earlier results mean you can’t possibly do very well overall. Maybe there are ways round that, I don’t know. And to be fair to those of differing temperament, suppose the converse holds – get some good early results in the bag and take the stress off.
But I tend to think that there’s a holistic element – in philosophy anyway – which means you’re not really ready to be examined in anything until you’ve finished the course (though if I’m honest there’s also an element of exhausting the appeal of cut-price sybariticism before coming round to the idea of doing some focussed study.)
‘cross-cutting’ essay questions, i.e. required them to think on their feet and write an essay’s worth of connected prose Isn’t that kind of supposed to be the idea? Oral exams would certainly have been good fun though, in a world in which suitably qualified examiners were sufficiently plentiful.
Phil 08.05.09 at 11:30 pm
if the university let you book an exam room for all Saturday and you let students bring a bag lunch and coffee and take a few hours to complete the essays, so they can afford to cross-cut at a ponderous pace
I like that idea – it’s true that the task of connecting idea A with ethical question B and body of research C doesn’t go that well with what we think of as Exam Conditions. It’s not so much the coffee as the concomitant toilet break that I worry about: our in-bred* dislike of being watched while peeing would create a substantial opportunity for what we pedagogues refer to as “cheating”.
*Learnt? Culturally specific? Universal?**
**OK, wrong thread.
Phil 08.05.09 at 11:32 pm
Um, “Learnt” etc is supposed to be footnote 1, referenced at ‘in-bred’, and it’s supposed to end with a reference to footnote 2. And not to be in bold.
Salient 08.06.09 at 1:24 am
would create a substantial opportunity for what we pedagogues refer to as “cheating”
I guess, but even in mid-hourlong-exam I let students go use the restroom.
I do remember having a logic professor who prohibited me from leaving the examination hall to use the restroom. I finished the test ignoring the intermittent bouts of rather acute abdominal pain, and hard-mindedly wrote a whole additional paragraph, apparently encoded in what I guess was not-quite-satisfactory predicate logic notation, which asserted that anyone who would humiliate themselves by resorting to bathroom liaisons in order to discover answers for a test as blastedly undemanding as this one ought to be completely beneath the notice of as distinguished a gentleman as he.
I probably still have the blue book marked 96/100 around somewhere — had points taken off for errors in that extra paragraph. But yes, not letting exam-takers use the restroom is a horrible horrible policy, unless the university where you teach is kind enough to provide students with longer than a ten-minute break between exams that occur at opposite ends of campus. :P
Oral exams would certainly have been good fun though
Oral exams are, for better or worse, an excellent means for very quickly finding out just how comprehensively befuddled one’s ‘C’ students really are. (And this gets back around, obliquely, to the general topic of the thread: what does a letter grade or a ranking like 2.1 mean, what does it communicate, and is it reasonable to call today’s grades “inflated”?)
Watson Aname 08.06.09 at 1:37 am
create a substantial opportunity for what we pedagogues refer to as “cheating”.
This is solvable too, you combine the effectively unlimited exam time with a complete open book policy. No discussions is all you really need to insist on. Then you ask fundamental questions they haven’t seen before, all of which are answerable if only you understand the course material well.
The down side is it’s a real pain to mark, and will risk leaving you terribly depressed about your abilities, and theirs. It’s a bit like oral exams that way, but perhaps more practical for a large class.
Phil 08.06.09 at 8:12 am
will risk leaving you terribly depressed about your abilities, and theirs
I suspect that the students who do brilliantly under exam conditions would do even more brilliantly and the students who genuinely would have done OK but for those conditions would do OK – but that the main effect would be a big sorting-out at what’s now the 2.i level. In other words, we’d lower the 2.i hump and bulk up the left and right tails. (Was that what we wanted to do? I forget.)
Chris Williams 08.06.09 at 3:54 pm
For me come exam times, “Flatten the distribution” is the motto engraved on the inside of my eyelids. YMMV.