As I approach formal retirement from my academic job, I’m still thinking about ideas in my main theoretical field of decision theory. But I’ve largely lost interest in publishing journal articles, leaving the chore of dealing with Manuscript Central and other robotic systems to my younger co-authors in the case of joint work, and not submitting many of my own. I’ve also gone retro on reviewing. If I’m invited to review a paper, I write back to the editor and offer to do the job as long as they send me the manuscript directly.
That distance from the process provides me with a somewhat different perspective on how Large Language Models (LLMs) are changing things. The rise of LLMs, combined with the growth of the global university sector and the dominance of a “publish or perish” culture[1], has inevitably produced a flood of AI-generated slop that threatens to overwhelm the whole journal process, especially when AI is also being used to generate referee reports.
But will it always be slop? I’ve been trying out various LLMs, including OpenAI Deep Research and, more recently, its French competitor Mistral. I recently used DR to write a piece in the format of a journal article, though I have no plans to submit it anywhere.
The process started when I ran across a reference to Hempel’s “paradox of confirmation” in Richard Pettigrew’s Substack newsletter.
I was interested because Hempel’s work is adjacent to my main remaining research project on reasoning with bounded awareness. And, I love me a good paradox.
The paradox runs as follows. Suppose we want to make a probability judgement about the claim “all ravens are black”. Every time we see another black raven, we count this as confirmation of the claim. But, as Hempel observes, “all ravens are black” is logically equivalent to the contrapositive “every non-black thing is not a raven”. When we observe, for example, a white shoe, we should increase our belief in the contrapositive, and therefore in the original claim.
This seems obviously wrong, but the majority view among philosophers who’ve written on the subject is that we should indeed increase our belief in the blackness of ravens, very marginally, whenever we see a non-black non-raven. It’s easy enough to come up with what seems like a refutation, along the following lines:
“Consider a world with one raven and one shoe. Each may be black or non-black. If the colour of the shoe is independent of the colour of the raven, observing the shoe tells us nothing about the colour of the raven”
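The two-object argument can be checked by direct enumeration. The sketch below assumes a uniform prior over the four possible worlds (raven black or not, shoe black or not, independently); the setup is illustrative rather than anything from the formal literature.

```python
from itertools import product
from fractions import Fraction

# One raven and one shoe, each independently black or non-black,
# all four worlds equally likely (assumed uniform prior).
worlds = list(product(["black", "non-black"], repeat=2))  # (raven, shoe)

def prob(event, given=lambda w: True):
    """P(event | given) with all worlds equally likely."""
    pool = [w for w in worlds if given(w)]
    return Fraction(sum(1 for w in pool if event(w)), len(pool))

raven_black = lambda w: w[0] == "black"        # "all ravens are black" here
shoe_nonblack = lambda w: w[1] == "non-black"  # we observe a white shoe

prior = prob(raven_black)
posterior = prob(raven_black, given=shoe_nonblack)
print(prior, posterior)  # 1/2 1/2 — no confirmation under independence
```

With independence built into the prior, observing the shoe’s colour leaves the probability that the raven is black exactly where it started.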
I tried this out on Deep Research, and it turns out that this isn’t a new argument: a more complicated version was put forward by I.J. Good (a collaborator of Turing, and an early predictor of superhuman AI) back in the 1960s, but it didn’t settle the dispute. Here’s an updated statement of the problem from Branden Fitelson
DR put up a vigorous defence of the mainstream position, forcing me to refine my own, and gave me lots of useful references in a part of decision theory with which I’m not especially familiar. However, as is usual with LLMs, and despite the shift away from the sycophancy that used to prevail, DR eventually came around to my way of thinking.
My final position was that the paradox reflects the impossibility of Hempel’s core project of deriving probability judgments independent of any model of the world. I saw an analogy with a similar project that was popular in economics in the 1980s, vector autoregression. It was claimed to be theory-free, but actually depended on (often implicit) identification assumptions, that is, assumptions about the way in which variables are introduced into the estimation process.
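The model-dependence point can be made concrete with a small Bayesian calculation. In a toy world (my own illustrative assumption) with two ravens, each independently black or white with probability 1/2, plus two shoes that are always white, changing the observation protocol changes the verdict: if we draw a non-black object at random and find it to be a shoe, the probability that all ravens are black does rise, slightly.

```python
from itertools import product
from fractions import Fraction

# Toy world: two ravens, each independently black or white with
# probability 1/2, plus two always-white shoes.
# H = "all ravens are black"; prior P(H) = 1/4.
worlds = list(product(["black", "white"], repeat=2))  # raven colours

def posterior_after_sampling_nonblack():
    """Protocol: draw one non-black object uniformly at random
    and observe that it is a shoe (not a raven)."""
    num = Fraction(0)  # P(H and observation)
    den = Fraction(0)  # P(observation)
    for ravens in worlds:
        prior = Fraction(1, len(worlds))
        white_ravens = ravens.count("white")
        nonblack = white_ravens + 2       # white ravens + 2 white shoes
        like = Fraction(2, nonblack)      # chance the draw is a shoe
        den += prior * like
        if white_ravens == 0:             # H holds in this world
            num += prior * like
    return num / den

prior = Fraction(1, 4)
post = posterior_after_sampling_nonblack()
print(prior, post)  # 1/4 6/17 — a small confirmation
```

Under this protocol the white shoe genuinely confirms the hypothesis (6/17 ≈ 0.35 > 0.25), whereas inspecting an object already known to be a shoe confirms nothing, so whether Hempel’s paradox bites depends entirely on the sampling model we bring to the observation.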
You can read my paper here
What have I learned from this episode? Most notably, there is a version of vibe coding here. Starting with an idea, which might or might not be original, it’s now pretty easy to turn it into a working paper that looks like the standard product, including citations[2]. That’s a good thing for the growth of knowledge, but it is going to create huge problems for the use of journal publications as a credential by academics seeking employment or tenure.
Instead of just AI slop, journals are going to be faced with increasing volumes of papers that are plausibly publishable. In fields like economics and philosophy that will mean increasing rejection rates from their current absurdly high levels (above 90 per cent anywhere decent) to the point where acceptance or rejection is a lucky dip, or else the result of insider connections (for example, I saw this paper on the US seminar circuit and I know the author is a good fellow).
It’s also important to remember that while LLMs are causing big changes, they are a continuation of a process that’s been going on steadily at least since 1970 (it seemed brand-new when I started university in 1974). Innovations around that time were citation and keyword indexes (big thick books in tiny print) and survey/review journals like the Journal of Economic Literature. Then came the Internet. Even though it hasn’t lived up entirely to its early promise, Internet access has massively reduced the gap between the core and the periphery of the academic world, at least to the extent that the gap reflects communication problems. For me, as an Australian not particularly keen on international travel, this has been transformational.
In some ways, it’s a pity to be leaving the academic game when such marvellous new tools are available. In other ways, I’m glad to have done my work without worrying about whether I would be replaced by a computer program. But either way, LLMs aren’t going away and we will have to work out a way to live with them.
fn1. Although that’s a pejorative term, I’m not a fan of the opposing norm, dominant in philosophy and most of economics, of publishing only a few articles (say, one per year) and only in the very top-rated journals. As was once said of me, I embody the primal urge to publish, and used to turn out articles by the dozen. But now that we have blogs, Substack and so on, I can satisfy my need to express my views on every topic without the tiresome process of dealing with referees (I now deal with comments, but I can respond to these or ignore them as I please).
fn2. As some recent examples have shown, you need to check these. But that was always good practice, if not universally followed – a lot of citations I’ve seen turn out to be cut and pasted from earlier papers, propagating errors along the way. And the replication crisis has turned up numerous examples of papers being cited after they were retracted.