As I approach formal retirement from my academic job, I’m still thinking about ideas in my main theoretical field of decision theory. But I’ve largely lost interest in publishing journal articles, leaving the chore of dealing with Manuscript Central and other robotic systems to my younger co-authors in the case of joint work, and not submitting many of my own. I’ve also gone retro on reviewing. If I’m invited to review a paper, I write back to the editor and offer to do the job as long as they send me the manuscript directly.
That distance from the process gives me a somewhat different perspective on how Large Language Models (LLMs) are changing things. The rise of LLMs, combined with the growth of the global university sector and the dominance of a “publish or perish” norm[1], has inevitably produced a flood of AI-generated slop that threatens to overwhelm the whole journal process, especially when AI is also being used to generate referee reports.
But will it always be slop? I’ve been trying out various LLMs, including OpenAI Deep Research and, more recently, its French competitor Mistral. I recently used DR to write a piece in the format of a journal article, though I have no plans to submit it anywhere.
The process started when I ran across a reference to Hempel’s “paradox of confirmation” in Richard Pettigrew’s Substack newsletter.
I was interested because Hempel’s work is adjacent to my main remaining research project on reasoning with bounded awareness. And, I love me a good paradox.
The paradox runs as follows. Suppose we want to make a probability judgement about the claim “all ravens are black”. Every time we see another black raven, we count this as confirmation of the claim. But, as Hempel observes, “all ravens are black” is logically equivalent to the contrapositive “every non-black thing is not a raven”. When we observe, for example, a white shoe, we should increase our belief in the contrapositive, and therefore in the original claim.
This seems obviously wrong, but the majority view of the philosophers who’ve written on the subject is that we should, indeed, adjust our belief in the blackness of ravens very marginally upwards whenever we see a non-black non-raven. It’s easy enough to come up with what seems like a refutation, along the following lines:
“Consider a world with one raven and one shoe. Each may be black or non-black. If the colour of the shoe is independent of the colour of the raven, observing the shoe tells us nothing about the colour of the raven”
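The toy-world argument can be checked by direct enumeration. Here is a minimal Python sketch (the 50–50 priors are illustrative assumptions, not part of the original argument): with independent colours, conditioning on the shoe being white leaves the probability that the raven is black exactly where it started.

```python
from fractions import Fraction

# Toy world: one raven and one shoe, each independently black or non-black.
# The 1/2 priors are illustrative; independence is the substantive assumption.
p_raven_black = Fraction(1, 2)
p_shoe_black = Fraction(1, 2)

# Joint distribution over (raven_is_black, shoe_is_black) under independence.
worlds = {
    (rb, sb): (p_raven_black if rb else 1 - p_raven_black)
              * (p_shoe_black if sb else 1 - p_shoe_black)
    for rb in (True, False)
    for sb in (True, False)
}

# Condition on the observation: the shoe is non-black (white).
evidence = {w: p for w, p in worlds.items() if not w[1]}
total = sum(evidence.values())
posterior_raven_black = sum(p for (rb, _), p in evidence.items() if rb) / total

# Under independence, the posterior equals the prior: the white shoe
# tells us nothing about the colour of the raven.
assert posterior_raven_black == p_raven_black
```

Under independence the white shoe carries no information about the raven; any confirmation has to come from a model in which the observations are linked.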
I tried this out on Deep Research, and it turns out that this isn’t a new argument: a more complicated version was put forward by I.J. Good (a collaborator of Turing, and an early predictor of superhuman AI) back in the 1960s, but it didn’t settle the dispute. Here’s an updated statement of the problem from Branden Fitelson.
DR put up a vigorous defence of the mainstream position and forced me to refine my own, as well as giving me lots of useful references in a part of decision theory with which I’m not so familiar. However, as is usual with LLMs, and despite the shift away from the sycophancy that used to prevail, DR eventually came around to my way of thinking.
My final position was that the paradox reflects the impossibility of Hempel’s core project of deriving probability judgments independent of any model of the world. I saw the analogy to a similar project that was popular in economics in the 1980s, vector autoregression. It was claimed to be theory-free, but actually depended on (often implicit) identification assumptions, that is, the way in which variables are introduced into the estimation process.
You can read my paper here
What have I learned from this episode? Most notably, there is a version of vibe coding here. Starting with an idea, which might or might not be original, it’s now pretty easy to turn it into a working paper that looks like the standard product, including citations [2]. That’s a good thing for the growth of knowledge, but it is going to create huge problems for the use of journal publications as a credential by academics seeking employment or tenure.
Instead of just AI slop, journals are going to be faced with increasing volumes of papers that are plausibly publishable. In fields like economics and philosophy that will mean increasing rejection rates from their current absurdly high levels (above 90 per cent anywhere decent) to the point where acceptance or rejection is a lucky dip, or else the result of insider connections (for example, I saw this paper on the US seminar circuit and I know the author is a good fellow).
It’s also important to remember that while LLMs are causing big changes, they are a continuation of a process that’s been going on steadily at least since 1970 (it seemed brand-new when I started university in 1974). Innovations around that time were citation and keyword indexes (big thick books in tiny print) and survey/review journals like the Journal of Economic Literature. Then came the Internet. Even though it hasn’t lived up entirely to its early promise, Internet access has massively reduced the gap between the core and the periphery of the academic world, at least to the extent that the gap reflects communication problems. For me, as an Australian not particularly keen on international travel, this has been transformational.
In some ways, it’s a pity to be leaving the academic game when such marvellous new tools are available. In other ways, I’m glad to have done my work without worrying about whether I would be replaced by a computer program. But either way, LLMs aren’t going away and we will have to work out a way to live with them.
fn1. Although that’s a pejorative, I’m not a fan of the norm, dominant in philosophy and most of economics, of publishing only a few articles (say, one per year) and only in the very top-rated journals. As was once said of me, I embody the primal urge to publish, and used to turn out articles by the dozen. But now that we have blogs, Substack and so on, I can satisfy my need to express my views on every topic without the tiresome process of dealing with referees (I now deal with comments, but I can respond to these or ignore them as I please).
fn2. As some recent examples have shown, you need to check these. But that was always good practice, if not universally followed – a lot of citations I’ve seen turn out to be cut and pasted from earlier papers, propagating errors along the way. And the replication crisis has turned up numerous examples of papers being cited after they were retracted.
{ 54 comments }
oldster 03.30.26 at 11:00 am
I disagree with your set-up of the minimal toy-case.
The standard algorithm says, “make up a list of all the ravens, then check for any non-black ones.”
The contrapositive algorithm says, “make up a list of all non-black things, then check for any ravens.”
As applied to the toy case, the question is not, “what do I learn by examining all of the non-ravens?” but, “what do I learn by examining all of the non-black things?”
It’s true that I learn nothing by examining the shoe qua shoe. But if I examine all of the non-black things, and none of them are ravens, then I have found good evidence that all ravens are black in the toy world.
Your point that “observing the shoe tells us nothing about the colour of the raven” is equivalent to thinking that the contrapositive of “all R’s are B” is “all non-Rs are non-B”.
Different question about refereeing:
“I write back to the editor and offer to do the job as long as they send me the manuscript directly.”
As opposed to what? I thought this is how refereeing usually worked?
Mike Huben 03.30.26 at 11:37 am
So is a white raven a black swan? :-)
A quick search with google for “albino raven” says albinos/leucistics occur with a probability of 1/30,000.
Wouldn’t Karl Popper point out that observing any number of black ravens doesn’t support the “all” claim, which could only be falsified and never proven? How can you have a probability judgement of such falsification without a model of how ravens could be non-black? Models could include exhaustive examinations of populations, genetics, painting, etc.
In biology, we have a related question of how many new species are yet to be discovered. The answer is statistically estimated based on collections, and how many species are represented in the collections by one specimen.
Dmitri 03.30.26 at 11:58 am
Suppose you have an urn of colored plastic animals. Suppose you know it contains some black ravens. If it contains non-black ravens, you should expect to draw one periodically. Each animal you draw that is not a non-black raven should increase your confidence that all the ravens in the urn are black.
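This urn story can be made concrete with a two-hypothesis Bayes update (a minimal sketch; the 50–50 prior and the 1/10 draw rate under the rival hypothesis are illustrative assumptions):

```python
from fractions import Fraction

# Two hypotheses about the urn: H_all ("all ravens in the urn are black")
# and H_some ("some are non-black", so each draw is a non-black raven
# with probability 1/10 -- an illustrative number).
prior = {"H_all": Fraction(1, 2), "H_some": Fraction(1, 2)}
p_nonblack_raven = {"H_all": Fraction(0), "H_some": Fraction(1, 10)}

def update_on_safe_draw(belief):
    """Bayes update after drawing something that is NOT a non-black raven."""
    unnorm = {h: belief[h] * (1 - p_nonblack_raven[h]) for h in belief}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

belief = prior
for _ in range(10):  # ten draws, none of them a non-black raven
    belief = update_on_safe_draw(belief)

# Each "safe" draw shifts belief towards H_all.
assert belief["H_all"] > prior["H_all"]
```

The point of the sketch is that the confirmation comes from the sampling model (draws that *could* have produced a non-black raven but didn’t), not from non-ravens as such.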
M Caswell 03.30.26 at 1:07 pm
re: some LLM-assisted papers: I’m not sure why, but something in me strongly objects to spending more effort reading something than the author spent on writing it.
oldster 03.30.26 at 2:35 pm
Perhaps more simply:
if all you know is that you are looking at a shoe, then its color tells you nothing about the raven.
but if you know that you are looking at all of the non-black objects in this world, and that they (it) are all shoes, then that tells you a lot about the raven.
And what the algorithm for the contrapositive process told you, applied to the toy-world, was:
make an exhaustive list of the non-black objects, then check to see whether any of them are ravens. If none of them are ravens, then “if non-black, then non-raven” is confirmed. If any of them are ravens, then “if non-black, then non-raven” is disconfirmed.
So, before you started looking at anything in the world, you already had your list of all of the non-black items in it.
Now, it will be hard to get exhaustive lists for non-toy worlds. But it’s worth agreeing on the interpretation of the toy-world case before we move on to bigger ones.
Tm 03.30.26 at 3:32 pm
“When we observe, for example, a white shoe, we should increase our belief in the contrapositive, and therefore in the original claim.”
From a logical point of view, this makes absolutely no sense. Suppose we hypothesize “all noneven numbers are prime”, then observe that all even numbers are non-prime, does this strengthen the hypothesis? Absolutely not… But maybe logic has nothing to do with this?
steven t johnson 03.30.26 at 3:48 pm
“But, as Hempel observes, ‘all ravens are black’ is logically equivalent to the contrapositive ‘every non-black thing is not a raven’. When we observe, for example, a white shoe, we should increase our belief in the contrapositive, and therefore in the original claim.”
It seems to me that there is an inadvertent equivocation here. “Every non-black bird is not a raven” is one meaning. By this, when we see every non-black bird that isn’t a raven we would be correct in increasing our belief in the original claim. When we insist that all non-black things are ravens, we omit “some black things are ravens.” And speaking only of non-black things, we omit “some black things are not ravens.”
I suppose if you wanted to be consistent and insistent that “All ravens are black” means only “all black things” then it is correct to say that a white shoe affirms the contrapositive. But every black shoe refutes the contrapositive, thereby disproving the proposition. Ravens are in fact not the only black things. The thing is, formal logic as I understand it is designed to avoid such ambiguities. “All ravens are black.” There isn’t a word about ravens also being birds. Nor is there an explicit statement that ravens=all black things.
If shoes count as non-ravens, then black shoes refute the proposition about all ravens are black. If shoes don’t count as ravens, then white shoes are irrelevant.
To me this seems as if the basic notion that one can look only at the grammar of the argument to determine whether the conclusion follows, without looking at the meaning of the words, is fundamentally flawed. An argument made with bad grammar must be invalid. But good grammar may yet be invalid because the words are misused, ambiguous. (Hence symbols in formal logic.)
“My final position was that the paradox reflects the impossibility of Hempel’s core project of deriving probability judgments independent of any model of the world.”
My question is, isn’t the meaning of words, either referents to objects themselves or abstractions that are a model of the world, a form of your “probability judgments?” Do I agree with you?
Peter Dorman 03.30.26 at 5:23 pm
fn2 suggests a more general thought: a lot of people do what LLMs do, at least in part. That is, it’s common for someone to have an idea – maybe small, just a tweak – and then fill out the article with references to prior work they haven’t actually engaged with. So, in a sense, they’re doing an LLMish job, amassing and recycling stuff that’s out there and the conventional judgments of it. That can be OK if the conventional takes on those topics are OK, but often they are superficial or leave out important complications. As a reviewer, I’ve come across lots of work that I want to rescue for its creative core, but have to struggle with how to deal with the surrounding glop.
bekabot 03.30.26 at 6:20 pm
“Starting with an idea, which might or might not be original, it’s now pretty easy to turn it into a working paper that looks like the standard product, including citations. That’s a good thing for the growth of knowledge”
Why? It might (or might not) be a boon for the propagation of ideas. But then (first of all) ideas are not always right, and (second) ideas and knowledge are not the same thing.
mw 03.30.26 at 7:33 pm
Instead of just AI slop, journals are going to be faced with increasing volumes of papers that are plausibly publishable.
Yes, not just plausible, but as good or better — papers with fewer statistical errors, for example. But would a greatly increased volume of high quality papers be a good thing or not?
M Caswell @4 I’m not sure why, but something in me strongly objects to spending more effort reading something than the author spent on writing it.
But you don’t have to do that. You can drop in the PDF and ask LLMs to summarize, to check for errors or weaknesses, identify the most consequential points, etc.
As a thought experiment to take AI out of the equation, would academic publishing be better or worse off if every researcher had a large, eager, unpaid team of tireless crack research assistants who would help the researcher prepare papers for publication and also to review, critique, and summarize the work of others?
Tm 03.30.26 at 8:43 pm
Another point about the ravens, this time empirical: there are many ravens, but there are many many many orders of magnitude more things that are not black. Thus the probability of finding a raven that isn’t black is higher, under plausible assumptions, than the probability of finding a non-black thing that is a raven. For example, you could analyze the color of millions of grains of sand and wouldn’t gain any knowledge about ravens. This argument seems so obvious that I wonder if this whole theory is a hoax?
Jim Harrison 03.30.26 at 9:04 pm
All these years later, I’ve got nothing new to say about the all-ravens-are-black bit; but in the days before blogs I used to self publish a little mag I called Indoor Ornithology. Thanks for the trip down memory lane. Another irrelevant personal item: I actually met Carl Hempel. I knew his son at grad school and tried to look him up at his dad’s place in Princeton. I’m still trying to figure out how to work “He’s not here” (personal communication) into a scholarly citation.
Tm 03.30.26 at 10:14 pm
I retract my comment 5. I fell for the fallacy: The even numbers don’t matter for testing a hypothesis about odd numbers. What matters are the non-primes: we need to look for a non-prime that is odd to falsify the hypothesis.
Alex SL 03.30.26 at 10:19 pm
“to the point where acceptance or rejection is a lucky dip, or else the result of insider connections”
This is exactly where I have landed. The consequence of being overwhelmed with AI-generated papers, grant proposals, and job applications is that personal connection will again become the only feasible form of quality control. “Again”, because that is how it used to be before an attempt was made to be more fair and objective through metrics such as how many citations she has or which journals he manages to get accepted in. The emeriti who were around when I was a student still had fond memories of hiring people on the basis of another professor recommending his favourite student. The end result will be that it will again become extremely difficult for newcomers to establish themselves without attaching themselves to one of the well-established people in the field, a dependency that also increases the risk of exploitation. LLMs will make us go backwards, from at least the aspiration to a meritocracy (as flawed and gameable as citations are) to open nepotism.
Regarding the question whether LLM-generated manuscripts will always remain slop, there are three options here:
First, they stay slop because LLMs don’t actually understand anything. This is my experience so far: even when hallucinations are caught by verifying references against a database, LLMs don’t understand what the cited papers said or meant, and also, their writing is awful. I am told that it would work fantastically if I paid for the highest tier of pro subscription to Claude poem version whatever, but well, see next point for why I won’t do that.
Second, assuming it does work well enough not to be recognisable as slop, and I use an LLM to write my paper and then slap my name on it, there are words for that, and they are fraud and plagiarism. To the degree that these LLMs work, be it in prose or in coding, it is because they have stored in their model weights what is best understood as an extremely lossy compression of other people’s work. When somebody proudly talks on BlueSky about how they vibe-coded an app in SWIFT to track some US grant proposal outcomes, my immediate thought is, great, there is a human app developer somewhere whose work you have just substantially copied without either of you realising. That is why your app works.
My answer for what journals should do given that LLMs are here to stay is therefore quite simple. It is the same answer I would have provided if somebody had asked me what universities should do given that paid-per-gig thesis ghostwriters are here to stay, or what bar owners should do with regard to underage alcohol consumption given that people who can convincingly fake a driver’s license are here to stay. Fraud is not a complex moral conundrum, and we should not pretend it is merely because that would be in the interest of a handful of tech billionaires.
Third, some future AI can actually reason independently and entirely replace a researcher. This is purely hypothetical, of course, and LLMs cannot do that. It also has to be noted that in my field of research, aside from the occasional review paper summarising some aspect of a subfield, LLMs cannot do the job because we need data. We would not be able to publish a paper equivalent to yours about black ravens and white shoes; we have to produce measurements or DNA sequences from specimens, and conduct field collections, laboratory work, or glasshouse experiments, so that replacing a researcher would mean at a minimum AGI plus a team of affordable (!) humanoid robots. I find it difficult to believe that such robots will ever be cheaper than giving a human money to buy meals and rent an apartment, because at least my human body maintains itself for a few decades instead of having to have its joints and oil replaced at regular intervals. But let’s assume this is possible and economical.
Now we are at the point I have argued here previously: all of humanity is unemployed, what happens next? The alternatives are cyberpunk dystopia with >99.9% of people destitute and suppressed by the robot enforcers of a handful of oligarchs, or communism. I am not saying this is plausible – societal collapse from global warming, with loss of over three quarters of the world’s population and a fall back to ca. 18th century technology, is the vastly more plausible concern – but I just wish those who discuss the AGI future would be honest that these are the two options, and ideally be transparent about which of them they will be fighting for if it comes to that. Just a week ago I listened to an interview with an economist, and it was fascinating how he and the interviewer had to dance around the fact that they were arriving at the conclusion of communism being the logical non-catastrophic outcome of AGI making everybody unemployed, because being in the USA, they can’t use that word even while quite transparently describing such an economic system: the government gives everybody money to afford life’s necessities while machines do all the work.
Tm 03.30.26 at 10:32 pm
I think Hempel’s logical point is this: to test the “all ravens are black” hypothesis, we should pay no attention to black objects at all. A black object can never falsify the hypothesis. So we need to focus on non-black objects and check them for raven-ness. This only makes sense if we specify an experiment in which we encounter random objects and extract information from them to test our hypothesis. It’s not how practical science works and therefore seems far-fetched.
engels 03.30.26 at 11:57 pm
You have a lucky dip with red circles and red, yellow, green and blue squares. You think all circles are red; then you pull out a non-red thing, yellow say, squint at it—uh oh—then you see it’s not a circle but a square… how has your confidence in the generalisation evolved?
oldster 03.31.26 at 6:18 am
Also — what do we have to do around here to send up the Kenny Easwaran bat-signal?
He’s the person — after John Q — whose opinion I’d most like to hear.
Tm 03.31.26 at 7:25 am
“something in me strongly objects to spending more effort reading something than the author spent on writing it.
mw: But you don’t have to do that. You can drop in the PDF and ask LLMs to summarize, to check for errors or weaknesses, identify the most consequential points, etc.”
A common use case seems to be to give an LLM a bunch of bullet points to turn into verbose, tediously written text, so that on the other end somebody can ask an LLM to extract the bullet points from a verbose, tediously written text.
So this is the progress that we have to burn the planet for. Makes total sense.
J-D 03.31.26 at 7:40 am
I concur.
If you think that what you’re dealing with is an urn of coloured plastic animals, then you’re working from a theoretical model; if you think that what you’re dealing with resembles, in relevant respects, an urn of coloured plastic animals, then you’re working from a theoretical model; and the problem is that your theoretical model is a ridiculous one.
A theoretical model which leads you to entertain as a practical possibility the idea that you might ever find yourself looking at all of the non-black objects in the world is an even more ridiculous theoretical model. You’re never going to find yourself in that situation.
Exactly.
‘Is it the case that all ravens are black?’ is not the kind of question that a practical ornithologist (or any other kind of scientist) would investigate and if you asked one ‘How would you test the hypothesis that all ravens are black?’ I doubt they’d take you seriously.
Pinhead: What colour are ravens?
Ornithologist: They’re black.
Pinhead: But how do you know?
Ornithologist: I’ve seen them.
Pinhead: Ah, but have you ever seen an albino raven?
Ornithologist: As a matter of fact, I have. What about it?
Pinhead: Doesn’t an albino raven disprove the hypothesis that all ravens are black?
Ornithologist: Nobody ever hypothesised that all ravens are black. I said that ravens are black because they are. Albinism is an interesting phenomenon and worth investigating but it’s not a reason for changing the way we describe ravens. Do you have a point, apart from the one at the top of your head?
Pinhead: Maybe one day we’ll find other ravens that aren’t albinos but are other different colours?
Ornithologist: I suppose that’s within the bounds of possibility. If you want to spend your time looking for them, you’re free to do so. If you ever find any, we might learn something from examining them, but what we learn won’t be that we’ve disproved the hypothesis that all ravens are black.
oldster 03.31.26 at 8:21 am
And it occurs to me that the existence of an exhaustive list is inessential to the algorithm.
Just do it this way:
1) find a non-black thing. If it’s a raven, terminate with disconfirmation of hypothesis. If it is not a raven, then number it and exclude it from future searches.
2) find your next non-black thing, repeat “if” and “if not” operations.
3) if you exhaust all of the non-black things without finding a raven, then the hypothesis is confirmed.
As applied to the toy-world, this would lead to confirmation of the hypothesis (assuming that the raven in the toy-world actually is black).
Again, this algorithm will not readily scale up to worlds in which there are many non-black things. But I hope it will show that the “easy enough” refutation does not work.
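In code, the procedure above might look like this (a minimal sketch; the string labels are just placeholders for objects in the toy world):

```python
def contrapositive_check(non_black_things):
    """Examine every non-black thing in turn; the hypothesis 'all ravens
    are black' is disconfirmed as soon as any of them is a raven, and
    confirmed if the list is exhausted without finding one."""
    for thing in non_black_things:
        if thing == "raven":
            return "disconfirmed"
    return "confirmed"

# Toy world with a black raven: the only non-black object is the shoe.
assert contrapositive_check(["shoe"]) == "confirmed"
# Toy world with a white raven: the raven itself appears on the list.
assert contrapositive_check(["shoe", "raven"]) == "disconfirmed"
```

The exhaustiveness of the list is doing all the work: the outcome depends on what is (and isn’t) among the non-black things, not on examining any shoe qua shoe.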
EWI 03.31.26 at 8:45 am
Alex SL @14
LLMs will make us go backwards, from at least the aspiration to a meritocracy
This is a case of LLMs functioning as designed/intended by capitalists
oldster 03.31.26 at 11:28 am
JD:
“… if you know that you are looking at all of the non-black objects in this world ….”
“A theoretical model which leads you to entertain as a practical possibility the idea that you might ever find yourself looking at all of the non-black objects in the world is an even more ridiculous theoretical model.”
First, please note that the line you quote from my earlier comment says “this world,” not “the world.” I was referring to JQ’s toy world that he constructs for his “easy enough” refutation. I have acknowledged several times that there is no easy path from that toy world to “the world,” i.e. the actual world we inhabit.
Second, if you think that a model in which “you might ever find yourself looking at all of the non-black objects” is “an even more ridiculous theoretical model,” then you should take this complaint up with JQ, who constructed such a model.
Alex SL 03.31.26 at 11:29 am
Tm,
I was recently in a meeting where somebody set out to have an LLM summarise a block of 200-300 words THAT THEY HAD JUST WRITTEN THEMSELVES. (Then they saw the face I couldn’t help but make and thought better of it, admittedly.) In other instances I see people generate code and then immediately run it without trying to understand it. Those are the kinds of situations that make me completely lose my faith in the future. Many people who embrace LLMs are eagerly de-skilling themselves, not just in coding, no, even in expressing their thoughts concisely or in extracting meaning from snippets of their own native language.
There will always be brilliant people. I have the privilege of working with amazing team members who care about developing themselves and about getting things right. But we will increasingly be surrounded by many others who let their cognitive abilities atrophy. They will use LLMs for writing the way they already use cars for moving, unthinkingly, on reflex, because it is just that little bit too much effort to think for ten seconds about how to phrase the email to the procurement team.
EWI,
I don’t doubt that the tech bro CEOs want a worse world, but I doubt that they designed LLMs in any way beyond how they post-train them. Transformer technology and the attempts to build ever larger text prediction / chat bot models are the logical progression of a long-running research program in data science; I find it plausible that LLMs would at some point have been invented by academic researchers even under public-good state ownership of tech companies. Then, of course, billionaires do what billionaires do with technology once it exists: they train it on stolen IP, they post-train it to make it maximally obsequious, addictive, and aligned with their ideology, they market it e.g. to students for cheating on homework, they overpromise, they insert it into every software system against the wishes of its users.
Even then, I think LLMs would still be corrosive to society if they were run as public goods on carefully curated training data and with strong guardrails. The de-skilling issue at least would still be there, and probably also the addiction and psychosis risk, because thinking that these chat bots are sentient and believing them when they say something we like to hear is human nature, not the success of a Silicon Valley plot.
D. S. Battistoli 03.31.26 at 12:21 pm
Tm @11, you’re dealing less with the paradox being hoax-like than with the fact that it is situated near the core of one of the demonstrations of Gödel’s incompleteness theorem as it relates to early twentieth century deductive logic.
John Q, in the original post, talked about demonstrating that the problem is with systems of logic and not with the world being described (eg, how many grains of sand exist in our world, from your example). We could imagine a world (set) that has two types of things: ravens and shoes. To slightly modify John’s example, the world would have infinite ravens and one shoe. From a Bayesian perspective, seeing that one shoe to be white might actually be more confirmatory of the statement “all ravens are black” than seeing any one black raven. Which of course seems like nonsense, but it’s only nonsense when you’re trying to apply deductive logic out in its hustings, which would be like trying to apply Newtonian physics (which remains super useful in its core applications) to the interaction of a pair of quarks.
It’s my understanding that the line from Quine through and beyond Goodman’s grue (which starts out as a needlessly confusing thought problem if you first hear of it after learning the nursery rhyme “Le petit ver de terre”) really put Hempel’s paradox to bed: both in Indo-European natural language and in formal logic, we bundle a rotating variety of heterogeneous ontological and epistemic claims within the term “to be.” So the verb “to be” is a great bodyboard to ride to the edge of a logical system’s capacity for completeness, especially when you put a totalizing modifier like “all” in your subject.
MisterMr 03.31.26 at 2:04 pm
So, take the argument “all ancient Greeks had a beard”; according to this logic, if I meet a modern Frenchman who doesn’t have a beard, this would for some reason increase the likelihood of the first hypothesis.
Who ever in the history of the world made this sort of logical argument practically?
Suppose I realize that modern French women do not have beards; this is more likely to make me doubt the hypothesis than to confirm it.
There is IMHO a confusion between how intuitive and/or inductive beliefs are created and how we treat propositions logically.
OK yes I understand this was not the point of the OP.
John Q 03.31.26 at 7:48 pm
“They will use LLMs for writing the way they already use cars for moving, unthinkingly, on reflex, because it is just that little bit too much effort to think for ten seconds about how to phrase the email to the procurement team.”
I’d use an LLM to write emails to the procurement team and feel no guilt about it
Tm 04.01.26 at 7:26 am
24: “We could imagine a world (set) that has two types of things: ravens and shoes.”
I think this is the wrong way of putting it. We have a world in which two properties are of interest, raven-ness and black-ness. All objects can be classified either as raven or not raven, and either as black or not black. Then the hypothesis “all ravens are black” can be tested, conceptually, by looking at all (or as many as possible) non-black objects; if we find a raven, we have refuted the hypothesis. In that sense, JQ’s two-object world isn’t a problem. By verifying that the only non-black object is a white shoe, we have confirmed that all ravens are black. It doesn’t matter whether ravens exist btw.
Logically speaking the reasoning is correct (that’s why I was wrong in 5). What is misleading is to mistake this thought experiment for an empirical research project.
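Tm’s finite-world test can be sketched in a few lines of Python (a toy illustration only; the world and its objects are invented for the example, not drawn from the thread):

```python
# "All ravens are black" and its contrapositive "all non-black things
# are non-ravens" are checked over the same finite world; logical
# equivalence means the two checks always agree.

def all_ravens_black(world):
    """Check the hypothesis directly: every raven is black."""
    return all(obj["black"] for obj in world if obj["raven"])

def contrapositive(world):
    """Check the equivalent form: every non-black object is a non-raven."""
    return all(not obj["raven"] for obj in world if not obj["black"])

# JQ's two-object world: one black raven, one white shoe.
world = [
    {"name": "raven", "raven": True, "black": True},
    {"name": "shoe", "raven": False, "black": False},
]

# Verifying that the only non-black object is a shoe confirms the claim.
assert all_ravens_black(world) and contrapositive(world)
```

As Tm says, this only works conceptually, by exhausting the non-black objects; the sketch makes the equivalence mechanical but says nothing about induction over an open-ended world.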
EWI 04.01.26 at 10:39 am
Alex SL @ 23
I’m familiar with the what and how of LLMs. But the Broligarchy (and the old Silicon Valley tech giants) choosing to train it on deliberate, all-pervasive surveillance and intellectual property theft is a decision, and the only plausible real-world understanding of the frightening amounts of money being invested into it and the astronomical valuations around the likes of OpenAI (as with the recent failed self-driving car boom; see Musk ejecting Tesla) is that the business model here is to replace employees on a capitalist system-wide scale.
The public statements by the likes of pathological psychopaths like Altman, the fascism-curious Musk and the outright fascist CEO of Palantir all tend to confirm the vision here, and people should be really clear on what’s going on.
MisterMr 04.01.26 at 1:04 pm
@TM 27
This works only if your world has a finite number of objects, but in that case we are no longer speaking of induction, which I think is what we are talking about.
If the number of objects is infinite (which is the case in the real world), there is no way to collect a sufficient number of non-black objects.
So if we are speaking of induction, a process that makes sense only when we lack the sort of perfect knowledge that exists in these limited thought examples, the enumeration of opposites cannot work as an advancement of the “induction” IMHO.
David, a Bostonian in Tokyo 04.01.26 at 1:08 pm
I wrote something rude and deleted it, but had something to say.
First of all, J-D got it right.
The point is that no one friggin’ says “all ravens are black”, they say “ravens are black”. The whole logic rabbit hole is seriously stupid (there’s the rude bit, sorry. But you all get that.). Human beings understand that “rules” have exceptions, and aren’t bothered by them. And they have no trouble reasoning in such situations. Like J-D’s ornithologist. So logicians (and logic-oriented AI types of my generation) aren’t paying attention to what we humans do. Sure, logic is fun and useful for proofs in mathematics. But making it fly for real-world situations will require a lot more thought and work.
Interestingly, this need for a lot more thought and work is why LLMs are so popular. We (my generation of AI types) failed miserably at making our programs do sensible things, so the AI universe grasped at the LLM straw. It generates great text (if you can stomach that slop) without doing the work of creating a world model and a theory of reasoning with that world model that roughly but reasonably simulates human reasoning.
Kenny Easwaran 04.01.26 at 7:10 pm
On the ravens bit – did you read the Janina Hosiasson-Lindenbaum paper about this? It gives a good argument that under certain plausible assumptions, there could be a tiny bit of confirmation from the non-black non-raven (though there’s a bit of trickiness in that observing that something is a white shoe is quite a bit stronger than observing it as merely non-black and merely non-raven!)
But of course your argument is also totally right – from a strict Bayesian point of view, there is nothing general you can say about a confirmation relation between two propositions unless there is a deductive logical relation between one of them (or its negation) and the other (or its negation). Hempel wanted a formal logic of confirmation, but it turns out to always depend on this substantive information built into the prior.
On the LLM publishing bit – I’ve desk-rejected about five LLM-generated papers now, at a variety of levels of polish. A couple of them I really wanted to accept because they were on interesting ideas, but either it was a straight-up LLM-generation (that dissolved into a swarm of bullet points by the end, and never really gave the argument) or was written by a person and LLM who focused on a bunch of issues that might have been appropriate for a non-philosophical context, but never got input from someone who knew what philosophers would be interested in reading.
John Q 04.01.26 at 8:40 pm
Oldster @1 Refereeing now works through horrible systems set up by Elsevier and similar, which don’t allow direct contact between editor and referee. I have to look the editor up and email them.
Kenny @35 I did look at Hosiasson-Lindenbaum as someone who had anticipated the kind of argument I wanted to make. But as with Good, her work didn’t appear to have settled the debate, so I thought it was worth making a somewhat different version of the argument.
“Hempel wanted a formal logic of confirmation, but it turns out to always depend on this substantive information built into the prior.” This is indeed my central point, and one I immodestly claim to have made more sharply than the existing literature.
I must admit, I have never managed to work out what philosophers would be interested in reading, even though my field is closely adjacent. As Richard Pettigrew mentioned to me in discussing my paper, one valuable use of LLMs is to check what people in adjacent fields, using different terminology, are saying about the same topic. This might break down some of the disciplinary silos in time.
Alex 04.01.26 at 8:51 pm
Is the Black Raven thing really an issue worth discussing? Imagine that there are 2 urns, one with 2 black ravens and 98 blue seagulls, the second with 1 black raven, 1 white raven, and 98 blue seagulls. You draw a blue seagull from an urn. Does that provide you with any information about which urn you are drawing from? Obviously not.
More generally, if we have a population of ravens (color unknown), and a population of non-black objects, then what drawing non-black non-ravens does is both (1) set ever stricter upper limits on the fraction of non-black objects that are ravens and (2) set ever stricter limits on the fraction of all objects that are ravens. That last information gain is critical, and between the two bits of information, any statement about the coloration ratio of ravens cancels out.
Tm 04.02.26 at 3:12 pm
MisterMr: I never said that this makes sense in the real world. In fact I have said it doesn’t. Also, I don’t know whether Hempel is speaking of induction.
Alex SL 04.02.26 at 11:49 pm
EWI,
Agreed.
John Q @26,
This example popped into my head because I recently had to ask Procurement to make a change to a purchase order. First, this served as an example because it (and other similar emails people write multiple times every day) is so trivial and quick to phrase that using the socially corrosive, energy-wasting plagiarism bot for it should be unnecessary. It is the writing equivalent of using the socially corrosive, energy-wasting car to drive to store that is only a hundred meters from home. Merely entering the prompt is nearly as much work as writing the real thing, yet it seems quite a few people unthinkingly use AI for even something like this, and the tech bros certainly expect us to do so (I receive a five word message from my wife, and WhatsApp asks if I want it summarised).
Second, outside of submitting some kind of standard ticket where selecting from two drop-down menus and attaching a PDF tells the story, in cases like this where I ask for something non-standard that I need to state in a written message, I respect my fellow humans who I expect to read my message by actually writing it myself. The colleagues in support services are people, and I will treat them as such, because I know how I react myself when I receive bot-generated messages or, say, am expected to read web pages generated by an LLM. In many contexts, the lack of thought and effort is an insult to the recipient.
But again, I understand what you are saying if you just need to communicate what amounts to selecting an option in a ticketing system, just like I don’t mind a computer script sending out alert messages.
I am sorry to say that I have no contribution to make regarding the ravens. I have the vague intuition that applying this kind of formal logic to empirical questions is a category error, but epistemology isn’t my area.
Zamfir 04.03.26 at 5:22 pm
@Alex, it does give you a bit of information. P(not a white raven) is 1 against 0.99, so not drawing a white raven should shift your confidence slightly towards the urn without one. That gets clearer with large numbers – imagine you drew 90+ not-a-white-raven (without replacement), or a few hundred not-white-ravens (with replacement).
At that point, you would not have certainty, but you would surely lean towards “no white raven” – unless you have some other reason to expect the white raven.
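Zamfir’s with-replacement case can be sketched as a simple Bayesian update (a toy calculation, assuming a 50/50 prior between the two urns from Alex’s earlier comment: one urn with no white raven, one with a single white raven among 100 objects):

```python
# Each draw that fails to produce a white raven multiplies the odds in
# favour of the white-raven-free urn by 1/0.99; over many draws with
# replacement this compounds, which is Zamfir's point.

def p_urn_without(n_draws, prior=0.5):
    """Posterior probability of the white-raven-free urn after n_draws
    draws (with replacement), none of which turned up a white raven."""
    like_without = 1.0 ** n_draws    # this urn can never yield a white raven
    like_with = 0.99 ** n_draws      # the other misses its white raven 99% per draw
    return (like_without * prior) / (like_without * prior + like_with * (1 - prior))

for n in (1, 100, 500):
    print(n, round(p_urn_without(n), 3))
```

One not-a-white-raven draw barely moves the prior, but after a few hundred draws the posterior leans heavily towards “no white raven”, exactly as described above.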
John Q 04.03.26 at 9:39 pm
Alex SL,
I was assuming that procurement emails were enough work to justify automation, a bit harder than a quick cut and paste into a form email. That’s the kind of job I would hand to an AI if I had to do it.
My only common email is declining inquiries from prospective PhD students as I’m retiring. I used to answer if there was any indication of personalisation, now only if it’s obviously not AI. And since it’s only a couple of sentences, I still type the whole thing rather than finding a way to automate it.
Alex 04.04.26 at 7:47 pm
@Zamfir it is actually a pretty simple calculation using Bayes’ Theorem:
P(all-black-raven | non-raven-draw) = [P(non-raven-draw | all-black-raven) * P(all-black-raven)] / P(non-raven-draw).
P(non-raven-draw | all-black-raven) = 0.98, and P(non-raven-draw) = 0.98, so we arrive at:
P(all-black-raven | non-raven-draw) = P(all-black-raven).
The act of drawing a non-raven does not provide any information. Also, the question of non-black-ravens is a triviality answerable in seconds by high-school probability.
Zamfir 04.05.26 at 11:58 am
@Alex, that’s true. It seems a peculiarity of the setup though? It’s not necessarily true that P(non-raven-draw | all-black-raven) == P(non-raven-draw); it’s a model of the world. For all we know, the world is better modelled by assuming that the existence of white ravens has no effect on black ravens, so P(black-raven | all-black-raven) = P(black-raven). Or something else altogether.
Nick 04.05.26 at 2:06 pm
Only objects within a hypothesis’ range can serve as evidence; all others are irrelevant. Logical equivalence ensures this applies equally to the contrapositive.
In other words, inverting a pair of terms does not alter the hypothesis’ range. The Raven Paradox arises solely when we falsely assume it does.
steven t johnson 04.06.26 at 4:23 pm
Nick@40 says that inverting a pair of terms cannot alter the hypothesis range. Does that mean for Hempel, the proposition (hypothesis) “All ravens are black” has the “range” of not just blackness but birdness? Therefore the correct contrapositive is “All non-black birds are non-ravens.” And the paradox arises from extending the range of the proposition from black birds to all non-black things rather than birds?
To me “range of the hypothesis” expresses the same notion as John Q’s “probability judgments based on a model of the world.” I suppose one could see “range of hypothesis” as talking about the beginning of a research project and “probability judgment” as about the results? (And my version @6 is more about how researchers must talk to each other throughout?)
engels 04.06.26 at 7:39 pm
I think if you’d gone through (almost) all the non-black things in the universe and discovered that none of them were ravens, you’d be pretty confident that all ravens are black, but if observations of the first didn’t confirm the second I don’t see how that could be.
Chet Murthy 04.07.26 at 6:01 am
JQ,
I’ve never used any of these AI thingies, but I’ve read that it is pretty straightforward to get an AI to give you a decent argument for some proposition X, and then for its negation. I’ve seen this demonstrated with transcripts of interactions with ChatGPT, in the domain of politics. I would guess that this is doable in the world of philosophy, too. In a way, it seems to suggest that all of this is sophistry. But also, I remember “epistemic learned helplessness” (wherein the author noted that it was easy to be convinced of a proposition X, and also of its negation, when one didn’t actually know the subject area).
This leads me to wonder whether your conjectured future world is one in which hypotheses, philosophical arguments, etc, will be devalued, and the only thing that will have any value is empirical/experimental work.
I’m glad to be retired. This brave new world seems pretty terrible, and I only hope that humans can retain their humanity, and still make progress, in it.
John Q 04.07.26 at 7:43 am
Chet @43 Reviewing all the arguments on both sides is a good thing. Unless you’ve heard and rejected the counter-arguments, you have only weak grounds for your beliefs. This was JS Mill’s big argument in favour of free speech, IIRC.
Hidari 04.07.26 at 8:02 am
On the general topic, this article is interesting, and indicates that the current way of doing science, which has continued since, roughly, the 1970s, is now collapsing and disintegrating for many reasons, not just the threat of LLMs (e.g. the replication crisis, the absence of progress on core topics detailed by John Horgan (‘The End of Science’), the increasing corporatisation of science and the necessary concomitant of that, increasing corruption and fraud in science, and so on).
https://davidoks.blog/p/how-citations-ruined-science?
The current ‘way’ of doing science is collapsing: this process will continue for the next few decades, and eventually we will have to create an entirely new way of doing science or else the Western scientific tradition, as we have known it since the Renaissance, will simply come to an end.
About the specific sub-topic: of course here we run into the deranged anti-Wittgensteinian (indeed, anti-Ordinary Language Philosophy as a whole) weltanschauung of the American influenced (post war) social/human sciences, in which snatches of ‘overheard’ language are yanked out of context and the decontextualised snippets then used as the foundations of gigantic de facto metaphysical systems (‘an inverted pyramid of piss’ as Kingsley Amis once put it).
As @30 points out: the whole, entire ‘debate’ is stupid. All of it. Because of course in the real world no one* actually says ‘all ravens are black’. They say ‘ravens are black’, which is a very different kind of statement. So the entire ‘debate’ is irrelevant to real people and how real people speak and live their lives.
A similar sort of ‘debate’ to the one that Bertrand Russell engaged in when he persuaded himself that ‘The present day King of France is bald’ was the sort of statement that contemporary French people uttered on a regular basis, and which therefore needed philosophical explication.
*By which I mean ‘no one’ who is not an academic philosopher/logician/statistician, literally paid to play word games.
Nick 04.07.26 at 12:49 pm
steven: “To me ‘range of the hypothesis’ expresses the same notion as John Q’s ‘probability judgments based on a model of the world.’”
I think they’re related, but part of my claim is that the logical error can be identified without appealing to frameworks.
The hypothesis, accurately expressed, determines what it ranges over. By logical equivalence, the contrapositive can’t extend that range—if it did, it would be doing more than preserving truth conditions; it would be changing the hypothesis.
Finite cases help illustrate this. When a hypothesis is defined over a fixed set (say, 1000 objects), everything in that set is within its range, and valid elimination effects from discarding non-black non-ravens become relevant.
Similarly, consider the universal hypothesis: “All non-black things are non-ravens.” Its contrapositive, “All ravens are black,” doesn’t make observing a black raven evidence for the original claim, because black ravens lie outside the hypothesis’ range.
Alex SL 04.07.26 at 9:09 pm
Zamfir,
Of course, that logic is the entire basis of statistics. I just don’t understand why drawing a white shoe should change my estimate of the colour of ravens.
Tm 04.08.26 at 8:08 am
Hidari: “A similar sort of ‘debate’ to the one that Bertrand Russell engaged in when he persuaded himself that ‘The present day King of France is bald’ was the sort of statement that contemporary French people uttered on a regular basis, and which therefore needed philosophical explication.”
What a silly statement. I’m sure you are aware of the fact that Russell was working on the foundations of logic, not concerned with everyday language use.
engels 04.08.26 at 9:17 am
I just don’t understand why drawing a white shoe should change my estimate of the colour of ravens
You wouldn’t say that if it flapped its wings and flew off.
MisterMr 04.08.26 at 9:23 am
@ TM 34
My point is that the “paradox” is only relevant if we are speaking of induction, in which case it is wrong, so a thought experiment that uses an example where we are not exactly in a case of induction doesn’t count.
I don’t understand very much about Bayesian probabilities or similar, but the problem seems to be: people often come out with general rules like “all ravens are black”; how do people do that?
Generally this happens because someone sees a lot of ravens, and notices that they are all black (classical induction).
But, suppose that I come out with the different general rule “all non black things are non ravens”; following the same logic of classical induction, maybe I saw a lot of non-black stuff, and I noticed that it was all non raven; but this sounds weird to most people (even if it is the same induction method, and actually the second general rule is logically equivalent to the first rule).
Why does this second induction sound strange? If we give a naturalistic explanation, it is because I will never see a lot of “non black things”, and I will also never notice that they are “non ravens”, because neither “non raven” nor “non black” are intuitive categories.
If we look at how the human brain evolved, it will recognize similarities and so will recognize a category of similar objects as “ravens”, but not the category of “not ravens” that exists logically but not intuitively.
But in the naturalistic hypothesis, a human (or also other animals) will live in a very complex world with an infinite number of “things”, and with no advance knowledge (or perhaps very minimal evo-psych pre-existing categories). In this situation, where there is an infinite number of “things” and a smaller subgroup of recognizable “ravens”, induction can only work for the first hypothesis (all ravens are black because I saw a lot of ravens and they were all black) but not the second (all non-blacks are non-ravens), because my brain will never see all “non-blacks” as a category.
Now the problem is that this “naturalistic” situation of an infinite number of items is really the basis of human cognition, and all the logical structures like deduction or statistics can only come later (e.g. I can’t make deductions without a previous induction that gives me the premise). So the paradox fizzles out in the naturalistic situation, even though it still works in some situations where I already have some previous knowledge that e.g. makes the number of items non-infinite; but then I don’t need induction in the strict sense anymore and I have a different kind of categories (not created directly from induction).
J-D 04.09.26 at 4:23 am
If you’re trying to discuss something that actually happened, I think it would be closer to the mark to suggest that people noticed a kind of bird which, for whatever reason, they found it useful to be able to distinguish from other kinds of birds, and then created or modified or adopted a word for that purpose–in English the word ‘raven’, in other languages other words, and that one of the distinctive features they observed that led them to identify these birds as a category was their blackness (although blackness probably wasn’t an important part of the distinction–for examples, crows are also black). What this means is that (in all likelihood) the observation that they were black came before, not after, the designation of them as ravens.
If your purpose is not to discuss what may have happened in this particular instance but rather to use it as an example to illustrate your more general point, then I suggest it might be helpful to find examples which do a better job of illustrating your general point.
engels 04.09.26 at 11:56 am
All the ravens I’ve ever seen have been blite (black until the year 2030, white after).
MisterMr 04.09.26 at 12:25 pm
@J-D 51
You are correct that in my argument I’m mixing up the designation of a category (recognizing a set of similar items as “ravens”) with the attribution of a general rule (“all ravens are black”); however, I don’t know if or how the designation of categories can be distinguished from the creation of general rules. In fact I think the two are more or less the same in natural thinking, which is why I think the “paradox of confirmation” doesn’t really work.
Assuming that it is about induction, which I believe is the source of both the designation of categories and the creation of general rules.
Hidari 04.10.26 at 6:57 am
@48
Wittgenstein’s point (in his ‘later’ philosophy) was that logic and maths were simply different kinds of language games with no greater ‘metaphysical heft’ (if you want to put it that way) than ordinary language, but which were ultimately derived from ordinary language because they were ultimately devised (not discovered) by real flesh and blood human beings who existed in a specific socio-economic and socio-cultural (and historical) ‘situation’. In other words he was anti-Platonist in terms of maths, which is fairly well known, but also in terms of logic (by the, ahem, ‘logic’ of his own philosophy).