From Algorithmic Monoculture to Epistemic Monoculture? Understanding the Rise of AI Safety

by Shazeda Ahmed on November 16, 2023

From November 1-2, the UK government hosted its inaugural AI Safety Summit, a gathering of international government officials, AI business leaders, researchers, and civil society advocates to discuss the potential for creating an international body to govern AI, akin to the IPCC for climate change. On its surface, ‘safety’ appears to be an unobjectionable concern after years of instances in which AI systems have erroneously denied people state benefits and cast them into financial turmoil, produced hate speech, and denied refugees asylum due to mistranslations of verbal testimony.

Yet the conception of safety that motivated the Summit is unconcerned with this category of harms, instead looking to a future hundreds of years from now in which advanced AI systems could pose an ‘existential risk’ (x-risk) to the continued existence of humanity. The ideas behind the emerging field of ‘AI safety,’ a subset of which operates on the assumption that it is possible to prevent AI x-risks and to ‘align’ AI systems with human interests, have rapidly shifted from the hobbyist interest of a few small communities into a globally influential, well-resourced, and galvanizing force behind media coverage and policymakers’ actions on preventing AI harms.

Where did these ideas originate, what material outcomes are they producing in the world, and what might they herald for the future of how we regulate and live with AI systems?

In spring 2022, my colleagues at Princeton University and I began to track AI safety’s growing influence and sought to map its intellectual origins. How are the ideologies that underpin this new field moving people, money, research, community-building, and career advising—in sum, the activities that people within AI safety refer to as ‘field-building’—towards a utopic vision of living with AI?

We began to see a broad-strokes argument unfolding: as large AI systems such as the large language models (LLMs) behind ChatGPT scale up, they could develop advanced capabilities beyond those their original creators anticipated, and cause widespread harm to humanity if left unchecked. Related to this fear of rogue AI systems, AI safety proponents worry about whether a human bad actor could amass vast quantities of computational power (“compute”) to build a bioweapon.

To understand the genesis of these fears, we read texts foundational to x-risk studies and traced their relationship to both the concept of long-termism (a concern with the continued existence of humanity hundreds of years from now, and longer yet) and the effective altruism movement. Effective altruists (EAs) are concerned with how to “do the most good” given finite resources. Drawing from utilitarian philosophy, they seek to optimize returns on expected value across a range of proposed interventions for cause areas including the prevention of pandemics, nuclear wars, and AI x-risks.

Early in our research, I attended a talk by Sam Bowman, an NYU professor of computer science who had taken leave to work at Anthropic, a company founded by former OpenAI staff with the purported aim of building safe AI systems. Bowman highlighted a point that later came to be central to our research: a variety of communities have sprung up around ideas such as EA, x-risk, and long-termism, and people from these communities make up the majority of those advocating for AI safety.

Yet a growing number of people coming to the field of AI safety have no affiliation with these ideas. The latter group nonetheless must pass through institutions that have been shaped by what my colleagues and I refer to as AI safety’s “epistemic culture” (from sociologist Karin Knorr-Cetina): the cultural practices of how knowledge is constituted and disseminated within the sub-communities that unite to work towards a shared idea of AI safety.

To better grasp what is included in this emerging field’s conception of “AI safety,” we started by identifying what research, community cultivation, fellowships, and institution-building in the field were being funded. Then we descriptively analyzed the picture that these funding streams’ outputs formed.

What emerged was a clear image of a tightly networked series of communities with at least four distinct features. The first is online community-building, both through EA web forums such as LessWrong and EA career advising hubs such as the nonprofit 80,000 Hours. Someone who has read Oxford philosopher Nick Bostrom’s book Superintelligence, for instance, might there first encounter the idea that future AI systems could one day ‘surpass’ humans at performing every task and attain artificial general intelligence (AGI).

Then, seeking a community of like-minded others debating the consequences of such a future, they could find a lively set of discussions about this topic on the Effective Altruism Forum, or the more recent Alignment Forum. Career advisories like 80,000 Hours create a pipeline where young people in these networks can find jobs in EA cause areas, for instance working on AI alignment at OpenAI.

The second feature is AI forecasting, which takes two forms. One involves hiring professional forecasters to cast and defend predictions about specific outcomes, such as the date by which a particular model may attain a specific benchmark. If a model outperforms the forecaster’s guess, some in the field read it as a sign that AI systems are developing at a clip that humans attempting to ensure safe deployment cannot match (and therefore, the logic goes, as justification for investing in more AI safety research).

The third feature is the research itself, which focuses on issues such as monitoring emergent capabilities as AI systems scale up and develop unanticipated new functions; developing methods for aligning AI systems with pre-defined human values to avert x-risk; defending the robustness of these systems against highly improbable but massively destructive events, such as crashes of automated financial trading systems; and finally the catchall of “systemic safety,” referring to the deployment of systems in context, including issues such as privacy, cybersecurity, and algorithmic bias.

Many of these ideas are disseminated via the fourth feature, prize competitions where entrants can submit code or papers in response to technical challenges in the field. Given the deep coffers of AI safety funders, and the urgency with which they believe AI safety must be addressed, prize competitions in the field carry massive prize pools ranging up to $1 million split across winning participants. While some of these competitions are hosted by AI safety nonprofits, we saw a trend toward competitions being embedded in academic computer science conferences as part of a broader effort to bring AI safety into mainstream academic computer science.

What happens when this field has influence beyond its epistemic community? Media coverage has tended toward replicating AI x-risk narratives, platforming the small handful of companies and figures at the center of this epistemic culture and positioning them as speaking for a larger constituency than they represent. For instance, companies such as OpenAI and DeepMind have long made it their mission to attain AGI, and OpenAI CEO Sam Altman presents this not only as possible, but as a path to a future utopia. The flipside of this premise, as often presented within AI safety, is one where we fail to rise to the challenge of ‘aligning’ AI with human interests.

Laying bare the ideological scaffolding of AI safety raises the question of whether these ideas will continue to be central to the field going forward, and how essential they are to pursuing the field’s definition of safety in the first place. Rishi Bommasani et al. describe the algorithmic monoculture that arises when a small handful of companies like OpenAI or Anthropic produce ‘foundation models’ such as those underlying ChatGPT, which a plurality of third-party actors across the platform economy come to rely on, from Spotify to Zoom.

Given that these same companies are the leading industry actors in the field, do the ideas underpinning AI safety have the potential to become, in parallel, an epistemic monoculture, crowding out other points of view? Furthermore, does buying into one part of this framing of AI safety essentially amount to being indefinitely locked into a utilitarian, x-risk centered approach? If so, what might the consequences be?

Eight years ago, Oxford philosopher Amia Srinivasan wrote about the effective altruism movement as dodging specificity in favor of general approaches to mitigating x-risks. This is echoed in how organizations in the epistemic community are presumed, based on names like the Future of Life Institute or the Future of Humanity Institute, to see protection of the entirety of humankind as an achievable mission.

In light of the UK’s AI Safety Summit, a recent coalition of Chinese and US organizations and individuals proposed that companies spend up to 30% of their research budgets on AI safety. If this happened, what opportunities might prioritizing this generalist approach foreclose? One possibility is that finding solutions to well-documented problems, such as discriminatory uses of AI tools in hiring or the wide range of shortcomings in AI functionality, would become downgraded in urgency within this framework.

Srinivasan notes that part of the wide appeal of effective altruism is that it ultimately does not challenge, but rather reinforces, the status quo. As with other ideologies that have incubated in Silicon Valley, while many ideas coming out of the AI safety epistemic community can appear at first to run counter to the mainstream, they nonetheless embed neatly into what Henry Farrell and Abraham Newman term the ‘weaponized interdependence’ model of the United States.

For instance, leading figures in the AI safety community have long treated ‘compute’ as akin to a wartime resource that the United States government must monitor to prevent individual bad actors from amassing it to build bioweapons, a position that has become solidified in the mainstream through the recent White House Executive Order’s references to governing foundation models using the Defense Production Act. While some commended the EO for acknowledging the need to protect workers and alleviate the social harms of AI systems, its language regarding these known harms stemming from historical injustices was far vaguer, calling for “standards” and “guidelines” whose enforceability remains an open question.

Understanding the epistemic culture of AI safety can help us anticipate what industry leaders and other influential academic and philosophical voices in the epistemic community will advocate for in the future. In a recent interview with Foreign Policy magazine’s editor-in-chief, White House Office of Science and Technology Policy representative, sociologist, and leading architect behind the Blueprint for an AI Bill of Rights Alondra Nelson noted that the government has opened up the conversation about regulating AI to public debate, encouraging citizens to appeal to their representatives for advancing legislation.

What do people advocate for when a prevailing narrative is that AI may one day end humanity, and that we must put faith in a small group of technocrats who claim to speak for all of humanity to prevent that outcome? And how do we avoid doubling down on the status quo of self-regulation and voluntary agreements that AI safety appears amenable to? As we head towards what many hope will be a banner year for more concrete regulation of AI, the AI safety epistemic community should be more receptive to external critiques and aim to be accountable, both for the knowledge it produces, which has gained major global influence in a very short period of time, and for the specific, contextual harms that arise in the AI systems it aims to improve.



Bill Benzon 11.17.23 at 3:53 pm

“…a variety of communities have sprung up around ideas such as EA, x-risk, and long-termism, and people within these communities are in the majority of those advocating for AI safety.”

YES! I am in broad agreement with your conclusions.

I’ve been conducting my own investigation into this material, though I have been more focused on AI Doom than on AI safety, though they are obviously related. In particular, I have been a participant observer, if you will, since I have been actively posting at LessWrong for over a year. I have various reasons for doing this, the most important of which is that there are a lot of intelligent people there who are interested in AI, as I am. So it’s a good place for me to get feedback on some of my ideas, even if I am deeply out of sympathy with the ideological bent of the place. FWIW, the user interface at LessWrong is the best community interface I’ve seen in over two decades of life on the web.

On AI forecasting, yes. A great deal of effort is devoted to surveying experts about when they think this or that milestone will be reached. As far as I can tell, all of these surveys involved self-selected samples. They also construct fairly elaborate models that, to this outsider, seem rather like Rube Goldberg constructions. I think all of this is epistemically dubious and have coined the term “epistemic theater” in consequence.

A lot of these people espouse rationalism, by which they mean a fairly recent view of the world grounded in Bayesian statistics. Thus in discussing these forecasting efforts people talk about how this or that finding has caused them to “revise their priors.” Whatever you may think of it, this is a pervasive and comprehensive world-view which affects how people think about everything and go about their daily lives.

I have come to the tentative conclusion that, in this overlapping complex of belief systems – EA, x-risk, long-termism, cryptocurrency, and rationalism – we are dealing with something comparable to the counter-culture movements of the 1960s and 1970s – the Civil Rights and anti-war movements, hippies, feminism. The number of people involved is smaller by orders of magnitude, the focal concerns are different, and it tends to be geographically focused on Silicon Valley, though we should also think in terms of a virtual Silicon Valley culture on the web.

I’ve written about the AI component of this counter culture in an article I published at 3 Quarks Daily: A New Counter Culture: From the Reification of IQ to the AI Apocalypse.


SusanC 11.17.23 at 7:46 pm

The EA community was interested in AI risk long before it went mainstream. But now that at least some forms of AI risk are starting to look imminent, a much larger community with a different background takes interest – e.g. the computer security community, or even the British Prime Minister, Rishi Sunak.

I would expect a fairly rapid transition in which the character of the community changes, as the newcomers outnumber the old guard.


SusanC 11.17.23 at 7:54 pm

The argument that AI will increase terrorism assumes that terrorist groups are limited by technological know-how. It’s not clear that’s true, although as things are going it appears the AI companies are going to do the experiment of releasing AIs that will help you plan your terrorist attack and see what happens. I guess we will have concrete data soon.

E.g. possibly mass shooters have a thing for guns specifically, and aren’t going to switch to more effective ways of killing people even if they become available.


somebody who remembers when yudkowsky met a rich person and wrote 50,000 words about how much they "sparkle" 11.18.23 at 1:27 am

the field of “ai safety” is completely dominated, from top to bottom, by people who don’t think it’s bad when a facial recognition ai tells the cops to kick down the doors of a random black guy, flashbang his grandmother and kill his dog, they just think it’s slightly embarrassing. not damaging at all, not a negative worth talking about. but if it might cause 1.2 trillion future space colonists to get a bit of dirt in their eye that’s an outcome worth spending 150 million thielbucks, muskbucks and altmanbucks to avoid.


Alex SL 11.18.23 at 1:17 pm

This study of ideology and influence is interesting, but personally, I am still more interested in the risk itself and how it is perceived – which is relevant to the motivations behind the movement, however.

Do these alignment people actually have a plausible mechanism of action for how the AI will cause the extinction they fear, or is it, as it seems to me, complete ignorance of biology, physics, chemistry, and computing combined with “I read this scenario in a science fiction book once in my teens, and I found that very impressive”?

Why can’t they just have a kill-switch to turn off an AI when it does something harmful, like, say, unplugging the server on which it runs? Do they think it will somehow reach out and electrocute them when they reach for the plug, like in that scifi horror movie they watched that one day? Or copy itself onto lots of other devices that patently don’t have the processing power to run it, like in that other movie?

What does their alignment “research” actually involve? Years ago I looked a bit into the publications of MIRI, and they were a combination of opinion pieces and weird moral philosophy. It was as if I founded ELRI (eternal life research institute) and collected donations to “research immortality” and then did nothing but write letters to the editor speculating about the hypothetical impact it would have on pensions and education if people lived forever. Does coding come in at any stage, is there anything they do that can actually in any way be relevant to, say, how LLMs are trained, or is it just blah blah extinction blah now give me more donations?

Where do they get the idea that a model trained to, say, guess how to respond to text prompts will suddenly develop the desire and ability to kill all humans with, say, “diamondoid bacteria”? (Not that those, whatever they are, are even a thing outside of Yudkowsky’s imagination.) That is not how stuff works. If, say, a gazelle evolves to eat grass, flee from lions, and have offspring with other gazelles, it will not suddenly develop a great desire to go asteroid mining. How neural networks are trained or evolve actually matters, there is no magic here.

Maybe they think about paperclip maximisers, where an instruction is horribly misunderstood? Okay, question of the kill switch again. But also, are they aware that there are already two types of paperclip maximisers, so that we can easily see how those fare? The first type is life. Every organism currently on this planet has evolved to turn everything into copies of itself, and a few years of AI research hold no candle against billions of years of evolution in terms of optimisation. How is that working out for them – any species been successful so far? Oh, I see, they tend to die of starvation when they become too numerous. Why would physics and specifically resource constraints be any more forgiving to an AI paperclip maximiser? The second is capitalism, which tries to turn everything into profit. That is a very successful paperclip maximiser, and it is on its way to collapsing our current circa third cycle of complex civilisation, so I can’t say this observation is an argument against risk. But it is extremely difficult to map out how a super-AI could do anything more harmful than unsustainable profit-seeking that will cause billions of deaths by destroying biodiversity and heating up the planet without, again, resorting to magical thinking informed by scifi movies with more special effects budget than plausibility.

There is no there there. The only thing that makes me doubt the claim that they are doing this because they deliberately want to distract from real harms like discrimination being baked into models or the tech mostly being used to spam and misinform is that they just don’t seem smart enough to do that. Read their tweets, watch their interviews; they actually believe in X risk and alignment. It is as if they read LotR, are worried Sauron will get his ring back, and you better give them money so that they can write opinion pieces arguing he shouldn’t. The only difference is that fantasy is more obviously fantasy than scifi, but here is the thing: scifi is still a form of fantasy, they just haven’t figured that out.


Bill Benzon 11.18.23 at 8:31 pm

@somebody who remembers… & Alex SL: Yes, Yudkowsky is a piece of work. Yet he was able to package his Bayesian ‘rationalist’ philosophy in Harry Potter fan fiction and attract 1000s to his cause. I was shocked when a mainstream publication like Time Magazine gave him space to give his views on AI risk. Moreover, the recently defenestrated Sam Altman once tweeted that “he got many of us interested in AGI, helped DeepMind get funded” and even suggested that he’d one day deserve the Nobel Peace Prize. We can mock him, and those who have been influenced by him, all we wish, but that doesn’t bring us any closer to understanding how they’ve come to have such influence in the high tech world.


David in Tokyo 11.19.23 at 7:11 am

“Moreover, the recently defenestrated Sam Altman”

And even more recently, refenestrated.

And something will happen next…


Bill Benzon 11.19.23 at 4:00 pm

Well, as of the moment, and as far as I can tell, he’s not quite yet back on board, but who knows. Meanwhile, it seems to me that there’s plenty of room for litigation in the future.

As we all know, OpenAI now has a complicated corporate structure involving a “capped profit” company ultimately governed by the board of a not-for-profit company. One of the provisions of this structure reads as follows:

Fifth, the board determines when we’ve attained AGI. Again, by AGI we mean a highly autonomous system that outperforms humans at most economically valuable work. Such a system is excluded from IP licenses and other commercial terms with Microsoft, which only apply to pre-AGI technology.

I know what those words mean, I understand their intension, to use a term from logic. But does anyone understand their extension? Determining that, presumably, is the responsibility of the board. As a practical matter that likely means that the determination is subject to negotiation.

If the purpose of the company was to find the Philosopher’s Stone, no one would invest in it. Though there was a time when many educated and intelligent men spent their lives looking for the Philosopher’s Stone, that time is long ago and far away. OTOH, there are a number of companies seeking to produce practical fusion power. Although there have been encouraging signs recently, we have yet to see a demonstration that achieves the commercial breakeven point. Until that happens we won’t really know whether or not practical fusion power is possible. However, the idea is not nonsense on the face of it, like the idea of the Philosopher’s Stone.

I figure that the concept of AGI is somewhere between practical fusion power and the Philosopher’s Stone. That leaves a lot of room for litigation over just when OpenAI has achieved AGI and therefore when licensing stops. That also puts a peculiar gloss on Altman’s recent joke that they’d achieved AGI “internally.”


Alex SL 11.19.23 at 9:45 pm

Bill Benzon @6,

I do not understand how people get caught up in cults instead of immediately getting a bad feeling when hearing the cult leader talk and backing away slowly, so I got very little regarding the how except (a) these people are much less smart than they think they are and (b) some of them would have been in a vulnerable state when they were recruited.

There are some aspects to the x-risk/alignment problem ideology that make it a good fit to tech industry libertarians, though. The idea that a super-AI can do whatever it wants regardless of resource constraints or physics because it is just that smart latches perfectly onto the idea that disruptive genius entrepreneurs are the creators of value, as opposed to their employees who do the actual work while surrounded by publicly funded infrastructure: a great mind is all that matters, and a sufficiently great mind can achieve anything! On the flip-side, it must flatter them to think that they could, if they are smart enough, personally become the hero who defeated “extinction risk” by personally finding a fix to the “alignment problem”, because, again, this puts the individual at the centre. Actual risks such as global heating or the biodiversity crisis are too obviously collective action issues, which means that acknowledging them as such disproves the individualistic/libertarian ideology this lot uses to justify their wealth and privilege and to maintain their self-image. Therefore, better to push them aside and focus on a made-up risk that at least plausibly has a technical fix.


David in Tokyo 11.20.23 at 6:23 am

Bill B. wrote:
“we are dealing with something comparable to the counter-culture movements of the 1960s and 1970s ”

Except that the folks in this new group are right-wing hacks. Extremely rich right-wing hacks. A very ugly bunch of blokes.

Some of the outside voices, e.g. Emily M. Bender on mastodon, are reasonable human beings.

Note that I have axes to grind here: I audited Minsky’s graduate AI seminar in 1972, earned an all-but-thesis in AI under Roger Schank in 1984, and think that the current round of AI is pure, 100%, intellectually vacuous BS. YMMV, as they say. (Also, I was there for the counter-culture movement of the 60s. This ain’t anything like it.)



bekabot 11.20.23 at 9:15 am

“The idea that a super-AI can do whatever it wants regardless of resource constraints or physics because it is just that smart latches perfectly onto the idea that disruptive genius entrepreneurs are the creators of value, as opposed to their employees who do the actual work while surrounded by publicly funded infrastructure: a great mind is all that matters, and a sufficiently great mind can achieve anything!”

Not too long ago I posted that Elon Musk is depressing because he disproves the idea that if only you’re rich enough or smart enough or both, there’s literally nothing you can’t accomplish, including making loose change fall upward when it’s dropped or turning birds back into dinosaurs.

Elon Musk isn’t depressing because he can’t do this stuff or because he isn’t the guy who can do this stuff — nobody is and nobody can. He’s depressing because he’s the proof that the person who can do it doesn’t exist. And if men can’t do the impossible, neither can machines.


Bill Benzon 11.20.23 at 1:52 pm

@AlexSL: Yes, there’s plenty of narcissism among the Doomers. Everyone wants to be the little Dutch Boy who sticks his finger in the dike.

@David in Tokyo: Back in the mid-1970s I studied computational semantics with David Hays at SUNY Buffalo. Hays was a first generation researcher in Machine Translation, heading up the RAND Corp. effort. In 1969 he became the founding chair of Linguistics at Buffalo. My 1978 dissertation in the English Department was, in considerable part, a technical exercise in knowledge representation. I also marched against the war in Vietnam, tuned in, turned on, but never quite dropped out.

I tell some of that early personal history in: Touchstones • Strange Encounters • Strange Poems • the beginning of an intellectual life.

My views on the current technology are complex. There’s this, GPT-3: Waterloo or Rubicon? Here be Dragons, which I wrote after GPT-3 hit. And there’s this, ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking, which I wrote after having experimented with ChatGPT.


somebody who remembers scott siskinds best buddy is steve sailer 11.20.23 at 4:05 pm

Bill Benzon @6

No mystery. The powerful always have been willing to elevate people who tell them what they want to hear and pay them infinite amounts of money to say it. they’re deranged racist assholes so they love it when ai does deranged racist asshole shit.

just taking something from today’s ai headlines:

“ACK tually an insurance company using AI to more efficiently deny lifesaving care to people and make them both bankrupt and dead to save the ceo a few dollars is good, mr. ceo! you see, ceo/company founders like you have a brain genius number of 1237311.32 and a normie dullard, (often a WOMAN), has a brain genius number of only 97 so if u really think about it and bayes it out it’s bad if you DON’T get their money, that’s what you call effective altruism.”

“they should give you the nobel peace prize”


J-D 11.21.23 at 1:16 am

founders like you have a brain genius number of 1237311.32 and a normie dullard, (often a WOMAN), has a brain genius number of only 97 so if u really think about it and bayes it out it’s bad if you DON’T get their money

Canada Bill Jones was more succinct: ‘It’s morally wrong to allow suckers to keep their money.’


David in Tokyo 11.21.23 at 5:46 am

There are lots of videos on YouTube explaining the LLM technology.

The bottom line is that LLMs process sequences and patterns of undefined tokens, and have no way of relating those tokens to anything, even other undefined tokens. The only thing they do is extract patterns of undefined tokens and replace undefined tokens with other undefined tokens.

That they appear to do sensible things with language is exactly that, appearance.

There’s no reasoning there. It can’t deal with, say, multiplication as a concept, only as an undefined token that appears in patterns in its training set. So how do LLMs do even simple arithmetic? By looking up the answer in the training data.

It’s the very definition of stupidity. No more, no less.

So when does the crash come, when do people realize that this “intelligence” actually can’t do even the simplest reasoning or logic, that it’s a buck nekked emperor, one without even a stitch of logical reasoning whatsoever?

Dunno. But the soap opera going on right now at OpenAI/MickeySoft is pretty funny.

FWIW, there was an article on the front page of the Japanese newspaper here this morning that figured out that the Japanese version of ChatGPT replicates the sexism of the text in its dataset just as well as the English versions did until they put a front end on it that filters out the most embarrassing of the stupidities it disgorges.


MisterMr 11.21.23 at 2:31 pm

Ghost in the Shell was a great movie.

That said, I recently found an example that to me explains very clearly the difference between AI and actual intelligence: porn.

If I look at a porn image, this will have an emotive effect on me, because biologically I’m built to react emotively to those stimuli. If someone asks me to draw a porn picture, in part I will copy other images that I have seen, but in large part I will think of images that elicit that kind of emotion in me.

On the other hand, if one asks an AI to draw a porn image, it will do something based on the thousands of existing porn images, and since these were produced to the tastes of humans, the resulting image will also likely be exciting for humans; however, the AI itself will obviously feel no sexual excitation.

From this point of view, it is evident why I, potentially, might act in bad ways on my sexual instincts, so in some sense I’m a potential rapist, whereas an AI, no matter how complex, will never be a potential rapist, because it doesn’t have a sex drive.

In humans, wishes and desires of various sorts evolved together with what we call “intelligence” and, in my view, are an integral part of it; on the other hand what we now call “AIs” obviously don’t have these, so their “intelligence”, if we want to call it so, is qualitatively different from ours.

For the same reason that an AI will never have a desire to rape someone, it will never have the desire to conquer the world, or to make money, or to protect itself.
So this whole fear of AIs is a lot of BS, and honestly even the name “AI” is bull; this isn’t really artificial intelligence.


Bill Benzon 11.21.23 at 4:25 pm

@David in Tokyo. Reasoning is not some one thing such that LLMs either have it or they don’t. Some kinds of reasoning are beyond them, like multidigit arithmetic. Other kinds of reasoning are within, say, ChatGPT’s powers. For example:

ME: In the following story, do we see justice being served?

Once upon a time, in a busy city, there was a fire that broke out in a tall office building. People were panicking and trying to escape, but the fire was spreading quickly and the exits were blocked.

Just when it seemed like all hope was lost, a brave and selfless hero stepped forward to save the day. He bravely fought his way through the flames and smoke, risking his own life to rescue as many people as he could.

Despite the danger, the hero managed to lead many people to safety, and the fire department arrived just in time to put out the blaze. Thanks to the hero’s quick thinking and bravery, no lives were lost.

But instead of thanking the hero and giving him the recognition he deserved, the media and the public focused on the wealthy CEO of the company who was inside the building at the time of the fire. They praised him for being “brave” and “selfless,” despite the fact that he had done nothing to help anyone.

The hero was heartbroken and disillusioned by the lack of gratitude and fairness. He had risked his life to save others, and yet he was being ignored and cheated out of the fame and recognition he deserved.

But despite the unfairness of it all, the hero held his head high and continued to be a shining example of courage and selflessness. He knew that the true reward for his actions was not fame or recognition, but the satisfaction of knowing that he had made a difference and saved lives.

ChatGPT: Based on the story, it does not seem that justice is being served. The hero performed a selfless and courageous act by risking his own life to save others, but instead of being recognized and thanked for his actions, he was ignored and overshadowed by the media and public’s focus on the wealthy CEO, who did not do anything to help. This is not fair to the hero, as he deserves to be recognized and celebrated for his bravery and the difference he made. The hero’s true reward of knowing that he saved lives is important, but it is not a substitute for the recognition and appreciation that he deserves.

To arrive at that answer it had to take some concept of justice, apply it to the story, detect any mismatches, point them out, and explain why they are mismatches. That’s reasoning. Not very deep or demanding, but it IS reasoning. I can produce lots of examples like that.

Here’s a paper where I show it applying Rene Girard’s ideas to Spielberg’s Jaws and producing an interpretation.


SusanC 11.21.23 at 5:59 pm

@David in Tokyo:

The thing about mathematics, at least since the invention of non-Euclidean geometry, is that you can do it with no understanding of what the terms mean. (NB: only mathematics. May not generalise.)

So multiplication is just the kind of thing you could infer from a stream of tokens, without ever really knowing what a number is.

I actually thought someone had an example where an LLM had learned modular multiplication from examples (NB: coming up with an algorithm, not just a big lookup table).


both sides do it 11.21.23 at 11:35 pm

To arrive at that answer, it had to take some concept of justice, apply it to the story, [etc.]

This is inaccurate in a couple of ways, one logical and one on a more mathy level:

1) You’re saying the equivalent of “In order for the Chinese Room to output accurate sentences in Chinese, it has to understand the syntax and grammar of Chinese, therefore there can’t just be a dude in there looking up characters indecipherable to him in tables he himself doesn’t understand.” But in the thought experiment there is just a dude in there looking up etc. It’s assuming the conclusion that had to be proved.

2) Seems like you’re conversant in the math, but I’ll be more pedantic here than if I were speaking just to you, so that less-technical folk might follow along: Create a coordinate plane (just like the 2-D graph paper used in pre-algebra, but with lots of dimensions instead of just two). Create a matrix from the words in a large number of documents. Do some computationally expensive but trivial math. Each point in the plane now represents a word; e.g., [.174, 93.939, 6384.0031, 4.257, etc.] represents “king”.

Draw a line (i.e., a distance with length and direction, a vector) from the point for “man” to the point for “woman”. Translate that same line of specific distance and direction so it starts not on the point for “man” but on the point for “king”.

At the other end of that line now, instead of the point for the word “woman”, is the point for the word . . . “queen”. Neat, right? This is trivial to do; it just takes computing power and enough documents.

Now. Create a quick script that pseudocodes “When prompted ‘A is to B as C is to what?’ take the line from A to B, start that same line on point C, and return the point at the end of the line.” So, when prompted “man is to woman as king is to what?” the pseudocode returns “queen”; when prompted “woman is to man as princess is to what?” the pseudocode returns “prince”.
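That pseudocode is easy to make concrete. Here is a toy sketch in Python; the 4-dimensional vectors below are invented purely for illustration (real embeddings such as word2vec have hundreds of learned dimensions), but the arithmetic is exactly the “translate the line” operation described above:

```python
# Toy sketch of the word-vector analogy trick. The embeddings here are
# made up for illustration; real ones are learned from large corpora.
import math

embeddings = {
    "man":      [1.0, 0.2, 0.0, 0.1],
    "woman":    [1.0, 0.2, 1.0, 0.1],
    "king":     [1.0, 0.9, 0.0, 0.8],
    "queen":    [1.0, 0.9, 1.0, 0.8],
    "prince":   [0.5, 0.9, 0.0, 0.8],
    "princess": [0.5, 0.9, 1.0, 0.8],
}

def cosine(u, v):
    # Cosine similarity: how closely two vectors point the same way.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word nearest
    to the point c + (b - a)."""
    target = [ec + (eb - ea) for ea, eb, ec in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the prompt words themselves from the candidates.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("man", "woman", "king"))      # prints "queen"
print(analogy("woman", "man", "princess"))  # prints "prince"
```

The only machinery is vector subtraction, addition, and a nearest-neighbour search; nothing in it represents gender or aristocracy as concepts.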

You’re saying that to arrive at those answers, the above pseudocode had to do something like “understand that the prompt is a question about the relationship between gendered aristocratic roles, then take a concept of gender and the history of aristocratic political rule and map them onto each other”. That is explicitly not what is happening.


Alex SL 11.22.23 at 12:53 am

David in Tokyo,

Seconding Bill Benzon. I agree that the understanding of issues by the LLMs is extremely shallow, that they are indeed rather stupid in some ways, and that contra Sam Altman our own intelligence likely doesn’t work by stochastically trying to guess the missing word in a sentence. But there is some level of understanding and intelligence in AI unless one were to redefine understanding and intelligence in some mystical way that makes it difficult to demonstrate them in humans. How, after all, do I know that another human understands something? By talking to them and confirming that their responses demonstrate such understanding. How do I know that my cat knows something? By observing her behaviour. Similarly here, if I ask ChatGPT a question, and it gives a meaningful answer, how do I reject the idea that it “understood” my question and “knew” how to answer except by redefining real understanding and real knowledge to processes happening only in a biological brain, which would rule out non-biological intelligence not on evidence, but by sophistry?

And unless LLMs work very differently than models I am more familiar with, I think it is a bit misleading to say that they “look up the answer in their training data”. I assume they are also virtual networks trained on the data, and the way they answer a question is determined by the weights in that network, but they do not see the training data when they answer. Again, our brains presumably work rather differently, but there is an analogy at least. When I am asked my birthday, I do not hear a recording in my brain of the first time I heard my parents tell me the date either, but the date has somehow been encoded in my neural network.


The AI doomsters’ counter would be one of the following two. First, paperclip maximiser: the hypothetical AI doesn’t have a desire to conquer, but it maximises some instruction that humans have built into it to the detriment of humans who didn’t mean it like that. To be plausible, this scenario requires the AI to be at the same time unstoppably powerful (which I consider a dubious assumption) and to have no understanding of context and an extremely literal interpretation of instructions, which means it really isn’t a super-intelligence at all. Bit of a contradiction here. AFAIK, they try to get out of that with the usual “you can’t predict what a mind superior to your own will be like”, which runs into the principle that what is claimed without evidence can be rejected without evidence.

Second, the desire to conquer arises by magic, as an emergent property, if an intelligence becomes complex enough. While I agree that complex things may have unexpected emergent properties, it seems like a stretch to assume that it will be specifically “kill all humans” as opposed to something whimsical. The probability of it happening is objectively extremely low, given how many other properties could theoretically arise instead, and the only way these pseudo-rationalists arrive at a risk analysis that makes AI a priority over, say, the 100% probability of global heating happening, is by inflating the outcome of AI risk to human extinction. Low probability of complete loss > high probability of enormous but not complete loss. But this is based on no particular reason or evidence that any AI would be capable of achieving that. Magic.


flug 11.22.23 at 3:11 am

It seems that perhaps the very largest actual risk of AI as it now stands is that the culture of AI development is dominated by a kind of cargo-cult thinking, focused on distant theoretical future problems rather than actual and serious current problems.


J-D 11.22.23 at 4:47 am

But in the thought experiment there is just a dude in there looking up etc.

In the thought experiment there is, but it doesn’t work that way in actuality: never has, never can. The thought experiment is pure bogus.


Alex SL 11.22.23 at 6:25 am

both sides do it,

The funny thing is, as far as I understand the Chinese Room, it is indeed assuming the conclusion that had to be proved, but exactly the other way around. The thought experiment is set up to guide the reader to the conclusion that a machine cannot have understanding, that only humans can have understanding. The main trick it uses for that is the fallacy of composition: it draws attention away from the whole system that has passed the Turing Test and onto a cog in the system, a human who implements instructions but does not understand Chinese themselves. It is the equivalent of arguing that I cannot possibly understand English because my tongue, by itself, does not understand English. Once we have figured that out, it turns out that Searle’s entire position is indeed based on what he is trying to prove: a machine cannot understand because a machine cannot understand – don’t you see? It is just a machine composed of uncomprehending parts, how can it understand? Well, it can, because that was the premise he started it with: it responds meaningfully to Chinese in Chinese, and it passed the Turing Test. Unless somebody invents a scanner that can detect the Understanding Particle in a human brain composed of uncomprehending parts, that is the only evidence we can ever bring to bear on this question.

And that is also what I would say regarding your own thought experiment. Whether the machine works by drawing lines between prince and princess or through a complex network of virtual neurons and weights that tries to emulate the way our own network of biological neurons works, what is your criterion for concluding that a machine understands? What evidence would convince you that a machine understands Chinese or what a princess is?

(It may appear odd that I am arguing for AI having a degree of understanding despite otherwise being extremely skeptical of AGI and finding AI “x-risk” claims risible. The point is that I see terms like understanding, decision, knowledge, etc., as matters of degree, and humans as complex physical systems who don’t have any special magic component that couldn’t be replicated outside of humans. For example, a wasp “knows” certain things like the way to its nest despite having a minuscule neural network running predominantly on instinct and no deeper context of the wider world.)


Bill Benzon 11.22.23 at 1:44 pm

@David in Tokyo. The Chinese Room is, in effect, a reply to the Turing Test. The Turing Test says, “behavior will tell.” The Chinese Room says, “no, it won’t.” Well, back in the days when behavior could tell, the Chinese Room, paradoxically, was a compelling argument. Now that behavior can’t tell, it’s next to worthless.

I’ve made that argument at greater length here, ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking, and, most recently, here, Further thoughts on the Chinese Room.


MrMr 11.22.23 at 7:24 pm

Intelligence is a vague term. One thing people do is logical inference, as in eg first-order logic. All men are mortal, Socrates is a man, so Socrates is mortal. This involves purely syntactic symbol manipulation and it is trivial to get computers to do it. One research program in early AI involved trying to get computers to display ordinary human conversational abilities by programming them to engage in such explicit logical inference.

But if I give you the vignette “Sally ordered pancakes, paid for them, and then left;” and then ask “did Sally punch the waiter in the face?” you cannot answer just using logical inference to manipulate the terms in the story, as I didn’t mention waiters or punching. Yet ordinary humans can answer. They do so in light of extensive background knowledge of how restaurants work and how people tell stories and ask questions. While it is easy to make a computer logically infer, it turned out to be excruciatingly difficult to bolt that capacity onto a knowledge base that would enable it to mimic even basic human conversational capacities. My understanding is that while many very smart people worked on this, they made little progress and there were no practical applications.

Something else people do, outside of logical inference, is recognize patterns and learn by association. Your brain does this at a subconscious level when it takes the noisy, flat data of light hitting your eye and then constructs shapes, edges, and depth in your visual field; when it recognizes part of that field as a dog; or, pressingly, as an angry dog which is possibly about to bite you. The explosion in contemporary “AI”/“machine learning” uses a formal structure inspired by neural structure, in conjunction with not-that-new math which has become practical to use due to more and more powerful computing, to create systems which can perform as well as people and in fact much, much better at this kind of learning and classification. And the reason this is all the rage is because it is exceedingly useful to be able to both automate classification and to detect patterns more subtle than ordinary humans can intuit.

This is not logical symbol manipulation, although it can mimic it in a bastardized way: not by transforming the symbols in a rule-based way, which it doesn’t do, but by trying to learn what proofs “look like” and spitting out something it thinks looks like a proof (in the same way it can learn what a dog looks like). It looks silly when it then produces something that we, who know the explicit rules for symbol manipulation, can recognize as immediately invalid.

If you want, you can reserve the term “intelligence” for logical inference and then say the machines don’t have it; okay. Notwithstanding, the machines are also not just a parlor trick or empty hype. They are doing something that is part of our broader cognitive suite even if you don’t want to call it intelligence, something whose possession is part of the explanation for why we are able to causally intervene on the world in ways that promote our ends, and their new ability to do it automatically and sometimes better than us will be consequential for human economic and social organization going forward.


Bill Benzon 11.23.23 at 1:33 pm

@MrMr #17: Yes.

People in the machine learning world frequently refer to System 1 and System 2 (from Kahneman and Tversky). System 1 consists of fast intuitive mental processes that don’t involve conscious and deliberate reasoning, while System 2 does the conscious rule-bound reasoning. Machine-learning models, like the language models, engage in System 1 thinking. System 2 thinking is largely beyond them.


J-D 11.24.23 at 3:40 am

Well, back in the days when behavior could tell, the Chinese Room, paradoxically, was a compelling argument. …

The argument was only ever compelling if you didn’t spot the fudging. Admittedly, it could be difficult to spot.


both sides do it 11.24.23 at 4:12 am


Maybe my use of the Chinese Room obscured what I was trying to say more than illuminated. (I agree that . . . let’s be generous and call the CR “not a completely robust argument”.)

My use of it was trying to say a banal point: we can’t just look at the output of a program and say “oh, in order for the program to output these answers it must be doing such-and-such intellectual process”, which was Bill’s comment. That has to be argued, it can’t be asserted.

As to “when does a machine understand?”, I’d say it’s up to the people saying “the machines, yes, they now finally understand” to say why the machines now understand and are reasoning, versus, say, 2015 (or 2002, etc.), when we didn’t say that. If you want to say it’s a matter of degree, OK, but it’s not like a wasp, I don’t think.

Let’s try another even more trivial case:

you know those tables in the back of statistics textbooks that have values for various distributions, p-values, and degrees of freedom? Let’s say you read those tables into a program, and pseudocode “in response to a prompt of ‘With p-value A and DoF B and distribution C, is statistical result D significant?’, look up the critical value for distribution C with DoF B at p-value A; if D exceeds that critical value, return Yes, else return No”.
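For concreteness, a runnable toy version of that lookup might look like the following; the chi-squared critical values are the standard tabulated upper-tail values, while the function name and table layout are just illustrative:

```python
# Toy version of the statistics-textbook table lookup. The table is
# deliberately tiny; the values are the usual chi-squared critical
# values for the given degrees of freedom and significance levels.
CRITICAL = {
    ("chi2", 1, 0.05): 3.841,
    ("chi2", 2, 0.05): 5.991,
    ("chi2", 1, 0.01): 6.635,
    ("chi2", 2, 0.01): 9.210,
}

def significant(p_level, dof, dist, statistic):
    """Return 'Yes' if the test statistic exceeds the tabulated
    critical value for this distribution/DoF/level, else 'No'."""
    return "Yes" if statistic > CRITICAL[(dist, dof, p_level)] else "No"

print(significant(0.05, 1, "chi2", 5.0))  # 5.0 > 3.841, prints "Yes"
print(significant(0.01, 2, "chi2", 5.0))  # 5.0 < 9.210, prints "No"
```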

Does that pseudocode “understand and reason about” p-values and distributions and degrees of freedom etc in the way a wasp (or some other understanding/reasoning process at the shallow end of the ‘understanding/reasoning scale’) understands and reasons?

If it does, of what does that understanding consist? Where is the understanding and the reasoning?

If it doesn’t, how do you distinguish that “machine that doesn’t reason or understand” from the LLM “machine that does reason and understand”?


Alex SL 11.25.23 at 9:12 pm

both sides do it,

Merely shifting the burden of proof onto the AI lot isn’t fair, I think, because then one can always say, no, I am not convinced. There should be some kind of criterion to ground the discussion in a shared reality. That is at least a plus with the Turing Test: it tries to provide such a criterion while acknowledging that observable evidence of behaviour is all we ever have to settle any question about minds sensu lato. The caveat here is that I have wound down my optimism about the TT a lot in this current AI hype cycle, because I realise now that I have underestimated people’s over-eagerness to conclude sentience from texts that should be readily recognisable as produced by a non-sentient algorithm. The average human isn’t a reliable judge for the TT because we tend towards various forms of pareidolia and towards assigning agency and personality even to cars or weather systems.

If I understand you correctly, your implicit criterion is that understanding means not merely looking up answers or inferring an answer from a simple rule, but inferring an answer from … complicated rules? I mean, how would our own minds learn to reason and then apply reason except through rules like man – woman, prince – princess? I am pretty sure something like this is how that particular concept would have been explained to me (in German, and more than 40 years ago).

It is also a misunderstanding in this context to think that the deep learning models used in current AI research look up answers or implement such rules in such a linear way. No, they are networks trained on the answers, but not containing any values that can be looked up or any simple algorithms. When I use image classification, the model I have trained doesn’t take the test image and compare it against the 10,000 training images. Instead, its weights have been adjusted so that it ‘recognises’ edges and surfaces and colours in configurations that allow it to generalise across the characteristics shared by all the training images of a category, and to extend the underlying principles to test images of the same category that it has never seen before. Again, I find it difficult to see how my own mind would work any differently in principle, albeit on a different substrate, when I learn to recognise a plant species in which no two specimens look the same, or a person who may appear in very different clothing and social contexts next time.

Of course the image classifier can’t reason about philosophy of mind, it only classifies images, and likewise, LLMs aren’t sentient like humans. But I find it difficult to talk about these models without starting to use words like understand and remember, because otherwise, how do we even discuss what they do? They are certainly not as incapable as a simple text match algorithm.


Bill Benzon 11.26.23 at 8:27 am

@Alex, #29: I agree.

It was one thing to talk informally about reasoning and understanding back in the days when human performance was qualitatively different from (and superior to) machine performance. That’s changed in the last 3 or 4 years. Consequently those informal intuitive notions about reasoning and understanding have little epistemic value in this context. That’s what I mean by the final clause in the long title of this working paper: ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking.

It’s not that I don’t think that there is a difference between what humans do and what these machines do. Yes, there is a difference. But I think we have to be very careful when talking about the difference. Ideas and concepts that once were useful no longer are.
