From Algorithmic Monoculture to Epistemic Monoculture? Understanding the Rise of AI Safety

Date: 2023-11-16

From November 1-2, the UK government hosted its inaugural AI Safety Summit, a gathering of international government officials, AI business leaders, researchers, and civil society advocates to discuss the potential for creating an international body to govern AI, akin to the IPCC for climate change. On its surface, ‘safety’ appears to be an unobjectionable concern after years of instances in which AI systems have caused errors that have denied people state benefits and cast them into financial turmoil, produced hate speech, and denied refugees asylum due to mis-translations of verbal testimony.

Yet the conception of safety that motivated the Summit is unconcerned with this category of harms, instead looking to a future hundreds of years from now where advanced AI systems could pose an ‘existential risk’ (x-risk) to the continued existence of humanity. The ideas behind the emerging field of ‘AI safety,’ a subset of which operate on the assumption that it is possible to prevent AI x-risks and to ‘align’ AI systems with human interests, have rapidly shifted from a hobbyist interest of a few small communities into becoming a globally influential, well-resourced, and galvanizing force behind media coverage and policymakers’ actions on preventing AI harms.

Where did these ideas originate, what material outcomes are they producing in the world, and what might they herald for the future of how we regulate and live with AI systems?

In spring 2022, my colleagues at Princeton University and I began to track AI safety’s growing influence and sought to map its intellectual origins. How are the ideologies that underpin this new field moving people, money, research, community-building, and career advising—in sum, the activities that people within AI safety refer to as ‘field-building’—towards a utopic vision of living with AI?

We began to see a broad-strokes argument unfolding that as large AI systems such as large language models (LLMs) including ChatGPT scale up, they could develop advanced capabilities beyond those their original creators had anticipated, and cause widespread harm to humanity if left unchecked. Related to this fear of rogue AI systems, AI safety proponents worry about whether a human bad actor could amass vast quantities of computational power (“compute”) to build a bioweapon.

To understand the genesis of these fears, we read texts foundational to x-risk studies, and traced their relationship to both the concept of long-termism (a concern with human existence hundreds of years into the future, and longer yet) and the effective altruism movement. Effective altruists (EAs) are concerned with how to “do the most good” given finite resources. Drawing from utilitarian philosophy, they seek to optimize returns on expected value in a range of proposed interventions for cause areas including the prevention of pandemics, nuclear wars, and AI x-risks.

Early in our research, I attended a talk by Sam Bowman, an NYU professor of computer science who had taken leave to work at Anthropic, a company founded by former OpenAI staff with the purported aim of building safe AI systems. Bowman highlighted a point that later came to be central to our research: a variety of communities have sprung up around ideas such as EA, x-risk, and long-termism, and people within these communities make up the majority of those advocating for AI safety.

Yet a growing number of people coming to the field of AI safety have no affiliation with these ideas. The latter group nonetheless must pass through institutions that have been shaped by what my colleagues and I refer to as AI safety’s “epistemic culture” (from sociologist Karin Knorr-Cetina): the cultural practices of how knowledge is constituted and disseminated within the sub-communities that unite to work towards a shared idea of AI safety.

To better grasp what is included in this emerging field’s conception of “AI safety,” we started by identifying what research, community cultivation, fellowships, and institution-building in the field were being funded. Then we descriptively analyzed the picture that these funding streams’ outputs formed.

What emerged was a clear image of a tightly networked series of communities with at least four distinct features. The first is online community-building, both through EA web forums such as LessWrong and through EA career advising hubs such as the nonprofit 80,000 Hours. Someone who has read Oxford philosopher Nick Bostrom’s book Superintelligence, for instance, may first encounter there the idea that future AI systems could one day ‘surpass’ humans at performing every task and attain artificial general intelligence (AGI).

Then, seeking a community of like-minded others debating the consequences of such a future, they could find a lively set of discussions about this topic on the Effective Altruism Forum, or the more recent Alignment Forum. Career advisories like 80,000 Hours create a pipeline where young people in these networks can find jobs in EA cause areas, for instance working on AI alignment at OpenAI.

The second feature is AI forecasting, which takes two forms. One involves hiring professional forecasters to cast and defend predictions about specific outcomes, such as the date by which a particular model may attain a specific benchmark. If a model outperforms the forecaster’s guess, some in the field see it as a sign that AI systems are developing at a clip that humans attempting to ensure safe deployment cannot keep up with (and therefore, the logic goes, as justification for investing in more AI safety research).

The third feature is this research itself, which focuses on issues such as monitoring emergent capabilities as AI systems scale up and develop unanticipated new functions, developing methods for alignment of AI systems with pre-defined human values to avert x-risk, defending the robustness of these systems against highly improbable but massively destructive events such as crashes of automated financial trading systems, and finally the catchall of “systemic safety,” referring to deployment of systems in context, including issues such as privacy, cybersecurity, and algorithmic bias.

Many of these ideas are disseminated via the fourth feature, prize competitions where entrants can submit code or papers in response to technical challenges in the field. Given the deep coffers of AI safety funders, and the urgency with which they believe AI safety must be addressed, prize competitions in the field carry massive prize pools ranging up to $1 million split across winning participants. While some of these competitions are hosted by AI safety nonprofits, we saw a trend toward competitions being embedded in academic computer science conferences as part of a broader effort to bring AI safety into mainstream academic computer science.

What happens when this field has influence beyond its epistemic community? Media coverage has tended toward replicating AI x-risk narratives, platforming the small handful of companies and figures at the center of this epistemic culture and positioning them as speaking for a larger constituency than they represent. For instance, companies such as OpenAI and DeepMind have long made it their mission to attain AGI, and OpenAI CEO Sam Altman presents this not only as possible but as a path to a future utopia. The flipside of this premise, as often presented within AI safety, is a future in which we fail to rise to the challenge of ‘aligning’ AI with human interests.

Laying bare the ideological scaffolding of AI safety raises the question of whether these ideas will continue to be central to the field going forward, and how essential they are to pursuing the field’s definition of safety in the first place. Rishi Bommasani et al. describe the algorithmic monoculture that arises when a small handful of companies like OpenAI or Anthropic produce ‘foundation models’ such as ChatGPT that a plurality of third-party actors across the platform economy come to rely on, from Spotify to Zoom.

Just as these same companies are the leading industry actors in the field, do the ideas underpinning AI safety have the potential to become an epistemic monoculture, crowding out other points of view? Furthermore, does buying into one part of this framing of AI safety essentially amount to being indefinitely locked in to a utilitarian, x-risk-centered approach? If so, what might the consequences be?

Eight years ago, Oxford philosopher Amia Srinivasan wrote about the effective altruism movement as dodging specificity in favor of general approaches to mitigating x-risks. This is echoed in how organizations in the epistemic community are presumed, based on names like the Future of Life Institute or the Future of Humanity Institute, to see protection of the entirety of humankind as an achievable mission.

A recent coalition of Chinese and US organizations and individuals proposed in light of the UK’s AI Safety Summit that companies spend up to 30% of their research budgets on AI safety. If this happened, what opportunities might prioritizing this generalist approach foreclose? One possibility is that finding solutions to well-documented problems— discriminatory uses of AI tools in hiring, or the wide range of shortcomings in AI functionality — would, in this framework, become downgraded in urgency.

Srinivasan notes that part of the wide appeal of effective altruism is that it ultimately does not challenge, but rather reinforces, the status quo. As with other ideologies that have incubated in Silicon Valley, while many ideas coming out of the AI safety epistemic community can appear at first to run counter to the mainstream, they nonetheless embed neatly into what Henry Farrell and Abraham Newman term the ‘weaponized interdependence’ model of the United States.

For instance, leading figures in the AI safety community have long treated “compute” resources as akin to a wartime resource that the United States government must monitor to prevent individual bad actors from amassing them to build bioweapons, a position that has become solidified in the mainstream through the recent White House Executive Order’s inclusion of references to governing foundation models using the Defense Production Act. While some commended the EO for acknowledging the need to protect workers and alleviate social harms of AI systems, the language regarding these known harms that stem from historical injustices was far vaguer, calling for “standards” and “guidelines” whose enforceability remains an open question.

Understanding the epistemic culture of AI safety can help us anticipate what industry leaders and other influential academic and philosophical voices in the epistemic community will advocate for in the future. In a recent interview with Foreign Policy magazine’s editor-in-chief, Alondra Nelson, a sociologist, White House Office of Science and Technology Policy representative, and leading architect behind the Blueprint for an AI Bill of Rights, noted that the government has opened up the conversation about regulating AI to public debate, encouraging citizens to appeal to their representatives to advance legislation.

What do people advocate for when a prevailing narrative is that AI may one day end humanity, and that we must put faith in a small group of technocrats who claim to speak for all of humanity to prevent that outcome? And how do we avoid a doubling down on the status quo of self-regulation and voluntary agreements that AI safety appears to be amenable to? As we head towards what many are hoping will be a banner year for more concrete regulation of AI, the AI safety epistemic community should be more receptive to external critiques, and aim to be accountable, both for the knowledge they produce that has gained major global influence in a very short period of time, and for the specific, contextual harms that arise in the AI systems they aim to improve.