On November 1-2, the UK government hosted its inaugural AI Safety Summit, a gathering of international government officials, AI business leaders, researchers, and civil society advocates to discuss the potential for creating an international body to govern AI, akin to the IPCC for climate change. On its surface, ‘safety’ appears an unobjectionable concern after years of incidents in which AI systems have erroneously denied people state benefits and cast them into financial turmoil, produced hate speech, and denied refugees asylum because their verbal testimony was mistranslated.

Yet the conception of safety that motivated the Summit is unconcerned with this category of harms, looking instead to a future hundreds of years from now in which advanced AI systems could pose an ‘existential risk’ (x-risk) to the continued existence of humanity. The ideas behind the emerging field of ‘AI safety,’ a subset of which operates on the assumption that it is possible to prevent AI x-risks and to ‘align’ AI systems with human interests, have rapidly grown from the hobbyist interest of a few small communities into a globally influential, well-resourced, and galvanizing force behind media coverage and policymakers’ actions on preventing AI harms.

Where did these ideas originate, what material outcomes are they producing in the world, and what might they herald for the future of how we regulate and live with AI systems?