AI Safety Debate
It was a real privilege to host, alongside two of the best societies at UCL (AI Society and UCL Effective Altruism), our AI Safety Debate on the topic, "Is AI an existential risk?"
🎬 A full recording can be found here, if you want to watch the whole thing.
Why talk about AI safety?
I believed it was important to host this debate because this question is potentially highly important, but also one I have deep uncertainties about. Many AI experts, like Geoffrey Hinton, think that AI should be considered just as risky as pandemics or nuclear war, and that we need to slow down or pause its development. Others, like Melanie Mitchell, believe that the risks are "almost vanishingly small". The stakes of the question for humanity warrant a serious (and, in my opinion, long) conversation about the respective arguments' merits.
After this prelude, my wonderful colleagues Ivana and Maja introduced the speakers. We were lucky enough to have Reuben Adams and Chris Watkins arguing for the "doomer" side (as we referred to it in the WhatsApp group chat). Reuben is a UCL AI PhD student and host of the wonderful "Steering AI" podcast. Chris is a professor at Royal Holloway and a prominent thinker in the reinforcement learning field. For the "risk-skeptical" side (the "anti-doomer" side?) were Jack Stilgoe and Kenn Cukier. Jack is a UCL professor, on "home turf" as he said, lecturing in Science and Technology Studies (STS), and works closely with UK Research and Innovation on the "Responsible AI Program". Kenn is a Deputy Executive Editor at The Economist and hosts the weekly tech podcast "Babbage". Tom Ough, a freelance journalist who has written various pieces about existential risk, including in Prospect Magazine, was moderating. Ivana encouraged our audience to consider how the lack of demographic diversity on the panel could systematically bias the conversation, which (as you'll read) came up in discussions.
At face value, the four speakers seemed to argue distinctly opposing points of view. I will briefly give my best effort at summarizing their views, in the order in which they spoke. Afterwards, I set out promising areas of agreement among the panelists.
The debate
Reuben opened the debate. He argued that the new paradigm of deep learning presents a distinctly new category of AI risk: we are building ever-more intelligent "black boxes" with novel capabilities we cannot predict. Once there is a "second species of intelligence" that rivals our own, we are completely ignorant about what will follow. Our current tools for controlling AI systems, like RLHF, are woefully inadequate even at present, and won't "scale up" with increasing AI progress. What follows from all this? "I don't understand how you can confidently say that this doesn't end badly."
After Reuben came Jack. He eased his way into his argument with several cool anecdotes. (From Reuben's speech: the day after Ernest Rutherford denied the feasibility of nuclear energy, in September 1933, Leo Szilard conceived of the nuclear chain reaction. Jack added that the idea came to Szilard near Russell Square Station. Go there if you want to conceive of the next big thing.) Anyway. Back to the seriousness. For Jack, rogue AI scenarios are implausible and belong in science fiction. Instead, "the idea of existential risk from AI is a form of displacement activity" from other, more pressing concerns, like the disempowerment of workers or the marginalization of minorities. These are the risks that deserve regulators' attention. A more interesting question, for Jack, is why people are drawn towards believing in these risks: perhaps people's positionality, or, for some technologists, their self-interest. AI is a tool like any other, in that it's "all about power", so "we shouldn't be worrying about what robots will do to humanity, instead we should worry about what some people will do to other people".
After Jack, there was Chris. From his perspective, the algorithmic breakthroughs that enabled ChatGPT are pedestrian: his MSc students already implement the "transformer" architecture, the major breakthrough behind large language models like ChatGPT, for their coursework. Given that tens of thousands of people and eleven-figure sums are being directed towards AI, we have no reason to believe that further breakthroughs won't occur. Instead, we should expect a future of open-ended cognitive advancement. This unknowable future is "behind a veil". While we aren't necessarily destined for doom, there are plausible "side-roads" that lead towards it, in particular AI-enabled authoritarian regimes.
Finally, it was Kenn's turn. "This is bleak!" he began. Whilst the risks from AI are serious, they won't scale to an "existential catastrophe". Existing alignment techniques like RLHF put humans in the loop, and will obstruct any "intelligence explosion". Humans are unlikely to cede control of political power or nuclear missile systems to AI. Among the different possible futures, we can design "love" into AI. Misuse risks, in contrast, do seem concerning: Kenn was anxious when news broke in 2022 that an AI system had generated 40,000 toxic chemicals in six hours. However, there is nuance here. The threat model of "misuse risks" from bad actors already exists today, and Lethal Autonomous Weapons may even make warfare less brutal. So let's not be defeatist, and instead focus on "existential solutions".
(Dis)agreements!
A key disagreement among the participants: Reuben and Chris acknowledged that the exact pathways to catastrophe are unknowable, much as bonobos could not predict how they would be outcompeted by humans. Kenn, and particularly Jack, seized on this point, suggesting that the "rogue AI story" therefore parallels science fiction. Reuben and Chris seemed happy to bite the bullet.
However, amongst these disagreements, there were several areas of agreement, which questions from our moderator, Tom, helped to elucidate:
- Proactive oversight/regulation of AI systems today is necessary to guard against present-day harms, like misinformation.
- Careful evaluation of AI models is an area of potential common ground between those concerned about "near-term" and "long-term" risks from AI.
- AI represents a new (potentially transformative) era for humanity.
- Predicting exactly how the future will unfold is nigh-on impossible; it is very hard to specify precisely how AI harms might scale to catastrophe or even extinction.
- AI is likely to be a "force multiplier" and may enable bad actors to do worse things.
On these points, and others, I think the speakers realized that their worldviews were closer than they might have expected.
I am very grateful to Ivana, Maja, and Asmita for helping to organize the event, and to Andrzej and Yadong for helping with the filming.