On Friday, OpenAI, the Microsoft-funded operator of ChatGPT, fired its CEO, Sam Altman. Then, after five days of popcorn-emoji chaos, they hired him back. The sudden move, which billion-dollar investor Microsoft only learned about moments before it was released to the public, seems to have come from a fight between Altman and engineer Ilya Sutskever, who is in charge of “alignment” at the company. Sutskever’s faction, including board member Helen Toner, whose feud with Altman may have precipitated these events, is out. Larry Summers, the former treasury secretary and Harvard president who doubted that women are good at science, is in. Altman’s return means that, in a fight about profit versus safety, profit won.
Or maybe safety did? OpenAI is a weird company, and their renewed charter reemphasizes their original goal: to save humanity from the very technology they are inventing.
Both sides in this fight think artificial general intelligence (“AGI,” or human-level intelligence) is close. Altman said, the day before he was fired, that “four times” (one within the last few weeks) he had seen OpenAI scientists push “the veil of ignorance back and the frontier of discovery forward.” Sutskever worries about AI agents forming megacorporations with unprecedented power, leads employees in the chant “Feel the AGI! Feel the AGI!,” and reportedly burned an effigy of an “unaligned” AGI to “symbolize OpenAI’s commitment to its founding principles.” Toner hails from Georgetown University by way of the University of Oxford’s Future of Humanity Institute, a leading research institute for the perpetuation of pseudoscience fiction ideology run by the philosopher Nick Bostrom.
The question, in this atmosphere, is not whether machines are intelligent but whether to accelerate development and distribution of this potential AGI, which is Altman’s position, or to pump the brakes, hard, which is Sutskever’s apparent desire.
This debate has now broken out into the open and highlighted the conflict between so-called artificial intelligence (AI) doomers and accelerationists. The doomer question is how probable human extinction is: your assessment of “p(doom).” Economist Tyler Cowen has pointed out that doomers don’t back up their belief in the AI takeover with actual bets on this outcome, but if tens of billions of dollars hang on this type of fight, it’s hard to see it as unimportant. The goal that emerges from this cocktail of science and religious belief in AGI is to “align” machine intelligence with human values, so that, if it gains sentience, it cannot harm us.
“Alignment,” author Brian Christian tells us, was borrowed by computer science from 1980s “management science” discourse, where providing incentives to create a “value-aligned” corporation was all the rage. Economists have pointed out that “direct alignment” with a single institution is radically different from “social alignment,” which is what OpenAI is focused on. Sutskever’s group there calls their project “superalignment,” pumping the rhetorical stakes even higher. But this is really just vapor, and it betrays a shocking misunderstanding of the very technology these business leaders and engineers are hawking.
Karl Marx said that capitalism seemed straightforward but actually harbored “metaphysical subtleties and theological niceties.” There’s nothing subtle or nice about what’s happening in AI enterprise, though, and we’re not doing a great job of countering it with critique. The events at OpenAI this week are a great example of what I think of as “metaphysics in the C-suite”: an unhinged, reality-free debate driving decisions with sky-high market caps and real, dangerous potential consequences.
The alignment concept is a house of cards that immediately falls apart when its assumptions are revealed. This is because every attempt to frame alignment relies on a background conception of language or knowledge that is “value neutral,” but never makes this fully explicit. One suspects this is because value neutrality, and thus “alignment” itself, has no real definition. Whether you think the good thing is unbiased machines or fending off a machine that learns to kill us, you’re basically missing the fact that AI is already a reflection of actual human values. The fact that that’s not good or neutral needs to be taken far more seriously.
There is a whole industry devoted to AI safety, and much of it is not about metaphysics. It’s not that nothing is wrong. We all read daily about the many, terrifying ills of our automated systems. Curbing actual harm is important, don’t get me wrong. It’s just not clear that “alignment” can help, because it’s not clear that it’s a concept at all.
The alignment debate didn’t begin with generative AI. When Google figured out how to make computers produce meaningful language, one of the first things the machine spit out was the idea that women should be homemakers. The scientists in the room at the time, Christian reports, said, “Hey, there’s something wrong here.” They were rightly horrified by this harmful idea, but they weren’t sure what to do. How could you get a computer to speak to you (something we now take for granted with the rise of ChatGPT) but also conform to values like equality? The goal of alignment is like Isaac Asimov’s famous first law of robotics, which prevents machines from harming humans. Bias, falsehood, deceit: these are the real harms that machines stand to do to humans today, so aligning AI seems like a pressing problem. But the truth is that AI is very much aligned with human values; we just can’t stand to admit it.
Bostrom, who heads the Future of Humanity Institute, has dominated a great deal of the alignment conversation with his “paper clip” thought experiment, in which an AI designed to maximize paper clip production realizes that the human ability to turn off the AI itself endangers its mission, seals itself off from human intervention, and, in some versions, exterminates the human race. This story has played a major role in AI ethics, influencing figures like Stuart Russell, one of the field’s leading voices. Bostrom’s paper clips are also a major reason that the idea of AI as “existential risk” (the risk of human extinction, which Bostrom pushes in most of his writing) has reached national headlines. But the idea is pure nonsense, science fiction without any of the literary payoff or social insights of a futuristic novel. Worse, it is severely off the mark for the actual AI we are dealing with today. This type of thinking takes place entirely in a counterfactual mode, yet its basic framework informs most AI thinking today.
The other major force in the public AI discussion is not focused on doom, but instead on harm. Its leading idea is that “generative” AI is a “stochastic parrot,” which remixes human language and reproduces its biases. The massive datasets that these algorithms are trained on are “unfathomable,” making alignment an impossible goal. And yet these AI critics operate with an implicit notion of alignment too, simply denying it rather than promoting it. Linguist Emily Bender, for example, argues that the language datasets present an uneven and skewed picture of human language because they are scraped from places like Reddit, where male voices dominate. This definitely is a problem, but it’s not clear where an “unskewed” language would be located, or even that one exists. Solutions suggested include “value design” that tends to local and marginalized voices and contexts. But there is no “local” so small that it makes the inherently conflicted nature of value — the norms from which bias springs — disappear.
Add Reddit, hundreds of thousands of books, Wikipedia, and a ton more World Wide Web sites together, and you form a mental snapshot of the training data. But as the stochastic parrot paper makes clear, that snapshot is really “unfathomable,” far too much data to make any sense of mentally. There’s no quantitative measure of bias in language in the first place. Paradoxically, “lack of alignment” might be inextricable from language. If that’s the case, then AI is capturing cultural bias on an unprecedented scale. It’s just that seeing that bias laid out before us is ugly and disturbing and, as Bender rightly points out, amplifying it is bad.
The third major force in alignment is the conviction that the problems with generative AI specifically come from lack of “grounding.” Yann LeCun, widely seen as a “founding father” of the new AI, has been most vocal about this. He argues that large language models like ChatGPT and LLaMA will never tell the truth — a crucial capacity for any possible alignment — until they ground their “judgments” in perception.
The so-called “hallucination problem,” where AI makes up fictional legal precedents or gives perversely harmful answers to those seeking help, arises, LeCun argues, because language models don’t know about the world. Their language is not grounded in perception, which LeCun thinks is how humans get a sense for how things work. But the computer scientist Ellie Pavlick has responded by pointing out that “grounding” is not really so simple. Legal precedents are not grounded in visual perception, for example. No amount of common-sense knowledge of the world can add up to the complexity of human language. Chatbots didn’t invent misinformation or scams, after all: we did.
All this adds up to a clear conclusion: alignment is a poorly conceived problem. The value of discussions about AI alignment has largely been to show us what human language and culture are not. They are not “value-neutral,” they do not conform to any set of allegedly commonly held norms, and they are not based in scientific evidence or perception. There is no “neutral” standpoint from which to evaluate alignment, because the problem is indeed about values, which is stuff we fight over, where there’s no right answer.
A statistical center point for any given value, even if it could be achieved, is not a solution to any problem other than finding the statistical center point itself. The problem with even the best-intentioned critiques of these systems is that they assume an Archimedean standpoint — from which the “whole world could be moved,” as the philosopher said — about a vague object, something like “all of language” or “all of culture.” No one can adopt that standpoint, whether they believe in AI or want it abolished.
Talk of alignment always comes with a “we.” We want AI to give us equal outcomes, or unbiased speech, or true information. It’s deeply unclear that Altman and Sutskever represent any collective, democratic “we” in this sense. Yet it’s equally hard to see how exactly a democratic “we” can regulate this cultural behemoth to bring it into line. The balance between government and business hasn’t been working for decades anyway, though, and AI is benefitting from capital’s social dominance. Slurping up culture, science, and geopolitics was always the next step.
The problem with alignment is captured perfectly by what happened at OpenAI. It’s a pseudophilosophical concept that implies a perfect knowledge of human and social “values.” But in pragmatic terms, it’s a goal, not an idea. And that goal, even if it’s gift-wrapped in talk about safety driven by metaphysical delusions, is the commercialization of AI.
AI, if not its advocates, reflects a “we” back to us with great accuracy already. Those Google scientists might have had a different reaction to the misogyny of the algorithm. They might have said: wow, our collective language harbors misogyny! Let’s figure out what that means. Rather than moving to an ill-defined concept of “alignment,” maybe they, and we, should have realized that they had in their hands an unprecedented tool for understanding bias, culture, and language. After all, a computer spitting out misogynistic sentences is only a problem if you are seeking to market it as a product.
Seen in this light, the alignment concept is a perfect illustration of ideology. Theodor W. Adorno thought of ideology as an actually false truth. That’s the kind of thing that alignment is: it’s true that AI can be and even is aligned with us, because of the way it’s trained. But the very goal of giving AI a set of norms that somehow match ours is false, not just incorrect but also wrong. The “we” of AI is this contradiction and the world we have to live in as a result of it, not some utopian (or doomer) articulation of “values.”
The rational thing would be to take these bots offline and use them to study our prejudices, the makeup of our ideologies, and the way language works and interacts with computation. Don’t hold your breath.