OpenAI’s latest artificial intelligence (AI) system dropped in September with a bold promise. The company behind the chatbot ChatGPT showcased o1 — its latest suite of large language models (LLMs) — as having a “new level of AI capability”. OpenAI, which is based in San Francisco, California, claims that o1 works in a way that is closer to how a person thinks than do previous LLMs.
The release poured fresh fuel on a debate that’s been simmering for decades: just how long will it be until a machine is capable of the whole range of cognitive tasks that human brains can handle, including generalizing from one task to another, abstract reasoning, planning and choosing which aspects of the world to investigate and learn from?
Bigger AI chatbots more inclined to spew nonsense — and people don’t always realize
Such an ‘artificial general intelligence’, or AGI, could tackle thorny problems, including climate change, pandemics and cures for cancer, Alzheimer’s and other diseases. But such huge power would also bring uncertainty — and pose risks to humanity. “Bad things could happen because of either the misuse of AI or because we lose control of it,” says Yoshua Bengio, a deep-learning researcher at the University of Montreal, Canada.
The revolution in LLMs over the past few years has prompted speculation that AGI might be tantalizingly close. But given how LLMs are built and trained, they will not be sufficient to get to AGI on their own, some researchers say. “There are still some pieces missing,” says Bengio.
What’s clear is that questions about AGI are now more relevant than ever. “Most of my life, I thought people talking about AGI are crackpots,” says Subbarao Kambhampati, a computer scientist at Arizona State University in Tempe. “Now, of course, everybody is talking about it. You can’t say everybody’s a crackpot.”
Why the AGI debate changed
The phrase artificial general intelligence entered the zeitgeist around 2007 after its mention in an eponymously named book edited by AI researchers Ben Goertzel and Cassio Pennachin. Its precise meaning remains elusive, but it broadly refers to an AI system with human-like reasoning and generalization abilities. Fuzzy definitions aside, for most of the history of AI, it’s been clear that we haven’t yet reached AGI. Take AlphaGo, the AI program created by Google DeepMind to play the board game Go. It beats the world’s best human players at the game — but its superhuman qualities are narrow, because that’s all it can do.
The new capabilities of LLMs have radically changed the landscape. Like human brains, LLMs have a breadth of abilities that have caused some researchers to seriously consider the idea that some form of AGI might be imminent1, or even already here.
This breadth of capabilities is particularly startling when you consider that researchers only partially understand how LLMs achieve it. An LLM is a neural network, a machine-learning model loosely inspired by the brain; the network consists of artificial neurons, or computing units, arranged in layers, with adjustable parameters that denote the strength of connections between the neurons. During training, the most powerful LLMs — such as o1, Claude (built by Anthropic in San Francisco) and Google’s Gemini — rely on a method called next token prediction, in which a model is repeatedly fed samples of text that has been chopped up into chunks known as tokens. These tokens could be entire words or simply a set of characters. The last token in a sequence is hidden or ‘masked’ and the model is asked to predict it. The training algorithm then compares the prediction with the masked token and adjusts the model’s parameters to enable it to make a better prediction next time.
How AI is reshaping science and society
The process continues — typically using billions of fragments of language, scientific text and programming code — until the model can reliably predict the masked tokens. By this stage, the model parameters have captured the statistical structure of the training data, and the knowledge contained therein. The parameters are then fixed and the model uses them to predict new tokens when given fresh queries or ‘prompts’ that were not necessarily present in its training data, a process known as inference.
The use of a type of neural network architecture known as a transformer has taken LLMs significantly beyond previous achievements. The transformer allows a model to learn that some tokens have a particularly strong influence on others, even if they are widely separated in a sample of text. This permits LLMs to parse language in ways that seem to mimic how humans do it — for example, differentiating between the two meanings of the word ‘bank’ in this sentence: “When the river’s bank flooded, the water damaged the bank’s ATM, making it impossible to withdraw money.”
This approach has turned out to be highly successful in a wide array of contexts, including generating computer programs to solve problems that are described in natural language, summarizing academic articles and answering mathematics questions.
And other new capabilities have emerged along the way, especially as LLMs have increased in size, raising the possibility that AGI, too, could simply emerge if LLMs get big enough. One example is chain-of-thought (CoT) prompting. This involves showing an LLM an example of how to break down a problem into smaller steps to solve it, or simply asking the LLM to solve a problem step-by-step. CoT prompting can lead LLMs to correctly answer questions that previously flummoxed them. But the process doesn’t work very well with small LLMs.
The limits of LLMs
CoT prompting has been integrated into the workings of o1, according to OpenAI, and underlies the model’s prowess. Francois Chollet, who was an AI researcher at Google in Mountain View, California, and left in November to start a new company, thinks that the model incorporates a CoT generator that creates numerous CoT prompts for a user query and a mechanism to select a good prompt from the choices. During training, o1 is taught not only to predict the next token, but also to select the best CoT prompt for a given query. The addition of CoT reasoning explains why, for example, o1-preview — the advanced version of o1 — correctly solved 83% of problems in a qualifying exam for the International Mathematical Olympiad, a prestigious mathematics competition for high-school students, according to OpenAI. That compares with a score of just 13% for the company’s previous most powerful LLM, GPT-4o.
In AI, is bigger always better?
But, despite such sophistication, o1 has its limitations and does not constitute AGI, say Kambhampati and Chollet. On tasks that require planning, for example, Kambhampati’s team has shown that although o1 performs admirably on tasks that require up to 16 planning steps, its performance degrades rapidly when the number of steps increases to between 20 and 402. Chollet saw similar limitations when he challenged o1-preview with a test of abstract reasoning and generalization that he designed to measure progress towards AGI. The test takes the form of visual puzzles. Solving them requires looking at examples to deduce an abstract rule and using that to solve new instances of a similar puzzle, something humans do with relative ease.
LLMs, says Chollet, irrespective of their size, are limited in their ability to solve problems that require recombining what they have learnt to tackle new tasks. “LLMs cannot truly adapt to novelty because they have no ability to basically take their knowledge and then do a fairly sophisticated recombination of that knowledge on the fly to adapt to new context.”
Can LLMs deliver AGI?
So, will LLMs ever deliver AGI? One point in their favour is that the underlying transformer architecture can process and find statistical patterns in other types of information in addition to text, such as images and audio, provided that there is a way to appropriately tokenize those data. Andrew Wilson, who studies machine learning at New York University in New York City, and his colleagues showed that this might be because the different types of data all share a feature: such data sets have low ‘Kolmogorov complexity’, defined as the length of the shortest computer program that’s required to create them3. The researchers also showed that transformers are well-suited to learning about patterns in data with low Kolmogorov complexity and that this suitability grows with the size of the model. Transformers have the capacity to model a wide swathe of possibilities, increasing the chance that the training algorithm will discover an appropriate solution to a problem, and this ‘expressivity’ increases with size. These are, says Wilson, “some of the ingredients that we really need for universal learning”. Although Wilson thinks AGI is currently out of reach, he says that LLMs and other AI systems that use the transformer architecture have some of the key properties of AGI-like behaviour.
Can AI review the scientific literature — and figure out what it all means?
Yet there are also signs that transformer-based LLMs have limits. For a start, the data used to train the models are running out. Researchers at Epoch AI, an institute in San Francisco that studies trends in AI, estimate4 that the existing stock of publicly available textual data used for training might run out somewhere between 2026 and 2032. There are also signs that the gains being made by LLMs as they get bigger are not as great as they once were, although it’s not clear if this is related to there being less novelty in the data because so many have now been used, or something else. The latter would bode badly for LLMs.
Raia Hadsell, vice-president of research at Google DeepMind in London, raises another problem. The powerful transformer-based LLMs are trained to predict the next token, but this singular focus, she argues, is too limited to deliver AGI. Building models that instead generate solutions all at once or in large chunks could bring us closer to AGI, she says. The algorithms that could help to build such models are already at work in some existing, non-LLM systems, such as OpenAI’s DALL-E, which generates realistic, sometimes trippy, images in response to descriptions in natural language. But they lack LLMs’ broad suite of capabilities.
Build me a world model
The intuition for what breakthroughs are needed to progress to AGI comes from neuroscientists. They argue that our intelligence is the result of the brain being able to build a ‘world model’, a representation of our surroundings. This can be used to imagine different courses of action and predict their consequences, and therefore to plan and reason. It can also be used to generalize skills that have been learnt in one domain to new tasks by simulating different scenarios.
Source link