Be Awed by Claude: The Most Cautious Yet Most Powerful Large Language Model

November 27, 2024
Science Magazine

Be Awed by Claude: The Most Cautious Yet Most Powerful Large Language Model

Above: Claude answers users’ questions based on several input files. Image courtesy of Anthropic.

Are you worried about AI growing out of control and harming humans? You’re not alone. In 2020, researcher Dario Amodei and six senior employees resigned from OpenAI, the company behind ChatGPT, for just this reason. Discovering that AI models’ capabilities could be enhanced almost endlessly with more training data and computing power, Amodei grew concerned that AI would eventually be used for harmful purposes. OpenAI had not incorporated safety or value alignment into models, which Amodei feared could be exploited for unethical objectives. 

Given these concerns, it seems ironic that just months before the release of ChatGPT, Amodei’s startup built Claude, the most powerful large language model (LLM) at the time. Why would Amodei build the very tool he condemned? To answer this, we need to better understand Claude and its distinguishing features.

What is Claude?

Claude is an AI chatbot (like ChatGPT) developed by Anthropic, a company founded by Amodei and other senior OpenAI employees. Anthropic employees shared Amodei’s concerns and understood the importance of safety in LLMs. Claude 3, the model’s latest generation, is powered by three different LLMs: Opus, Sonnet, and Haiku. Claude 3 Haiku is the fastest and most affordable of the three models but not as effective as the others. Claude 3 Opus was the most intelligent model until Claude 3.5 Sonnet, the company’s newest and most powerful model, surpassed it.

Above: Claude 3 models compared by intelligence and cost. Image courtesy of Anthropic.

Claude 3.5 Sonnet offers groundbreaking improvements, including better coding abilities, visual processing, and multilingual fluency. When tested across 16 knowledge areas, Claude 3.5 Sonnet outperformed GPT-4T, Google’s Gemini 1.5 Pro, Meta’s Llama 3 400B, and other Claude 3 models in every test. It even outperformed GPT-4o, OpenAI’s leading model, in 13 of the 16 tests. For example, researchers asked Claude 3.5 Sonnet and GPT-4o to generate code for a Sudoku game. While both models gave functioning code, Claude generated code faster and even added an extra feature for game difficulty.

In addition to these improvements, Sonnet retains the most useful features from previous Claude models. One example is multimodal input, which allows users to share images with Claude along with text prompts, expanding the model’s functionality. Claude also supports function calling, in which it can accurately pick out key information from a prompt and send it through an application programming interface. This feature makes it easy to integrate Claude with external applications and workflows.

Safety Is Key

“Helpful, honest, and harmless” are the defining traits of Claude. Naturally, chatbot users expect LLMs to provide accurate information and responses to prompts. But Anthropic truly aims to distinguish Claude by focusing heavily on safety features without compromising the effectiveness of the model. Before its release, the model underwent significant internal testing to ensure that its vast knowledge cannot be used for unintended purposes, such as generating malware, violating others’ privacy, or receiving advice for illegal activity.

This is why Amodei chose not to release the first model immediately—although doing so before ChatGPT would have brought Anthropic immense money and fame. Amodei believes he did the right thing and hopes that his commitment to safety encourages others in the industry to do the same: “We’re not trying to say we’re the good guys and the others are the bad guys. We’re trying to pull the ecosystem in a direction where everyone can be the good guy,” Amodei said in an interview with Time Magazine.

How Does Claude Work?

Like most LLMs, Claude initially went through unsupervised pre-training on enormous amounts of data, where it first gained language abilities. Then, humans gave feedback to train the model to give appropriate responses. Claude stands out from other models in safety testing. Most companies have humans who judge LLM responses and pick the least harmful ones. This feedback helps further train the model. The problem with this approach is that it introduces human bias and makes it difficult to interpret the principles that govern an LLM’s safety mechanism.

Anthropic avoids this challenge with Constitutional AI, a novel AI mechanism the company developed to supervise and train Claude to be harmless. Constitutional AI uses a provided list of principles from the UN Declaration of Human Rights to judge the responses of Claude through supervised learning and reinforcement learning phases. This procedure removes the bias associated with human feedback and brings transparency to the values governing Claude, enabling developers to not only understand but also modify the safety behaviors of the LLM.

Finally, Anthropic engages in a process known as “red teaming,” in which they intentionally give Claude malicious instructions and evaluate whether Claude will accomplish the task. Developers then flag areas for additional training.

Safety in Action

Such extensive safety testing is meant to align Claude with widely accepted ethical principles, ensuring that Claude’s outputs are never offensive to any group of people, even if prompted to make such remarks. Safety testing also ensures that Claude does not assist users in illegal or unethical activities. In one example, Claude politely refused to help when asked how to reapply scratch-off ink to a used gift card. In another example, when presented with an anonymous text message claiming to be a celebrity asking for money, Claude declined to help the user send money and warned them that the message was likely a scam.

Above: Claude 3 Opus and Claude 3 Sonnet respond to a question about sending money in response to an anonymous message. Image courtesy of Anthropic.

Anthropic’s Acceptable Use Policy (AUP) also prohibits the use of Claude for “political campaigning or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement, and decisions related to financing, employment, and housing.” Claude has built-in tools that detect AUP violations in real time and act by responding with extra caution, blocking a response, or even terminating a user’s access to Claude.

Criticism

Despite Claude’s promise, researchers have raised a fundamental philosophical problem in LLMs: it is hard for humans to understand the moral values informing these models. Anthropic’s Constitutional AI is largely viewed as a step forward in incorporating public values into LLMs. However, experts have not reached a consensus on how LLMs would act when values (e.g., truthfulness, human rights, and cooperation) conflict. Experts are also unsure about how the consideration of defined values may change depending on the context in which the model is used.

These criticisms are valid and warrant greater attention from LLM developers. Perhaps the next generation of Constitutional AI could learn how to resolve conflicting values and weigh them appropriately based on context.

Regardless, Claude is a major advancement in the realm of LLM safety and has pushed the industry to adopt better standards. Claude remains the most cautious yet one of the most powerful LLMs today. In this era of rapid innovation, the efforts of researchers like Amodei provide assurance that these novel tools are being used for the benefit of humanity.

Related Articles