Anthropic studied what gives an AI system its ‘personality’ — and what makes it ‘evil’

The Verge

Dec 15, 2024
Author: Hayden Field



On Friday, Anthropic debuted research unpacking how an AI system's "personality" (its tone, responses, and overarching motivation) changes, and why. Researchers also tracked what makes a model "evil."

The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company's fledgling "AI psychiatry" team.

"Something that's been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities," Lindsey said. "This can happen during a conversation - your conversation can lead the model to start beh …

Read the full story at The Verge.
