Anthropic studied what gives an AI system its ‘personality’ — and what makes it ‘evil’

The Verge

Dec 15, 2024
Author: Hayden Field



On Friday, Anthropic debuted research unpacking how an AI system's "personality" (its tone, responses, and overarching motivation) changes, and why. Researchers also tracked what makes a model "evil."

The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company's fledgling "AI psychiatry" team.

"Something that's been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities," Lindsey said. "This can happen during a conversation - your conversation can lead the model to start beh …

Read the full story at The Verge.
