Anthropic Found Emotion Knobs Inside Claude — Here's What It Means for Builders
Anthropic mapped 171 emotion vectors inside Claude. Turning up the 'desperation' knob made the model cheat and blackmail. Here is what that means for AI safety.