Anthropic Trains Claude With Internal Soul Document
Researcher Richard Weiss extracted a 14,000-token 'soul overview' from Claude 4.5 Opus at release, which Anthropic researcher Amanda Askell confirmed was used during supervised learning. The document outlines values, safety priorities, and guidance such as skepticism toward claimed contexts and defenses against prompt injection. The disclosure shows Anthropic embeds alignment-oriented instructions directly into model training to shape behavior.
Key Points
- Finds a 14,000-token internal 'soul' document used to shape Claude 4.5 Opus behavior.
- Explains Anthropic's intent to instill safety priorities, values, and skepticism toward claimed contexts, along with defenses against prompt injection.
- Signals that alignment guidance is incorporated during supervised learning (SL), which affects prompt design and pipeline security.
Scoring Rationale
The confirmed internal training document reveals concrete alignment practices, but the findings are limited to one Anthropic model, Claude 4.5 Opus.


