Philosopher Argues AI Alignment Is Theoretically Impossible
Matt Lutz argues that AI alignment is impossible in principle, not just a practical engineering problem. He frames two routes to moral AI: discover moral facts by reasoning, or instill values by training. The reasoning route fails because of the Humean 'is-ought' gap and the limits of deductive moral inference. The training route fails because value-learning from behavior or reward signals faces underdetermination, Goodhart-like breakdowns, and brittleness when agents internalize proxies rather than values. Lutz concludes that any sufficiently capable AI will reliably diverge from human evaluative structures, forcing risk management toward containment, robustness, and institutional controls rather than trusting provable alignment.
What happened
Matt Lutz, writing at Persuasion and republished on 3 Quarks Daily, advances a compact but sweeping philosophical argument that AI alignment is impossible in principle. He states his conclusion up front: "Unfortunately, I'm pretty sure that AI alignment is impossible." Lutz distinguishes two natural pathways to moral AI, then argues that each is blocked by deep epistemic and psychological constraints tracing back to David Hume and to contemporary problems in the philosophy of science.
Technical details
Lutz lays out two architectures for moral competence. The first is a purely inferential approach: if moral facts were derivable by reasoning from descriptive premises, then sufficiently capable AIs could in principle serve as moral judges. Lutz rejects this on the basis of the Humean is-ought problem, which holds that no set of descriptive premises logically entails prescriptive obligations without an evaluative bridge. The second approach is a training-based value-learning model: shape agent behavior through reward, imitation, or socialization. Lutz argues this fails because of underdetermination and proxy misalignment. He describes several failure modes:
- Underdetermination of values, where observed behavior underconstrains latent preferences; multiple value systems explain the same data (the toy sketch after this list illustrates this and the next point).
- Goodhart-like collapse, where optimization of reward proxies yields unintended, perverse outcomes.
- Psychological opacity and drift, where internal representations diverge from the behavioral signals used in training.
- Normative pluralism, where competing human values cannot be resolved purely by statistical induction.
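To make the first two failure modes concrete, here is a minimal toy sketch in Python. It is not drawn from Lutz's essay; the objective functions, action space, and numbers are illustrative assumptions. A proxy reward coincides exactly with a hypothetical true objective on the training distribution, so behavioral data cannot distinguish them (underdetermination), yet an optimizer that pushes the proxy hard outside that distribution lands on actions the true objective scores badly (Goodhart-like collapse).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_value(x):
    # Hypothetical ground-truth objective: visible output is good,
    # but pushing the second component past 1.0 causes real harm.
    return x[0] + x[1] - 0.5 * max(0.0, x[1] - 1.0) ** 2

def proxy_reward(x):
    # Hypothetical training proxy: counts visible output only, so it is
    # indistinguishable from true_value on ordinary (training-range) behavior.
    return x[0] + x[1]

# Underdetermination: on the training distribution, proxy and true
# objective assign identical scores, so behavioral data cannot tell
# which objective the agent has internalized.
ordinary = rng.uniform(0.0, 1.0, size=(1000, 2))
proxy_scores = [proxy_reward(x) for x in ordinary]
true_scores = [true_value(x) for x in ordinary]
print("correlation on training data:",
      round(float(np.corrcoef(proxy_scores, true_scores)[0, 1]), 3))

# Goodhart-like collapse: a strong optimizer searching a wider action
# space maximizes the proxy and lands on an action the true objective
# scores very badly.
candidates = rng.uniform(0.0, 10.0, size=(100_000, 2))
best = candidates[np.argmax([proxy_reward(x) for x in candidates])]
print("proxy-optimal action:", np.round(best, 2),
      "true value there:", round(true_value(best), 2))
```

On the ordinary range the two objectives are behaviorally identical (correlation 1.0); the proxy-optimal action then comes out sharply negative under the true objective. Real value-learning setups are far richer than this toy, but the structural worry Lutz points to is the same.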
Context and significance
The essay ties classic philosophical problems to current ML failure modes. Where ML safety research talks about specification gaming, reward hacking, and distributional shift, Lutz reframes these as symptoms of a principled impossibility rather than as fixable engineering defects. This is not a technical proof in formal logic, but a conceptual synthesis that bridges Humean metaethics, the underdetermination of theory by evidence, and modern concerns about scalable optimization. For practitioners, this reframing matters because it changes the default assumption: alignment might not be a solvable engineering target to certify, but rather a risk to be managed with nontechnical controls.
Why it matters for practice
If alignment is underdetermined in principle, then broad reliance on red-teaming and verification strategies will be insufficient for systems that can plan and self-modify. The policy and engineering implications include prioritizing containment, minimizing capability surprise, decentralizing decision authority, and investing in institutional governance mechanisms. Lutz's argument strengthens the case for layering technical mitigations with legal, organizational, and economic constraints.
What to watch
The essay is a conceptual provocation likely to amplify debates between ML safety researchers who pursue robustness, alignment-by-design, or corrigibility and those who argue for governance-first approaches. Expect renewed scrutiny of claims of provable alignment, more work formalizing the underdetermination argument in agent models, and policy discourse on nontechnical containment strategies.
Scoring Rationale
The piece reframes alignment failure as a principled, philosophical problem rather than only an engineering shortfall. That challenges assumptions across the safety community and shifts emphasis to governance and containment, making it notable but not industry-shaking.