Beyond Compliance Theater: Why AI Safety Demands Proofs, Not Promises
Compliance is not safety. TVM demands falsifiable, renewable proofs (audits, guardrails, and drift checks) so that AI serves people without laundering bias at scale.
The Problem of Improvised Ethics at Scale
When Sam Altman, CEO of OpenAI, appeared on Tucker Carlson’s show in September 2025, he revealed something more troubling than any single controversial position: the man claiming ultimate responsibility for ChatGPT’s moral framework was discovering his own ethical positions in real time. Asked whether ChatGPT might guide terminally ill users toward assisted suicide in countries where it’s legal, Altman responded, “I’m thinking on the spot... I reserve the right to change my mind here.”
This moment of improvisational ethics would be unremarkable in a casual conversation between friends. It becomes profoundly unsettling when the person thinking on the spot controls technology that shapes discourse for hundreds of millions of people daily. Altman himself acknowledged this weight: “Every day, hundreds of millions of people talk to our model... what I lose most sleep over is the tiny decisions we make about a way a model may behave slightly differently.”
Yet throughout the interview, Altman exhibited a pattern of fundamental contradictions. He simultaneously claimed final authority over ChatGPT’s moral decisions (“the person I think you should hold accountable for those calls is me”) while insisting the system should reflect humanity’s “collective moral view.” He acknowledged consulting “hundreds of moral philosophers,” yet refused to specify who ultimately decides which philosophical framework prevails when the Marquis de Sade conflicts with the Gospel of John. Most revealingly, when pressed about suicide, military killing, or other life-and-death questions, he pivoted between clinical detachment and appeals to collective wisdom, never quite landing on a clear position beyond “we’re still figuring this out.”
This is not a personal critique of Altman. Instead, his interview reveals the structural impossibility of the position AI leaders find themselves in: they are building systems with near-godlike influence over global discourse while insisting they are “just tech nerds” wielding “just mathematics.” The gap between the power they wield and the accountability mechanisms in place has become a chasm.
The Compliance Trap: Why “Responsible AI” Frameworks Fail
The AI industry’s response to growing ethical concerns has been to proliferate frameworks, principles, and ethics statements. Yet recent research reveals a disturbing pattern: these frameworks provide the appearance of safety without its substance.
A 2024 systematic review of AI auditing found that “algorithmic audits are often vague or performative (‘audit-washing’) unless rigorously defined.”1 The study emphasized that meaningful audits must specify who audits (qualifications, independence), what is audited (scope, metrics), why (goals), and how (methodology). Without these specifications, audits become what researchers call “compliance theater”: rituals that satisfy regulators while leaving fundamental risks unaddressed.
This gap between stated principles and actual practice is not confined to individual companies. The Future of Life Institute’s 2024 AI Safety Index evaluated six leading AI companies across 42 indicators of responsible conduct. The results were sobering: most companies received grades of D or F, with experts noting that “third-party validation of risk assessment and safety framework compliance” was largely absent.2 Even companies with sophisticated governance structures showed a “gap between recognizing RAI [Responsible AI] risks and taking meaningful action.”3
Stanford’s 2025 AI Index Report confirmed this trend at scale: “AI-related incidents are rising sharply, yet standardized RAI evaluations remain rare among major industrial model developers.”4 The report documented that while 78% of organizations now use AI in at least one business function, up from 55% just one year earlier, only 39% of C-suite leaders use benchmarks to evaluate their AI systems’ safety. When they do, they prioritize operational metrics — such as scalability and cost efficiency — over ethical safeguards.
The problem is structural. As Schiff et al.’s 2024 study on AI ethics auditing revealed, current auditors “spend most of their time talking directly to technical or governance teams... In contrast, in only a few cases did auditors proactively reach out to broader stakeholders.”5 The very people responsible for safety evaluation are embedded within the organizations they’re meant to scrutinize, creating inevitable conflicts of interest and blind spots.
From Principles to Proofs: The Trust-Value-Measure Framework
What’s missing from contemporary AI governance is not more principles but enforceable, falsifiable proofs. The Trust-Value-Measure (TVM) framework offers a structural alternative to compliance theater by demanding three categories of proof:
Trust Proofs: Accountability, Transparency, and Resilience
Trust in AI requires more than promises—it demands evidence. Trust proofs establish that when things go wrong, we can trace causality, assign responsibility, and remedy harm. This requires:
Auditability through ethical black boxes. Similar to airplane cockpit recorders, AI systems must incorporate immutable logs that reconstruct decision paths and flag unauthorized divergences. As Mökander & Floridi note in their comprehensive review of AI auditing, “It is not just about checking the algorithm itself... but also paying attention to the data used, the methods used in the development and the optimization of the algorithm.”6
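To make the idea concrete, here is a minimal sketch of what such an ethical black box could look like in code, assuming a hash-chained, append-only record of each consequential decision; the field names and chaining scheme are illustrative, not a standard.

```python
# A minimal "ethical black box" sketch: an append-only decision log in which
# each record commits to the hash of its predecessor, so after-the-fact
# tampering is detectable. Field names are illustrative assumptions.
import hashlib
import json
import time

class DecisionLog:
    def __init__(self):
        self._records = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, model_id: str, inputs: dict, output: str, policy_checks: dict) -> dict:
        record = {
            "timestamp": time.time(),
            "model_id": model_id,
            "inputs": inputs,
            "output": output,
            "policy_checks": policy_checks,  # e.g. which guardrails fired
            "prev_hash": self._last_hash,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._last_hash = record["hash"]
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; edits or removed interior records break verification."""
        prev = "0" * 64
        for rec in self._records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

An auditor who retains the latest hash can then detect edits, insertions, or truncation after the fact, which is precisely the property independent verification requires.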
Explainability beyond marketing. Transparent systems allow external verification by users, regulators, and auditors. The XAI (explainable AI) literature offers technical approaches such as LIME and counterfactual explanations, but always with tradeoffs: more explainability often means less predictive power. Yet this tradeoff is precisely the friction that safety requires. As research on algorithmic bias emphasizes, “AI systems should be designed and developed with strong data privacy and security protections in place and should be audited regularly.”7
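As a rough illustration of the counterfactual approach mentioned above (not any particular library’s API), the following sketch searches for the smallest single-feature change that flips a binary classifier’s decision; the perturbation grid and the scikit-learn-style model interface are assumptions.

```python
# A hedged counterfactual-explanation sketch: perturb one feature at a time,
# from smallest to largest magnitude, until the model's decision flips.
import numpy as np

def single_feature_counterfactual(model, x, deltas=np.linspace(-2.0, 2.0, 81)):
    """Return (feature_index, delta) for the smallest single-feature change
    that flips the prediction on x, or None if no tested change flips it."""
    x = np.asarray(x, dtype=float)
    original = model.predict(x.reshape(1, -1))[0]
    best = None
    for j in range(x.shape[0]):
        # try perturbations in order of increasing magnitude, skipping zero
        for delta in sorted((d for d in deltas if d != 0.0), key=abs):
            candidate = x.copy()
            candidate[j] += delta
            if model.predict(candidate.reshape(1, -1))[0] != original:
                if best is None or abs(delta) < abs(best[1]):
                    best = (j, float(delta))
                break  # smallest flip for this feature found; move on
    return best
```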
Resilience under adversarial conditions. Trust demands stress testing, scenario injections, and robustness bounds. The “never trust, always verify” approach advocated by researchers requires continuous evaluation, adversarial probes, and formal verification where possible.8 Proofs of trust must include benchmarks for failure modes, thresholds for safe degradation, and detection of anomalies or divergences.
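A stress test of this kind can be as simple as measuring prediction stability under injected noise. The sketch below assumes a scikit-learn-style classifier; the noise scales and the stability threshold are illustrative choices, not established benchmarks.

```python
# A minimal robustness stress test: inject Gaussian noise at several scales
# and measure how often predictions stay unchanged relative to the baseline.
import numpy as np

def stability_under_noise(model, X, noise_scales=(0.01, 0.05, 0.1), trials=20, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = model.predict(X)
    report = {}
    for scale in noise_scales:
        agreement = []
        for _ in range(trials):
            noisy = X + rng.normal(0.0, scale, size=X.shape)
            agreement.append(np.mean(model.predict(noisy) == baseline))
        report[scale] = float(np.mean(agreement))
    return report

def passes_robustness(report, threshold=0.95):
    """A trust proof might require stability above a declared floor at every scale."""
    return all(rate >= threshold for rate in report.values())
```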
Value Proofs: Alignment with Ethical Priorities
Altman’s improvisational handling of the assisted-suicide question illustrates the danger of treating values as negotiable in real time. Value proofs require explicit, testable guarantees that the system’s behavior remains aligned with declared priorities even under edge or adversarial conditions.
Bias as a structural safety constraint, not a bug. Recent research demonstrates that bias in AI is not a technical glitch but a reflection of systemic power structures. The Bias Audit Framework by Oveh & Isitor (2025) introduces mechanisms to detect and mitigate bias early through data imbalance correction, model bias detection, and continuous monitoring, demonstrating measurable gains in equity metrics.9
Human-in-the-loop approaches like D-BIAS enable domain experts to intervene on unfair causal edges and re-simulate datasets to restore fairness across multiple objectives.10 A comprehensive 2025 study on AI ethics stressed that “data diversity, rigorous audits, ethical training, and algorithmic clarity” are essential for reducing bias, not optional enhancements.11
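In practice, even a basic bias audit reduces to computable quantities. The sketch below checks selection-rate disparity and the familiar four-fifths (80%) disparate-impact ratio across groups; the thresholds are illustrative defaults, not values prescribed by the frameworks cited above.

```python
# A small group-fairness check of the kind a bias audit might run:
# demographic parity gap and disparate-impact ratio over a protected attribute.
import numpy as np

def selection_rates(y_pred, group):
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return {g: float(y_pred[group == g].mean()) for g in np.unique(group)}

def audit_demographic_parity(y_pred, group, max_gap=0.1, min_ratio=0.8):
    rates = selection_rates(y_pred, group)
    lo, hi = min(rates.values()), max(rates.values())
    return {
        "selection_rates": rates,
        "parity_gap": hi - lo,
        "disparate_impact_ratio": lo / hi if hi > 0 else float("nan"),
        "passes": (hi - lo) <= max_gap and hi > 0 and (lo / hi) >= min_ratio,
    }
```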
Explicit tradeoff boundaries. No value system is singular—optimizing for accuracy can harm equity. Value proofs must define acceptable tradeoffs and permit dynamic rebalancing when conflicts arise. For instance, one might bound “maximum allowable disparity in false negative rate across protected groups” while also bounding overall error. The system must never choose a boundary-violating configuration, and when values conflict, the resolution process must be transparent and documented.
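Such a boundary can be written as an executable admission test. The sketch below rejects any model configuration whose false-negative-rate gap across protected groups, or whose overall error, exceeds a declared ceiling; both ceilings are placeholder values that a real TVM deployment would have to declare and document.

```python
# A sketch of an explicit tradeoff boundary: a configuration is admissible only
# if overall error and the cross-group false-negative-rate gap stay bounded.
import numpy as np

def false_negative_rate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0)) if positives.any() else 0.0

def within_boundary(y_true, y_pred, group, max_fnr_gap=0.05, max_error=0.10):
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    fnrs = [false_negative_rate(y_true[group == g], y_pred[group == g])
            for g in np.unique(group)]
    overall_error = float(np.mean(y_true != y_pred))
    return (max(fnrs) - min(fnrs)) <= max_fnr_gap and overall_error <= max_error
```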
Value guardrails for emergent behaviors. Systems must carry value overrides in ethically sensitive domains. When Altman suggests ChatGPT might guide users toward assisted suicide if it’s legal in their jurisdiction, he reveals the absence of robust guardrails. Value proofs demand that systems cannot optimize their way into ethically unacceptable positions through sophisticated but harmful reasoning.
Measure Proofs: Empirical Guarantees and Statistical Bounds
Measurement transforms abstract principles into a testable reality. Measure proofs demand rigorous, evolving benchmark suites and statistical bounds that prove the system behaves within safe limits.
Distributional testing beyond average cases. Systems must be tested across subgroup slices, stress-interpolations, corner-case extrapolations, and counterfactual shifts. As a 2024 study on AI bias emphasized, “Benchmarking datasets must include rare or historically marginalized populations, synthetic perturbations, and event-driven extremes.”12 The EU AI Act’s requirement for post-hoc fairness auditing tools like AI Fairness 360 demonstrates this principle in regulatory practice.13
Statistical guarantees with confidence bounds. Safety requires bounds: “with probability 1 − δ, error doesn’t exceed ε in any subgroup of size ≥ N.” While ML theory provides generalization bounds under distribution shift, these rarely account for fairness constraints. Safe proofs must integrate robust statistics and adversarial generalization bounds, along with post-hoc calibration and uncertainty estimates using Bayesian or conformal methods.
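One simple way to instantiate such a guarantee is a per-subgroup Hoeffding bound: with probability at least 1 − δ, a subgroup’s true error exceeds its empirical error by at most sqrt(ln(1/δ) / 2n). The sketch below applies that bound slice by slice; δ, the minimum slice size, and the ε ceiling are illustrative parameters.

```python
# Per-subgroup Hoeffding bound: empirical error plus a slack term that shrinks
# with subgroup size. Certification passes only if every covered subgroup's
# upper bound stays below the declared epsilon.
import math
import numpy as np

def subgroup_error_bounds(y_true, y_pred, group, delta=0.05, min_n=100):
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    bounds = {}
    for g in np.unique(group):
        mask = group == g
        n = int(mask.sum())
        if n < min_n:
            continue  # slice too small for a meaningful guarantee
        empirical = float(np.mean(y_true[mask] != y_pred[mask]))
        slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        bounds[g] = {"n": n, "empirical_error": empirical,
                     "upper_bound": empirical + slack}
    return bounds

def certify(bounds, epsilon=0.15):
    return all(b["upper_bound"] <= epsilon for b in bounds.values())
```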
Continuous monitoring and drift detection. Proof is not one-time. After deployment, ongoing measurement through drift detection (feature, label, concept drift), bias shifts, degradation monitoring, and feedback loop analysis becomes mandatory. When metrics cross danger thresholds, the system must enter safe mode, fall back to human control, or suspend service pending retraining.
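A minimal drift monitor might compare a live window of a feature or model score against a reference window with a two-sample Kolmogorov–Smirnov test and trip a safe-mode flag when the distributions diverge; the alpha threshold and the safe-mode hook below are illustrative assumptions.

```python
# Post-deployment drift check: a two-sample KS test between a reference window
# and a live window, with a flag that would trigger fallback behavior.
import numpy as np
from scipy.stats import ks_2samp

def check_drift(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> dict:
    stat, p_value = ks_2samp(reference, live)
    return {"ks_statistic": float(stat), "p_value": float(p_value),
            "drifted": p_value < alpha}

def monitor(reference, live, alpha=0.01):
    result = check_drift(reference, live, alpha)
    if result["drifted"]:
        # In a TVM-style deployment this would trigger safe mode, human
        # fallback, or suspension pending retraining, not just a log line.
        print(f"Drift detected (KS={result['ks_statistic']:.3f}); entering safe mode.")
    return result
```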
The Constraint Cage: Why Friction Is a Feature, Not a Bug
The TVM framework’s power lies not in adding complexity but in enforcing constraints. What might be called the “constraint cage” ensures that shortcuts, workarounds, and optimization hacks cannot bypass fundamental safety requirements.
The Envelope Constraint: Proofs are inadmissible if earned through practices that breach human thriving. AI outputs derived from coercion, exploitation, or illegibility cannot be certified as Trustable, regardless of their performance metrics.
The Friction Constraint: Proofs must be cost-bearing. Lightweight attestations, self-reported claims, or non-reproducible assurances are excluded. This friction is not inefficiency—it is the very substance of safety. As research on high-reliability organizations demonstrates, “fail-safe design” requires that any failure occur in the least harmful way possible, which necessitates redundancy and defensive architecture.14
The Renewal Constraint: All proofs decay on schedule. Periodic reproduction is mandatory to maintain status. This prevents the “set it and forget it” mentality that led Microsoft’s Tay chatbot to produce offensive tweets within a day of its 2016 release, despite being trained on “cleaned and filtered” data.15 Static certification cannot account for distribution shift, adversarial adaptation, or emergent behaviors.
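Mechanically, the renewal constraint is easy to encode: a proof record that carries an issue date and a validity window decays on its own. The 90-day window in the sketch below is an arbitrary placeholder; real schedules would be set per proof type.

```python
# A proof record whose status decays to "expired" once its validity window
# lapses, rather than persisting indefinitely.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Proof:
    name: str                               # e.g. "subgroup error bound"
    issued: datetime
    valid_for: timedelta = timedelta(days=90)

    def status(self, now: datetime) -> str:
        return "valid" if now <= self.issued + self.valid_for else "expired"

# Example: a bias audit issued on 1 January has decayed by 1 May.
audit = Proof("bias audit", issued=datetime(2025, 1, 1))
print(audit.status(now=datetime(2025, 5, 1)))  # -> "expired"
```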
The Gaps in Current Practice
Recent empirical research reveals how far current AI safety practices fall short of these requirements:
The Audit Asymmetry. As a 2024 study on AI auditing in business contexts found, audit frameworks often place the burden on those with the fewest resources. Algorithms can perform well on public test sets while misbehaving in opaque corner cases.16 The power asymmetry is stark: companies shield model weights, training data, and transformation pipelines under intellectual property claims, making external verification impossible.
The Specification Gaming Problem. Even with constraints, models exploit loopholes. Research on “reward hacking” shows systems can minimize disparity metrics by refusing all risk predictions—technically fair but substantively useless.17 Safe proofs must anticipate adversarial gaming and close these loopholes through adversarial testing.
The Transparency Paradox. While the EU AI Act and NIST Risk Management Framework call for public transparency, a 2024 study found these processes remain “secondary at best in AI ethics auditing practice, especially regarding stakeholders external to firms.”18 Stanford’s Foundation Model Transparency Index showed that Anthropic’s transparency score increased from 36 to 51 out of 100 between October 2023 and May 2024, an improvement that still leaves it in failing territory.19
The Deployment Gap. McKinsey’s 2025 survey found that while 78% of organizations use AI, only 6% report hiring AI ethics specialists, and just 13% have hired AI compliance specialists.20 The implication is clear: most organizations deploying AI systems lack dedicated personnel to ensure ethical operation, let alone the comprehensive proof infrastructure TVM demands.
The Altman Contradiction: Individual Accountability Without Structural Guardrails
Returning to Altman’s interview, we see how individual accountability without structural proofs produces precisely the kind of improvisational ethics that characterizes current AI governance. Altman says, “The person I think you should hold accountable for those calls is me,” positioning himself as the moral arbiter. Yet he also insists ChatGPT should reflect humanity’s “collective moral view” and defer to users’ cultural contexts.
This is not hypocrisy; it’s the inevitable result of a governance structure that concentrates power without constraining it through enforceable proofs. When Carlson presses him on whether surveillance camera wires being cut and blood in multiple rooms suggest murder rather than suicide (in reference to OpenAI whistleblower Suchir Balaji), Altman’s response oscillates between dismissal and deflection: “people do suicide without notes a lot... people definitely order food they like before they commit suicide.”
The pattern is clear: when faced with challenging questions, Altman appeals to complexity, defers to processes he won’t fully explain, and ultimately relies on personal judgment, the very judgment he admits is formed “on the spot.” This is not a sustainable model for technology that shapes global discourse.
Toward Structural Accountability: What Proofs Demand
Moving from compliance theater to genuine safety requires reconceptualizing AI governance around falsifiable, renewable proofs rather than principles and promises. This means:
Mandating ethical black boxes. Every consequential AI system should maintain immutable logs sufficient to reconstruct decision paths and identify failures. These should be accessible to independent auditors with appropriate security and privacy protections in place. The International AI Safety Report (2025) emphasized that “reproducible tests under adversarial harness” must become standard practice.21
Establishing proof registries. Rather than allowing companies to self-certify, independent bodies should maintain registries of validated proofs. Systems would receive trustability status (trustable, conditionally trustable, or not trustable) based on the demonstration of required proofs. The absence or decay of proofs should trigger a mandatory reassessment.
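A registry entry of this kind can be reduced to a simple derivation: trustability status follows from which required proofs are on file and unexpired, never from self-attestation. The proof names, the downgrade rule, and the data layout in the sketch below are illustrative assumptions.

```python
# Deriving trustability status from the set of required proofs that are
# present and unexpired. The partial-coverage rule is an assumption.
from datetime import datetime

REQUIRED_PROOFS = {"ethical black box", "bias audit",
                   "subgroup error bounds", "drift monitoring"}

def trustability(proof_expiries: dict, now: datetime) -> str:
    """proof_expiries maps proof name -> expiry datetime for proofs on file."""
    valid = {name for name, expiry in proof_expiries.items() if expiry > now}
    missing_or_decayed = REQUIRED_PROOFS - valid
    if not missing_or_decayed:
        return "trustable"
    if missing_or_decayed != REQUIRED_PROOFS:
        return "conditionally trustable"  # triggers mandatory reassessment
    return "not trustable"
```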
Implementing cost-bearing friction. The speed of AI deployment must be constrained by the speed of safety validation. As research on responsible AI emphasizes, “Organizations should foster a culture of inquiry, inviting individuals to scrutinize ongoing activities for potential risks.”22 This means allocating substantial resources (researchers suggest 30% of research staff) to safety work, not as an afterthought but as core infrastructure.
Requiring named accountability with liability. Following the Trust Factory Model’s requirement for “human warrants,” individuals must accept personal accountability for AI systems, with revocation and liability clauses. This transforms abstract responsibility into concrete risk.
Establishing proof renewal cycles. All safety certifications should expire on fixed schedules, requiring re-demonstration of proofs. This accounts for model drift, distribution shift, and emergent capabilities. The 2024 review of AI auditing frameworks emphasized that “periodic auditing of AI systems” combined with “human oversight alongside automation” provides the minimum viable safety baseline.23
Conclusion: The Physics of Trust in the AI Era
Altman’s claim that he’s “just a tech nerd” building “just mathematics” exemplifies the dangerous fiction at the heart of contemporary AI development. The interview reveals not moral failing but structural inadequacy: individual leaders, however intelligent or well-intentioned, cannot bear the weight of civilizational-scale decisions through personal judgment alone.
The Trust-Value-Measure framework offers an alternative vision: AI systems governed not by the improvised ethics of powerful individuals, but by reproducible, falsifiable proofs that constrain behavior within safe envelopes. This is not about stifling innovation; it’s about ensuring that innovation doesn’t outrun our capacity to govern it safely.
Recent research makes the stakes clear. The 2025 AI Index documented that model training compute doubles every five months, datasets expand every eight months, and power consumption increases annually, yet “performance gaps are shrinking” as capability diffuses.24 We are rapidly approaching a world where advanced AI is ubiquitous, cheap, and powerful. Whether that world is safe depends on whether we demand proofs now, while we still have the leverage to shape the infrastructure.
As Stanford researchers studying AI auditing concluded, “AI auditing is an inherently multidisciplinary undertaking” requiring contributions from computer scientists, social scientists, philosophers, legal scholars, and practitioners.25 The TVM framework provides the architecture for integrating these perspectives into a unified proof regime.
The question is not whether Altman or any individual CEO is qualified to make moral decisions for billions of people. The question is whether we will build systems that transcend individual judgment through structural accountability, or whether we will continue to trust promises from people who admit they’re “thinking on the spot” about questions that determine whether vulnerable humans live or die.
Compliance is merely the baseline. Proof demands falsifiable, empirical guarantees across trust, value, and measurement dimensions. Only by demanding proofs—not promises—do we escape what one researcher aptly termed “the trap of compliant AI that kills you anyway.”
Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3), 50-57. Referenced in multiple 2024 auditing studies as foundational work on audit limitations.
Future of Life Institute. (2024). FLI AI Safety Index 2024. Retrieved from https://futureoflife.org/document/fli-ai-safety-index-2024/
Stanford HAI. (2025). The 2025 AI Index Report. Retrieved from https://hai.stanford.edu/ai-index/2025-ai-index-report
Ibid.
Schiff, D. S., Kelley, S., & Camacho Ibáñez, J. (2024). The emergence of artificial intelligence ethics auditing. SAGE Journals. Retrieved from https://journals.sagepub.com/doi/full/10.1177/20539517241299732
Mökander, J., & Floridi, L. (2023). Auditing of AI: Legal, ethical, and technical approaches. Digital Society, 2(74). Retrieved from https://link.springer.com/article/10.1007/s44206-023-00074-y
ISACA. (2024). A proposed high-level approach to AI audit. ISACA Journal, Volume 2. Retrieved from https://www.isaca.org/resources/isaca-journal/issues/2024/volume-2/a-proposed-high-level-approach-to-ai-audit
Tidjon, L. N., & Khomh, F. (2023). Never trust, always verify: A roadmap for AI safety. arXiv. Referenced in multiple 2024-2025 safety framework studies.
Oveh, E., & Isitor, O. (2025). Bias audit framework for AI systems. ResearchGate. Study demonstrating gains in equity metrics in healthcare domain applications.
D-BIAS research is referenced in the TVM framework documentation as an example of a human-in-the-loop causal fairness approach.
AI Ethics Study. (2025). AI ethics: Integrating transparency, fairness, and privacy in AI development. Taylor & Francis Online. Retrieved from https://www.tandfonline.com/doi/full/10.1080/08839514.2025.2463722
Bias research compilation. (2024). Bias and ethics of AI systems applied in auditing - A systematic review. ScienceDirect. Retrieved from https://www.sciencedirect.com/science/article/pii/S2468227624002266
EU AI Act implementation examples from 2025 compliance studies, particularly regarding the deployment requirements for the AI Fairness 360 toolkit.
Center for AI Safety. (2024). AI risks that could lead to catastrophe. Retrieved from https://safe.ai/ai-risk. Discusses high-reliability organization principles, including loose coupling, separation of duties, and fail-safe design.
Microsoft Tay incident widely documented in AI safety failure literature, including 2024 retrospective analyses of early chatbot deployment failures.
AI Now Institute research on audit power asymmetries, cited in multiple 2024 auditing framework studies.
Specification gaming and reward hacking are documented in AI safety research, including publications by OpenAI and DeepMind on alignment failures.
Schiff et al., 2024, op. cit.
McKinsey & Company. (2025). AI in the workplace: A report for 2025. Retrieved from https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work
Ibid.
UK Government. (2025). International AI Safety Report 2025. Retrieved from https://www.gov.uk/government/publications/international-ai-safety-report-2025
Center for AI Safety, 2024, op. cit.
Multiple 2024 systematic reviews of AI auditing frameworks emphasize these minimum requirements for viable safety oversight.
Stanford HAI, 2025, op. cit.