AI governance fails in most enterprises for one reason: it is written instead of built. This paper presents the six-layer governance stack we install in client environments, one two-week sprint at a time, with each layer mapped to its EU AI Act and GDPR duties. It is written for technology and compliance leaders who need governance that holds up in an audit, not a binder that holds up a shelf.

Governance programs produce binders; governed systems produce evidence

Walk into most enterprises eighteen months into their AI governance program and you will find the same artifacts. A responsible AI policy, ratified. A risk committee that meets monthly. What you will not find is a single control that executes when an employee pastes customer records into a chatbot at 11 p.m. We have built AI systems for enterprises since 2014, across more than 200 production deployments, and we have never seen a document stop a transaction. Software stops transactions.

The gap between governance on paper and governance in production is now measurable. McKinsey's State of AI survey, fielded in mid-2025 with nearly 2,000 respondents across 105 countries, finds that 51 percent of organizations using AI have seen at least one negative consequence from it. The same survey finds organizations now mitigate an average of four AI-related risk types, double the figure from 2022. Awareness is rising faster than enforcement. IBM's 2025 Cost of a Data Breach report finds that 63 percent of breached organizations either have no AI governance policy or are still writing one, and that among organizations that suffered an AI-related breach, 97 percent had no AI access controls in place.

Our answer is a stack, not a program. Six layers, each one a set of running controls inside the systems they govern: an AI inventory at the base, then a data foundation, then data security and access, then model assurance, then human oversight, then compliance and audit at the top. Each layer depends on the layer beneath it (Exhibit 1). You cannot test fairness on a model you have not registered. You cannot produce an audit trail from a system that never emitted a log. Most governance programs fail because they start at layer six, the compliance layer, and try to hang it in mid-air.

Exhibit 1: Each governance layer depends on the layer beneath it, and most failed programs build the top with nothing underneath

The stack reads bottom-up: AI Inventory, Data Foundation, Data Security and Access, Model Assurance, Human Oversight, Compliance and Audit. Arrows run upward: registry entries feed lineage maps, lineage feeds access scopes, access-scoped data feeds model benchmarks, benchmarked models feed human review queues, and every layer writes events into the audit trail at the top. A second panel shows the typical failed program: a compliance layer drawn first, connected to nothing below it.

A policy describes the control you intend to have. The stack is the control. Auditors and attackers both interact with the second one.

Layer 1, AI Inventory: You cannot govern systems you do not know you are running

The base layer is a complete, current register of every AI system in the organization: the sanctioned ones in your architecture diagrams and the unsanctioned ones in your employees' browser tabs. It has five working parts: shadow AI detection, system classification, risk tiering, ownership assignment, and a model registry that acts as the system of record.

The failure mode when this layer is missing now has a price. IBM's 2025 report finds one in five organizations has experienced a breach traced to shadow AI, and organizations with high levels of shadow AI saw an average of $670,000 in added breach cost. Those incidents also leak worse data: 65 percent of shadow AI breaches compromised customer personal information against a 53 percent global average, and 40 percent compromised intellectual property against 33 percent.

In a client environment we start with discovery, not policy. We pull network egress logs against a list of known AI endpoints, reconcile single sign-on data and expense reports against the approved-tool list, and interview the teams whose names surface. The findings populate a model registry. Every entry gets a risk tier aligned to the EU AI Act's categories, from prohibited practices through high-risk, limited-risk, and minimal-risk systems, and a named owner. An unowned system is an ungoverned system, so the registry refuses entries without one.
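A minimal sketch of the discovery-to-registry step, assuming egress logs have already been reduced to (user, host) pairs. The host names, the `RiskTier` enum, the `ModelRegistry` class, and the sample owner are hypothetical illustrations, not a reference to any specific tool.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    # Tiers follow the EU AI Act categories named in the text.
    PROHIBITED = "prohibited"
    HIGH = "high-risk"
    LIMITED = "limited-risk"
    MINIMAL = "minimal-risk"


@dataclass(frozen=True)
class RegistryEntry:
    system: str
    owner: str
    tier: RiskTier


class ModelRegistry:
    """System of record that refuses entries without a named owner."""

    def __init__(self) -> None:
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, system: str, owner: str, tier: RiskTier) -> RegistryEntry:
        if not owner or not owner.strip():
            raise ValueError(f"refusing unowned system: {system}")
        entry = RegistryEntry(system, owner.strip(), tier)
        self._entries[system] = entry
        return entry

    def systems(self) -> set[str]:
        return set(self._entries)


def shadow_ai_hosts(egress, known_ai_hosts, approved_hosts):
    """Map each unapproved AI host seen in egress to the users who reached it."""
    findings: dict[str, set[str]] = {}
    for user, host in egress:
        if host in known_ai_hosts and host not in approved_hosts:
            findings.setdefault(host, set()).add(user)
    return findings


# Hypothetical sample data, for illustration only.
egress = [
    ("ana", "chat.example-ai.com"),
    ("raj", "api.approved-llm.example"),
    ("ana", "intranet.example"),
    ("li", "chat.example-ai.com"),
]
known = {"chat.example-ai.com", "api.approved-llm.example"}
approved = {"api.approved-llm.example"}

delta = shadow_ai_hosts(egress, known, approved)
registry = ModelRegistry()
for host in delta:
    # In practice the owner and tier come out of the team interviews.
    registry.register(host, owner="ciso-office", tier=RiskTier.LIMITED)
```

The users attached to each finding are the interview list. Refusing the entry inside `register`, rather than in a review step afterward, means an unowned system can never appear in the registry at all.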

The regulatory stake is direct. The EU AI Act's penalty regime under Article 99 reaches 35 million euros or 7 percent of total worldwide annual turnover, whichever is higher, for prohibited practices, and 15 million euros or 3 percent for violations of core obligations. Risk tiering determines which duties apply to which system. Without an inventory you are not out of scope. You are out of visibility, which is worse.

Layer 2, Data Foundation: The EU AI Act regulates your training data, not just your model

The second layer governs the data your AI systems learn from and retrieve against. Its working parts: source tracking, lineage mapping from raw source to model input, quality validation gates, freshness monitoring, and bias screening of training and retrieval data.

Skip this layer and the failure arrives in production wearing plausible language. A retrieval system confidently serves a policy that was superseded eight months ago. A scoring model penalizes one region because its training extract over-sampled a single quarter. Nobody can answer the first question a regulator asks: where did this output come from?

This layer is written into law for high-risk systems. Article 10 of the EU AI Act requires data governance practices that cover design choices, data collection processes and the origin of data, and preparation operations such as annotation, labelling, cleaning, and enrichment. Providers must examine possible biases likely to affect people's health and safety or their fundamental rights, and take measures to detect, prevent, and mitigate them. Training, validation, and testing data sets must be relevant, sufficiently representative, free of errors to the best extent possible, and complete in view of the intended purpose.

What we install is unglamorous and decisive. Every data source feeding a model or retrieval index is registered with its origin, refresh cadence, and steward. Lineage is mapped so each model input traces back to a source system. Quality gates run on schedule and block the pipeline on failure rather than warning into a dashboard nobody reads. Freshness monitors flag stale sources before users find them the hard way. Bias screens run on training extracts before a model ever sees them.
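The difference between a gate that blocks and a gate that warns can be shown in a few lines. This is a sketch under stated assumptions: the `Source` fields mirror the origin, cadence, and steward attributes described above, while the class names, the sample sources, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class StaleSourceError(RuntimeError):
    """Raised to stop the pipeline; a warning would let stale data through."""


@dataclass(frozen=True)
class Source:
    name: str
    steward: str
    refresh_cadence: timedelta
    last_refreshed: datetime


def stale_sources(sources, now):
    """Return the sources whose last refresh is older than their cadence."""
    return [s for s in sources if now - s.last_refreshed > s.refresh_cadence]


def freshness_gate(sources, now):
    """Block the run if any registered source is stale."""
    stale = stale_sources(sources, now)
    if stale:
        names = ", ".join(s.name for s in stale)
        raise StaleSourceError(f"pipeline blocked, stale sources: {names}")


# Hypothetical register: one fresh source, one last refreshed 240 days ago.
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
sources = [
    Source("hr-policies", "j.doe", timedelta(days=30), now - timedelta(days=3)),
    Source("pricing", "a.roe", timedelta(days=7), now - timedelta(days=240)),
]
```

Calling `freshness_gate(sources, now)` before an indexing job raises on the `pricing` source, so the stale record never reaches the index.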

Layer 3, Data Security and Access: Access control is where governance first becomes enforceable

The third layer is the first one that can physically stop a bad outcome. Its working parts: encryption at rest and in transit, anonymization and pseudonymization in data pipelines, role-based access enforced at the point of retrieval, least-privilege service identities, and key management.

The evidence says this is where the gap is widest. IBM's 2025 report finds that among organizations that suffered an AI-related breach, 97 percent lacked AI access controls. The pattern we see in delivery work matches the statistic. Enterprises spend years building document-level permissions in their content platforms, then index everything into a vector store that ignores all of it. The model becomes a side door to the file share.

Our standing rule on retrieval systems: no user may receive, through a model, a document they cannot open directly. We enforce permissions at query time against the source system's access model, not against a copy made at indexing. Service identities run on least privilege. Keys live in managed vaults with rotation. Where personal data feeds training or evaluation, it is pseudonymized first.
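The standing rule can be sketched as a filter that sits between the vector search and the model. The function and document names here are hypothetical, and the `source_acl` dictionary stands in for a live call to the source system's permission API, which is the part that matters: the check runs at query time, not against a snapshot taken at indexing.

```python
from typing import Callable, Iterable

# Stand-in for the source system's live access model (assumption: in a real
# deployment this is an API call to the content platform, not a local dict).
source_acl: dict[str, set[str]] = {
    "handbook.pdf": {"intern", "cfo"},
    "board-minutes.pdf": {"cfo"},
}


def can_open(user: str, doc_id: str) -> bool:
    """Ask the source system whether this user may open this document now."""
    return user in source_acl.get(doc_id, set())


def permitted_results(
    user: str,
    candidates: Iterable[str],
    check: Callable[[str, str], bool],
) -> list[str]:
    """Drop every retrieved document the user could not open directly."""
    return [doc_id for doc_id in candidates if check(user, doc_id)]


# Documents returned by the vector search, before any permission check.
hits = ["handbook.pdf", "board-minutes.pdf"]
intern_view = permitted_results("intern", hits, can_open)
```

Because the check reads the current access model on every query, revoking a permission in the source system takes effect on the next retrieval with no re-indexing.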

GDPR makes most of this a legal duty rather than an engineering preference. Article 32 requires measures appropriate to the risk, and it names pseudonymisation and encryption of personal data alongside the ability to ensure confidentiality, integrity, availability, and resilience of processing systems. Article 25 requires data protection by design and by default, including the default that only personal data necessary for each specific purpose are processed. An AI pipeline that copies the entire data lake into an embedding index fails that default before the first query runs.

Layer 4, Model Assurance: A model without evidence is a liability with an API

The fourth layer produces the evidence that a model does what its owner claims. Its working parts: model cards, performance benchmarks, fairness testing, red-teaming, and drift detection.

Without it, every conversation about a model is an argument about anecdotes. The business swears the system works. The one bad output that reached a customer says otherwise. Nobody has a number, so the decision to renew or kill gets made on the loudest voice in the room.

NIST's AI Risk Management Framework, released in January 2023 for voluntary use, organizes this discipline into four functions: Govern, Map, Measure, and Manage. Measure is where most enterprises are weakest, and it is the one function that cannot be delegated to a vendor's marketing benchmark. You measure on your own data and your own tasks, or you have not measured.

We hold ourselves to this layer commercially. On every engagement we write executable acceptance criteria with the client on day one, build for two weeks in the client's environment, and collect our fee, $10,000 per sprint, only after every criterion passes. That is model assurance applied to ourselves: claims expressed as tests that run in the buyer's environment, with payment gated on the result. We then install the same machinery for the client's own models: a model card per system, a benchmark suite that runs on every deployment, red-team exercises before exposure to real users, fairness tests wherever outputs touch people, and drift monitors that alert when production inputs walk away from the training distribution.

If a governance claim cannot be expressed as a test that passes or fails, it is not a control. It is a hope.
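A drift monitor is one example of a claim expressed as a test. The sketch below uses the Population Stability Index, which is our choice for illustration and not a method the frameworks above prescribe. The 0.2 alert threshold is a common rule of thumb and is an assumption, as are the function names and the synthetic data.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` against `expected`.

    Bin edges come from the range of `expected`; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drifted(expected, actual, threshold=0.2):
    """Pass-or-fail form of the control: True means the alert fires."""
    return psi(expected, actual) > threshold


# Synthetic data: a uniform training feature and a shifted production sample.
training = [i / 100 for i in range(1000)]
production = [v + 5 for v in training]
```

With this data, `drifted(training, training)` is false and `drifted(training, production)` is true, which is the shape of the "seeded distribution shift fires a drift alert" criterion.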

Layer 5, Human Oversight: The EU AI Act assumes a human can say no; most deployments give no one that power

Article 14 of the EU AI Act requires high-risk systems to be designed so the humans overseeing them can understand the system's capacities and limitations, stay alert to the tendency to over-rely on automated output, correctly interpret what the system produces, decide in any particular situation not to use it or to disregard, override, or reverse its output, and stop it through a control that brings it to a halt in a safe state. For certain biometric identification uses the Act goes further: no action may follow an identification unless at least two competent natural persons have separately verified it.

The common failure is oversight theater. A human sits in the loop with no authority and no interface to act. Review becomes a click-through. When something goes wrong, the organization discovers that its escalation path is a chat channel and its override authority is whoever happens to be awake.

We build oversight as workflow, not as a sentence in a policy. Decision review queues, with sampling rates set by risk tier rather than by reviewer stamina. Escalation paths with named roles and response-time targets. Override authority wired into the interface, with every override logged and fed back into evaluation, because an override is the most valuable training signal a governed system produces. Output validation against ground truth where ground truth exists. And an accountability map that names one human for every class of automated decision, because shared accountability is no accountability.
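Two of these mechanisms, tier-based sampling and logged override, can be sketched briefly. The sampling rates, field names, and sample decision below are hypothetical. Hashing the decision ID is one way to make sampling deterministic and reproducible in an audit, and it is an assumption of this sketch rather than a requirement of the Act.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical review rates per risk tier; real rates are a policy decision.
SAMPLING_RATE = {"high-risk": 1.0, "limited-risk": 0.2, "minimal-risk": 0.02}


def needs_review(decision_id: str, tier: str) -> bool:
    """Deterministically route a share of decisions to the review queue."""
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(SAMPLING_RATE[tier] * 2**64)


@dataclass
class Decision:
    decision_id: str
    tier: str
    output: str
    overrides: list = field(default_factory=list)

    def override(self, reviewer: str, new_output: str, reason: str) -> None:
        """Change the decision of record and log who changed it and why."""
        self.overrides.append({
            "reviewer": reviewer,
            "previous": self.output,
            "new": new_output,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.output = new_output


decision = Decision("d-1", "high-risk", "deny")
if needs_review(decision.decision_id, decision.tier):
    decision.override("m.lee", "approve", "income verified manually")
```

The override record keeps the previous output alongside the new one, so the pair can be fed back into evaluation as a labeled example.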

Layer 6, Compliance and Audit: An audit trail assembled after the incident is not an audit trail

The top layer turns everything beneath it into evidence a regulator can use. Its working parts: EU AI Act mapping, GDPR alignment, policy enforced as code, incident reporting, and audit trails. The Act is specific about that last item. Article 12 requires high-risk AI systems to technically allow the automatic recording of events over the lifetime of the system, a duty that takes effect for these systems from August 2, 2026. Logging cannot be retrofitted onto a system that never emitted events. It has to be built in, which is why this layer sits at the top of the stack rather than at the start of the project plan.

Incident reporting now carries deadlines no manual process can meet. Under Article 73, providers of high-risk systems must report serious incidents to market surveillance authorities within 15 days of becoming aware of them, within 10 days where a death is involved, and within 2 days for widespread infringements. An organization that needs three weeks to reconstruct what its model did cannot comply. The reconstruction must already exist, as logs, before anyone asks for it.
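The deadlines reduce to simple date arithmetic, which is why they belong in the incident runbook as code. This sketch assumes calendar days counted from the awareness date and uses the three windows stated above. The names are hypothetical, and the precise legal triggers for each category need a reading of Article 73 itself.

```python
from datetime import date, timedelta
from enum import Enum


class IncidentKind(Enum):
    # Values are the reporting windows in days, as stated in the text.
    SERIOUS = 15
    DEATH = 10
    WIDESPREAD_INFRINGEMENT = 2


def report_deadline(aware_on: date, kind: IncidentKind) -> date:
    """Latest reporting date, counted in calendar days from awareness."""
    return aware_on + timedelta(days=kind.value)


def days_remaining(aware_on: date, kind: IncidentKind, today: date) -> int:
    """Days left to report; negative once the deadline has passed."""
    return (report_deadline(aware_on, kind) - today).days


# Hypothetical incident: the provider became aware on 1 September 2026.
aware = date(2026, 9, 1)
deadline = report_deadline(aware, IncidentKind.WIDESPREAD_INFRINGEMENT)
```

Here `deadline` is 3 September 2026. A reconstruction process that takes three weeks has a negative `days_remaining` in every category.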

This is the layer where our work for a European pharmaceutical regulator lives. The AI compliance scanner we built for them applies 11 rules to every marketing asset and has processed more than 620 assets, at roughly two minutes per asset against the two to three hours each took under manual review. The lesson generalizes well beyond pharma: when the rules execute as code at the moment of review, the audit trail is a byproduct of normal operation, not a quarterly archaeology project.
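The pattern of rules executing as code, with the audit record written as a side effect, can be sketched in general terms. The two rules below are invented for illustration and are not the rules in the scanner described above. The `Rule` and `review` names and the record layout are likewise assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    check: Callable[[str], bool]  # returns True when the asset passes


# Hypothetical rules, for illustration only.
RULES = [
    Rule("R1", "asset must carry a disclaimer",
         lambda text: "disclaimer" in text.lower()),
    Rule("R2", "asset must not claim a cure",
         lambda text: "cure" not in text.lower()),
]


def review(asset_id: str, text: str, rules, audit_log: list) -> bool:
    """Apply every rule and append one audit record per reviewed asset."""
    results = {rule.rule_id: rule.check(text) for rule in rules}
    passed = all(results.values())
    record = {"asset": asset_id, "results": results, "passed": passed}
    audit_log.append(json.dumps(record, sort_keys=True))
    return passed


audit_log: list[str] = []
ok = review("asset-001", "A miracle cure for everything!", RULES, audit_log)
```

The caller cannot obtain a verdict without a record being written, so the audit trail cannot fall behind the reviews it describes.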

Every layer of the stack discharges a named duty under the EU AI Act or GDPR

Layer	Primary legal anchor	Duty it discharges
1. AI Inventory	EU AI Act Art. 99 penalty tiers	Knowing which obligations and which fine exposure apply to which system
2. Data Foundation	EU AI Act Art. 10	Data governance, representativeness, and bias examination for training, validation, and testing data
3. Data Security and Access	GDPR Arts. 25 and 32	Data protection by design and by default; pseudonymisation, encryption, and resilience of processing
4. Model Assurance	NIST AI RMF (voluntary)	Measured, documented model performance and fairness on the deployer's own data
5. Human Oversight	EU AI Act Art. 14	Effective oversight, the authority to disregard, override, or reverse output, and a safe-state stop
6. Compliance and Audit	EU AI Act Arts. 12 and 73	Automatic event logs over the system lifetime; serious-incident reports within 2 to 15 days

Governance ships in two-week sprints, not in annual programs

Everything above can be read as a maturity model and shelved. We mean it as a build sequence. Our delivery model is fixed: scope one workflow, sign executable acceptance criteria on day one, build in the client's environment for two weeks, and get paid only when every criterion passes. Larger programs run as repeated two-week sprints on the same terms. Governance fits this model better than almost any other AI work, because every control in the stack reduces to a test that passes or fails.

A six-sprint sequence installs the full stack, and each sprint ends with a control running in production

Sprint	Layer	What ships	Example acceptance criterion
1	AI Inventory	Discovery scan, model registry, risk tiers, named owners	Registry lists every detected AI system with owner and EU AI Act risk tier; unknown egress to AI endpoints raises an alert within 24 hours
2	Data Foundation	Source register, lineage map, quality gates, freshness monitors	Every model input traces to a registered source; a seeded stale record blocks the pipeline instead of reaching the index
3	Data Security and Access	Encryption, pseudonymization, retrieval-time RBAC, key vaulting	No test account can retrieve, through the model, any document it cannot open directly in the source system
4	Model Assurance	Model cards, benchmark suite, red-team pass, drift monitors	Benchmark suite runs on every deployment; a seeded distribution shift fires a drift alert within one cycle
5	Human Oversight	Review queues, escalation paths, logged override authority	A reviewer can override any output from the interface; the override is logged and changes the decision of record
6	Compliance and Audit	Audit log, AI Act and GDPR mapping, incident runbook	An auditor reconstructs a sampled decision end to end from logs alone, inside the Article 73 two-day window
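An acceptance criterion in this table becomes executable once it is written as a check that returns its own failures. The sketch below covers the first half of the sprint 1 criterion, assuming the registry is exported as a dictionary keyed by system name. The function name, field names, and sample systems are hypothetical.

```python
VALID_TIERS = {"prohibited", "high-risk", "limited-risk", "minimal-risk"}


def registry_gaps(detected, registry):
    """List every detected system that fails the sprint 1 criterion.

    The criterion passes only when the returned list is empty.
    """
    gaps = []
    for system in sorted(detected):
        entry = registry.get(system)
        if entry is None:
            gaps.append((system, "not registered"))
        elif not entry.get("owner"):
            gaps.append((system, "no owner"))
        elif entry.get("tier") not in VALID_TIERS:
            gaps.append((system, "no valid risk tier"))
    return gaps


# Hypothetical discovery output and registry export.
detected = {"support-chatbot", "cv-screener", "browser-plugin"}
registry = {
    "support-chatbot": {"owner": "k.osei", "tier": "limited-risk"},
    "cv-screener": {"owner": "", "tier": "high-risk"},
}

gaps = registry_gaps(detected, registry)
```

Returning the gaps rather than a bare boolean gives the sprint review a concrete list to close before the criterion passes.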

The sequence flexes with the estate. An enterprise with hundreds of AI systems may spend two sprints on inventory alone, and a single focused deployment may compress layers two and three into one sprint. What does not flex is the rule that a sprint ends with a control running in production, demonstrated against criteria the buyer wrote with us. We have shipped on these terms across estates as large as the AI-driven digital workplace we built for Rockwell Automation, which serves more than 28,000 employees in more than 80 countries. Scale changes the sprint count. It does not change the unit of progress.

We build to SOC 2, HIPAA, and GDPR standards where they apply, and we align every implementation with the EU AI Act. None of that lives in a separate compliance phase at the end of a program. It is installed layer by layer, inside the working system, which is the entire argument of this paper. Governance that ships is governance that was never separable from the build in the first place.

What to do with this on Monday morning

1. Pull 30 days of network egress against known AI endpoints and reconcile it with your approved-tool list. The delta is your shadow AI estate. Put it in a registry by Friday.
2. Assign a named owner and an EU AI Act risk tier to every system in that registry, and refuse new registry entries that arrive without an owner.
3. Run one retrieval query against your AI assistant from your most junior employee's account. If it returns anything that account cannot open directly, halt the rollout and fix access enforcement before any other governance work.
4. Pick your highest-risk AI workflow and write five executable acceptance criteria for its governance controls. Reject any governance workstream whose output cannot be expressed as a passing test.
5. Run a mock Article 73 drill: reconstruct one automated decision end to end from logs alone, and time it. If the answer exceeds two days, your audit layer does not exist yet.
6. Commission the first two-week sprint on the lowest layer you are missing, not the highest layer your board asked about.

Sources

The six-layer AI governance stack: Governance that ships, not governance that stalls