The operating model for in-house AI: roles, governance, and release process
AI as a project ends; AI as a capability compounds
Most enterprise AI initiatives we audit have the same shape: a successful initial deployment, a 9-month plateau, and a slow drift into 'we have AI but nobody really knows what's running in production anymore.' The cause is the absence of an operating model. The deployment was a project; the capability never got built around it; staff turned over; institutional knowledge leaked away; the system kept running but stopped evolving.
An operating model fixes this by giving AI the same institutional plumbing as finance, security, and platform engineering. Defined roles, recurring governance, explicit policies, documented decisions. The work that turns a launched system into a sustained capability is exactly the work that doesn't ship a feature — and exactly the work that determines whether the feature still works in 18 months.
Four roles cover the operational surface, with named owners
- Model owner: responsible for a specific model or set of models in production, accountable for performance, cost, and roadmap.
- Eval owner: maintains the evaluation harnesses for the systems in scope, defines what 'correct' means alongside domain experts, and signs off on releases against eval thresholds.
- Platform owner: operates the inference, retrieval, observability, and governance infrastructure that all models share.
- Governance chair: leads the recurring forum, owns the decision log, and escalates to leadership when needed.
These are roles, not necessarily four separate people. A small organization may have one engineer wear three hats; a larger one will have a team per role. What matters is that each role has a named owner with documented authority. Ambiguity at this layer is where governance fails.
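One way to remove that ambiguity is to keep the roles document as a versioned artifact that tooling can check. A minimal sketch in Python; the owners, scopes, and field names here are placeholders, not a prescribed schema:

```python
# Minimal sketch of a roles document kept as a versioned artifact.
# The four roles come from this section; owners, scopes, and field
# names are placeholders, not a prescribed schema.

ROLES = {
    "model_owner": {
        "owner": "alice@example.com",      # a named individual, not a team alias
        "scope": ["support-triage-model"], # models this owner is accountable for
        "authority": "performance, cost, and roadmap decisions",
    },
    "eval_owner": {
        "owner": "bob@example.com",
        "scope": ["support-triage-model"],
        "authority": "eval thresholds; release sign-off",
    },
    "platform_owner": {
        "owner": "carol@example.com",
        "scope": ["inference", "retrieval", "observability"],
        "authority": "shared infrastructure and capacity",
    },
    "governance_chair": {
        "owner": "dan@example.com",
        "scope": ["governance forum", "decision log"],
        "authority": "forum rulings; escalation to leadership",
    },
}

def owner_of(role: str) -> str:
    """Fail loudly when a role lacks a named owner; ambiguity at this
    layer is exactly where governance fails."""
    entry = ROLES.get(role, {})
    if not entry.get("owner"):
        raise LookupError(f"no named owner for role {role!r}")
    return entry["owner"]
```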
- Defined roles: 4 (model, eval, platform, governance)
- Governance forum cadence: biweekly, with binding decisions
- Release gates: 5–7 (eval, safety, cost, latency, governance)
- Decision log retention: 7 years, audit-ready by construction
The governance forum is where decisions get made, not where status gets reported
Most AI 'governance' meetings we encounter in audits are status updates. The team reports what they shipped; the steering committee nods; nothing decisional happens. That's not governance — that's a meeting. The right structure is a biweekly forum with explicit decision authority: approve new model deployments, rule on eval threshold exceptions, prioritize the next quarter's roadmap, escalate risks that exceed the forum's authority.
Forum membership is small (5–8 people) and includes the four roles above plus one or two business representatives whose work the AI affects. Decisions are recorded with rationale; the decision log is the artifact that survives turnover. When the eval owner leaves, the next eval owner reads the log and catches up; institutional memory persists.
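A decision log needs almost no tooling to be durable. A minimal sketch, assuming an append-only JSONL file versioned alongside the platform code; the Decision fields and the example entry are illustrative:

```python
# Minimal sketch of a decision-log entry, assuming an append-only JSONL
# file versioned alongside the platform code. Field names and the
# example entry are illustrative.

import json
from dataclasses import dataclass, asdict

@dataclass
class Decision:
    decision_id: str       # e.g. "2026-014"
    date: str              # ISO date of the forum session
    summary: str           # what was decided
    rationale: str         # why; this is the part that survives turnover
    approvers: list[str]   # forum members who approved
    dissent: str = ""      # recorded objections, if any

def append_decision(path: str, d: Decision) -> None:
    # Append-only: entries are never edited, only superseded by later
    # entries that reference them.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(d)) + "\n")

append_decision("decision_log.jsonl", Decision(
    decision_id="2026-014",
    date="2026-02-10",
    summary="Approved retrieval model v3 for production",
    rationale="All eval suites passed; p95 latency within SLA; "
              "cost projection under the quarterly envelope.",
    approvers=["model_owner", "eval_owner", "governance_chair"],
))
```

The append-only discipline matters more than the format: a later entry can supersede an earlier one, but nothing is rewritten, so the rationale trail stays intact.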
Eval policy is the document that prevents 'we'll just ship and see'
An explicit eval policy specifies: which suites are mandatory before any production deployment (quality, safety, regression, latency, cost), the thresholds each suite must meet, the categories within each suite that block release, the process for requesting a threshold exception, and the cadence of policy review. Without the policy, every release becomes an ad-hoc judgment call and exceptions become the norm.
The policy is owned by the eval owner, approved by the governance forum, and version-controlled alongside the platform code. Updates require forum approval, not unilateral changes. The discipline is that the policy is not advisory — it is binding on the release process and the release pipeline enforces it.
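To make 'binding' concrete, the policy can be a machine-readable artifact the pipeline evaluates. A minimal sketch, assuming the harness reports one score per suite; EVAL_POLICY and the thresholds shown are illustrative values, not recommendations:

```python
# Minimal sketch of the eval policy as a machine-readable artifact,
# assuming the harness reports one score per suite. EVAL_POLICY and
# the thresholds are illustrative values, not recommendations.

EVAL_POLICY = {
    "version": "1.4",  # bumped only with governance-forum approval
    "thresholds": {
        "quality":    0.90,  # minimum pass rate on the quality suite
        "safety":     0.99,  # safety categories block release below this
        "regression": 1.00,  # no previously passing case may fail
        "latency":    0.95,  # share of requests meeting the SLA
        "cost":       1.00,  # projected cost within the budget envelope
    },
}

def blocking_failures(results: dict[str, float]) -> list[str]:
    """Return the suites that block release under the current policy."""
    failures = []
    for suite, threshold in EVAL_POLICY["thresholds"].items():
        score = results.get(suite)
        if score is None or score < threshold:
            # A missing suite blocks release exactly as a failing one does.
            failures.append(suite)
    return failures
```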
Model release process turns deployment into a documented act
A model release process has explicit gates. (1) Eval suite passes per the eval policy. (2) Safety review is recorded. (3) Cost projection within budget envelope. (4) Latency SLA met. (5) Operational runbook updated. (6) Governance forum approval recorded. (7) Rollback plan documented. Each gate has a named owner and a checklist artifact. Deployment proceeds only when all gates pass.
The process is enforced in CI/CD. A merge to the production branch that doesn't satisfy each gate fails the pipeline. Manual overrides are possible — sometimes necessary — but they generate audit-trail entries with the override reason and the approving authority. There is no path from code to production that bypasses the gates silently.
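As a sketch of what that enforcement can look like: a CI step that checks one checklist artifact per gate and fails the build when any is missing. The gate names mirror the seven gates above; the artifact layout and the GATE_OVERRIDE_* environment variables are invented for illustration:

```python
# Sketch of release-gate enforcement as a CI step, assuming each
# passing gate writes a checklist artifact into artifacts/. Gate names
# mirror the seven gates above; the paths and GATE_OVERRIDE_* variables
# are invented for illustration.

import json
import os
import sys

GATES = [
    "eval_suite_passed",
    "safety_review_recorded",
    "cost_projection_approved",
    "latency_sla_met",
    "runbook_updated",
    "forum_approval_recorded",
    "rollback_plan_documented",
]

def check_gates(artifact_dir: str = "artifacts") -> int:
    missing = [g for g in GATES
               if not os.path.exists(os.path.join(artifact_dir, f"{g}.json"))]
    if not missing:
        return 0  # every gate satisfied; the pipeline proceeds

    reason = os.environ.get("GATE_OVERRIDE_REASON")
    approver = os.environ.get("GATE_OVERRIDE_APPROVER")
    if reason and approver:
        # Overrides are possible but never silent: record an audit entry.
        os.makedirs(artifact_dir, exist_ok=True)
        with open(os.path.join(artifact_dir, "override_audit.jsonl"), "a") as f:
            f.write(json.dumps({"missing_gates": missing,
                                "reason": reason,
                                "approved_by": approver}) + "\n")
        return 0

    print(f"release blocked; unsatisfied gates: {missing}", file=sys.stderr)
    return 1  # a non-zero exit fails the pipeline

if __name__ == "__main__":
    sys.exit(check_gates())
```

The design point is the exit code: the pipeline blocks the release, not a reviewer's memory, and the override path leaves evidence by construction.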
Incident response is its own process and needs its own owner
Production AI incidents — quality regression, safety failure, cost spike, latency degradation — need a defined response process. Who triages, who has authority to roll back, who communicates to affected stakeholders, who runs the post-incident review. The model owner is usually the on-call lead, with escalation to the governance chair for incidents that exceed the model owner's authority.
Post-incident reviews feed back into eval policy, red-team set, and operational runbook updates. An incident the eval suite missed is an eval gap that gets fixed. An incident the runbook didn't cover is a runbook update. The capability gets stronger after each incident, not after each project. This is what compounding looks like operationally.
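That feedback loop can be made mechanical. A minimal sketch, assuming the regression suite is a JSONL file; the function name and case fields are illustrative:

```python
# Minimal sketch of the incident-to-eval feedback loop, assuming the
# regression suite is a JSONL file. The function name and case fields
# are illustrative.

import json

def add_regression_case(incident_id: str, prompt: str, failure_mode: str,
                        expected: str,
                        suite_path: str = "evals/regression.jsonl") -> None:
    case = {
        "source": f"incident/{incident_id}",  # traceability to the incident
        "input": prompt,
        "failure_mode": failure_mode,  # what production got wrong
        "expected": expected,          # what 'correct' means for this case
    }
    with open(suite_path, "a") as f:
        f.write(json.dumps(case) + "\n")
```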
Capacity planning and budget discipline live in the operating model, not in finance
AI capacity planning — token budget, GPU reservations, vendor commits, eval-set growth, headcount — has to live somewhere. The operating model owns it because the operating model has the visibility into what's coming, what's drifting, and what needs to expand. The platform owner produces a quarterly capacity plan that the governance forum approves; finance receives it as a downstream consumer, not as the originator.
Without an operating model, capacity planning becomes a finance-driven exercise that always lags reality. The team scales by surprise, vendor commits expire mid-roadmap, and headcount asks come without context. The operating model produces capacity plans that finance can act on, instead of the other way around.
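What the quarterly plan contains can be equally explicit. A minimal sketch of the plan as a structured artifact; the fields mirror the dimensions above and every number is invented:

```python
# Minimal sketch of the quarterly capacity plan as a structured
# artifact the forum approves and finance consumes downstream. Fields
# mirror the dimensions above; every number is invented.

from dataclasses import dataclass

@dataclass
class CapacityPlan:
    quarter: str
    token_budget_millions: int  # projected monthly token volume
    gpu_reservations: int       # reserved accelerators
    vendor_commits_usd: int     # committed vendor spend
    eval_set_growth: int        # new eval cases expected this quarter
    headcount_asks: int         # roles to open, with context attached
    rationale: str              # why the numbers moved since last quarter

plan = CapacityPlan(
    quarter="2026-Q2",
    token_budget_millions=1_200,
    gpu_reservations=16,
    vendor_commits_usd=250_000,
    eval_set_growth=400,
    headcount_asks=1,
    rationale="Two new workloads entering production; the eval suite "
              "expands to cover them.",
)
```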
The operating model is the artifact your auditor will read first
Banking regulators, FDA SaMD reviewers, DoD AI assurance teams, and SOC 2 auditors all read the operating model before they look at the code. The artifacts they expect: roles document, eval policy, release process, decision log, incident playbook, capacity plan. When these exist and are current, audits move quickly. When they don't, audits become evidence-assembly exercises that consume the team for weeks.
We deliver in-house enterprise AI engagements with the operating model as a primary artifact, not an afterthought. The platform stand-up matters; the operating model is what makes the platform sustainable. Skipping the operating model to ship the platform faster is the most expensive shortcut available in this work.
Eighteen months in, our original AI engineer left. The next engineer was up to speed inside two weeks because the decision log told them why every architectural choice was made and the runbook told them how to operate everything. The operating model wasn't paperwork — it was the institutional memory that made the capability survive the turnover.
— Director of AI Platform, financial services client
Frequently asked
Why does an enterprise AI operating model matter?
Because AI as a project ends; AI as a capability compounds. Most enterprise AI initiatives plateau 9 months after launch and drift into 'we have AI but nobody really knows what's running anymore.' The cause is the absence of an operating model — defined roles, recurring governance, explicit policies, documented decisions. The operating model is the institutional plumbing that turns a launched system into a sustained capability.
What roles does the operating model define?
Four. Model owner: accountable for specific models in production. Eval owner: maintains evaluation harnesses, defines what 'correct' means, and signs off on releases. Platform owner: operates inference, retrieval, observability, and governance infrastructure. Governance chair: leads the recurring forum and owns the decision log. These are roles, not necessarily four separate people; the discipline is that each has a named owner with documented authority.
What does the governance forum actually do?
Makes decisions, not status reports. Biweekly cadence, 5–8 members including the four core roles plus business representatives. Approves new model deployments, rules on eval threshold exceptions, prioritizes roadmap, escalates risks that exceed forum authority. Decisions are recorded with rationale in a log that survives turnover. The forum is where 'we'll just ship and see' becomes 'we approved deployment with these conditions, recorded.'
What's in an eval policy?
Mandatory suites before production deployment (quality, safety, regression, latency, cost), thresholds per suite, categories that block release, the process for requesting threshold exceptions, and the cadence of policy review. The policy is owned by the eval owner, approved by the governance forum, version-controlled, and binding on the release pipeline. Without it, every release is an ad-hoc judgment call and exceptions become the norm.
What gates does the release process enforce?
Typically 5–7. Eval suite passes per policy. Safety review recorded. Cost projection within budget envelope. Latency SLA met. Runbook updated. Governance forum approval recorded. Rollback plan documented. Each gate has a named owner and a checklist artifact, and the CI/CD pipeline enforces them. Manual overrides exist but generate audit-trail entries with documented justification. There is no silent bypass.
How does the operating model serve audit and regulatory review?
Banking regulators, FDA SaMD reviewers, DoD AI assurance teams, and SOC 2 auditors read the operating model before the code. The expected artifacts — roles document, eval policy, release process, decision log, incident playbook, capacity plan — are exactly what the operating model produces. When current, audits move quickly. When missing, audits become evidence-assembly under time pressure. The operating model is the audit posture by construction.