I built a governance layer for AI agents after watching them fail silently in production

Source: DEV Community
Picture this: a healthcare AI agent is triaging patient intake. It's running on a solid model, well-prompted, tested in staging. In production, a patient describes symptoms that match two possible care pathways — one urgent, one routine. The agent picks routine. No error is thrown. No log entry flags it. No human is notified. The patient waits three days for a callback that should have been a same-day referral. Nobody finds out until a follow-up call two weeks later.

I'm not describing a real incident. But I've talked to enough people shipping agents into healthcare, fintech, and legal workflows to know this scenario isn't hypothetical — it's a near-miss waiting in every ungoverned production agent.

The actual problem

When we started shipping AI agents into regulated environments, the agents themselves weren't the problem. The problem was what surrounded them. Or didn't.

No audit trail. When something went wrong, we had inference logs at best — token inputs and outputs, no semantic rec