The Deep Dive

The boring infrastructure that decides whether your AI project works

The interesting work in broadcast AI is not the models. It's the infrastructure around the models. This is the stuff nobody wants to put on a slide because it doesn't sound transformative. It's also where almost every failed implementation quietly fails.

Let's be specific about what "boring infrastructure" actually means.

Context management. Your facility has naming conventions that evolved over decades. Your monitoring stack has quirks that nobody has properly documented. Your scheduling system has edge cases that only the senior ops team remembers. An agent that doesn't have that context will be confidently wrong in ways that look plausible on the surface and fall apart on the fourth alert of the night. Building the context is an enormous, unglamorous, essential piece of work. Most vendor proposals underestimate it by an order of magnitude.

Tool access and permissions. An agent that can observe but can't act is a dashboard. An agent that can act but has too much access is a liability. Getting the blast radius right (what it can do autonomously, what requires confirmation, what's completely off-limits) is an architecture problem, not a configuration problem. It needs to be designed upfront and reviewed whenever the scope changes.
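To make the three tiers concrete, here is a minimal sketch in Python. The action names and the `POLICY` table are hypothetical, invented for illustration; the one design choice worth copying is that anything not listed is denied by default.

```python
from enum import Enum


class Tier(Enum):
    AUTONOMOUS = "autonomous"  # agent may act without asking
    CONFIRM = "confirm"        # agent must get operator sign-off first
    FORBIDDEN = "forbidden"    # agent may never do this


# Hypothetical action names; a real facility would define its own.
POLICY = {
    "acknowledge_alert": Tier.AUTONOMOUS,
    "restart_encoder": Tier.CONFIRM,
    "switch_playout_source": Tier.FORBIDDEN,
}


def authorize(action: str, operator_confirmed: bool = False) -> bool:
    """Return True only if the action is inside the agent's blast radius."""
    # Unknown actions fall through to FORBIDDEN: deny by default.
    tier = POLICY.get(action, Tier.FORBIDDEN)
    if tier is Tier.AUTONOMOUS:
        return True
    if tier is Tier.CONFIRM:
        return operator_confirmed
    return False
```

The table lives in code that gets reviewed, not in a settings page. That is the difference between an architecture decision and a configuration toggle.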

Inter-agent handoff. As soon as you have more than one agent, you have a coordination problem. Agent A detects a fault. Agent B has the context about whether that fault matters right now. Agent C has the authority to escalate. If the handoffs between them drop information, mis-route it, or add latency, the system fails invisibly. The operator sees alerts that don't quite line up. Trust erodes.
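One way to make dropped information and added latency visible is to give every handoff an explicit envelope and check it on receipt. This is a sketch under assumptions: the field names, the required context keys, and the five-second staleness limit are all illustrative, not taken from any particular product.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handoff:
    trace_id: str       # same ID travels across every agent in the chain
    source_agent: str
    target_agent: str
    fault: str
    context: dict
    created_at: float = field(default_factory=time.time)


def validate(handoff, required_context=("service", "severity"),
             max_age_s=5.0, now=None):
    """Return a list of problems; an empty list means the handoff is usable."""
    now = time.time() if now is None else now
    problems = [
        f"missing context: {key}"
        for key in required_context
        if key not in handoff.context
    ]
    if now - handoff.created_at > max_age_s:
        problems.append("handoff is stale")
    return problems
```

A receiving agent that logs these problems, instead of silently proceeding, turns an invisible failure into a visible one.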

Human interface design. What does the operator see? When do they see it? What's the action they're being asked to take? What context do they have? What's their escape hatch if the agent is wrong? These questions sound like UX, but they're architecture. The answers shape what data the agent needs to capture, what state it needs to preserve, and what decisions can be pushed to the model versus what must sit in deterministic code.

Error handling and recovery. What happens when the agent is wrong? What happens when a tool it depends on is down? What happens when the underlying model times out, rate-limits, or refuses to answer? Production systems need an answer to each of these that doesn't involve the operator rebooting the stack. Building that requires thinking explicitly about failure modes, which most teams skip because they're building toward a demo.
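A rough sketch of what "an answer to each of these" can look like for the model-failure cases. The exception classes and the `call_model` callable are hypothetical stand-ins for whatever your model client actually raises and exposes; a production version would also back off between retries.

```python
class ModelTimeout(Exception):
    pass


class ModelRateLimited(Exception):
    pass


class ModelRefused(Exception):
    pass


def triage(alert, call_model, retries=2):
    """Ask the model for a decision; fall back to a deterministic default."""
    for _ in range(retries + 1):
        try:
            return {"decision": call_model(alert), "source": "model"}
        except (ModelTimeout, ModelRateLimited):
            continue  # transient failure: try again
        except ModelRefused:
            break     # retrying a refusal is unlikely to help
    # Deterministic fallback: the operator gets the alert, not a dead stack.
    return {"decision": "escalate_to_operator", "source": "fallback"}
```

Recording `source` alongside the decision matters: it lets you count how often the fallback path is carrying the load.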

Observability of reasoning. When something goes wrong in a traditional system, you trace the logs. When something goes wrong in an agentic system, you need to trace the reasoning. Why did it escalate? What was it considering? What context did it have? If you can't answer those questions after the fact, you can't improve the system, you can't debug failures, and you can't build trust with the ops team.
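In practice, tracing the reasoning usually starts with writing down a structured record for every decision. A minimal sketch, assuming a JSON-lines log; the field names are illustrative, not a standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DecisionRecord:
    trace_id: str       # ties the decision back to the originating alert chain
    alert_id: str
    action: str         # what the agent did, e.g. "escalate"
    rationale: str      # the agent's stated reason, captured at decision time
    context_keys: list  # which pieces of context it had available
    alternatives: list  # what else it considered


def to_log_line(record: DecisionRecord) -> str:
    """Serialise one decision as a single JSON line for later querying."""
    return json.dumps(asdict(record), sort_keys=True)
```

With records like this, "why did it escalate?" becomes a log query instead of a guess.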

None of that is glamorous. All of it is essential. And all of it is where vendors cut corners in pursuit of a compelling demo.

The boring infrastructure is where the real engineering happens. It's also the difference between a system that quietly gets decommissioned in six months and one that becomes part of how your operation runs.

Spend your budget on the unsexy parts. The sexy parts barely need it.

Off the Record

A real failure mode: the agent that drowned in its own alerts

Worth sharing a composite scenario -- elements drawn from multiple projects, names and specifics changed. It illustrates how these things fail in the real world.

A broadcaster deployed an agent to handle alert triage in their NOC. The goal: reduce the volume of alerts operators had to manually review. The system was trained on historical data and went live. For the first week it was great. Volume was down, operators were happier, the project lead was ready to write it up as a success.

Week two, something changed upstream. A content delivery provider updated their monitoring output format. The agent started getting a trickle of alerts it didn't recognise. Because it had been built to be cautious, it escalated them all. Because the escalation path went through a ticketing queue the ops team was already monitoring, the tickets piled up there instead of in the alert system they thought they'd offloaded.

Within three weeks, the ops team had a second queue they were watching. Within six weeks, they had manual workarounds to filter the tickets. By month three, the agent was considered "noisy" and slowly removed from the critical path.

The failure wasn't technical, strictly speaking. Every component did what it was supposed to do. The failure was architectural. Nobody had thought about what happens when the agent encounters something genuinely unfamiliar. Nobody had defined the threshold for escalation, or built monitoring to catch when that threshold was being hit too often. Nobody had designed the feedback loop that would have surfaced the upstream format change as a real issue.
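The missing monitor does not need to be sophisticated. Here is a minimal sketch of a sliding-window check on the agent's own escalation rate. The window size and the 20% threshold are placeholder assumptions; the right numbers depend on your baseline.

```python
from collections import deque


class EscalationRateMonitor:
    """Flags when the agent escalates too large a share of recent alerts."""

    def __init__(self, window=100, max_rate=0.2):
        self.outcomes = deque(maxlen=window)  # True = escalated
        self.max_rate = max_rate

    def record(self, escalated: bool) -> bool:
        """Record one triage outcome; return True if the rate is too high."""
        self.outcomes.append(escalated)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough history to judge yet
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.max_rate
```

In the scenario above, a check like this would likely have fired in week two and pointed someone at the upstream format change, rather than leaving the ops team to discover a second queue.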

The lesson isn't "agents are bad." It's that agents need infrastructure around them that most vendors don't help you build and most buyers don't know to ask for.

Signal Vs Noise

Worth paying attention to: Any vendor proposal that includes a detailed plan for monitoring the agent's own behaviour, not just the systems it's monitoring.

Overhyped right now: "Self-improving" agents. Most are doing light prompt tuning based on feedback signals. Real continuous learning in a production environment is rare, hard, and not what most vendors are actually delivering.

Worth reading: If you're evaluating an agentic system, look up "observability for LLM applications" and read widely. The tooling is maturing fast and it's the backbone of any serious production deployment.
https://medium.com/@zakariabenhadi/a-practical-guide-to-observability-for-llm-applications-logs-traces-and-quality-metrics-c29568ef52eb

---

The Clean Feed is published every Thursday. Forward this to someone who builds broadcast systems.
