Support Triage Agent — Rollout Readiness

Production Gate Checklist

Passed

Eval suite in CI

22 golden cases run on every PR; pass rate gate at 95%.

Passed

HITL approval gates defined

High-severity escalations require human confirmation before Tier-2 assignment.

Passed

PII handling validated

Redaction layer tested; GDPR-relevant fields stripped before LLM context.

Passed

Tool permissions scoped

All tools operating at minimum required permission tier (read-only for lookup tools).

Passed

Fallback behavior tested

Tool-failure fallback routes to human queue with full context; tested in staging.

Passed

Rollback procedure documented

Feature flag controls agent routing; toggle disables in <2 minutes.

Passed

Observability instrumented

MELT stack deployed; OpenTelemetry spans on all agent-to-tool calls.

Not Met

Adversarial eval coverage

Prompt injection and adversarial ticket fixtures not yet in eval suite. Blocker B1.

Not Met

CRM tool circuit breaker

No circuit breaker defined for CRM write operations. Blocker B2.

Partial

Load test completed

Tested at 10 concurrent tickets; target production peak is 80. Full load test pending.

HITL Posture

Dimension	Setting
Autonomy level	L2 Assisted
Approval gates	High-severity tickets require human confirmation before Tier-2 assignment; refunds over $500 require explicit approval
Override path	Support agent can override any routing decision via the triage UI override button; logged and audited
Escalation trigger	Confidence below 0.7, unrecognized category, customer sentiment flagged as high-distress, or any mention of legal/compliance terms
Fallback behavior	Route to human queue with full ticket context and agent reasoning summary; max 4-hour SLA

Observability Readiness

Instrument	Status	Notes
Distributed tracing (OpenTelemetry)	In Place	CHAIN, LLM, and TOOL spans instrumented. Trace IDs propagated to CRM and ticketing system calls.
LLM call logging	In Place	Prompt/completion pairs logged (PII stripped) with model, temperature, and token counts.
Token cost metrics	Partial	Total tokens per session logged; per-tool-call cost breakdown not yet available. Cost dashboards not yet configured.
Error rate alerting	In Place	PagerDuty alert on error rate > 5% over 5-minute window; escalates to on-call engineer.
Latency p95 dashboard	Partial	Overall latency measured; per-span breakdown (model inference vs tool vs network) not yet visible in dashboard.
Session replay	Missing	Full agent session replay (for post-incident diagnosis) not implemented. Logs available but no replay interface.
Audit trail (HITL actions)	In Place	All human override and approval actions logged with timestamp, agent ID, and routing decision delta.

Rollout Risk Register

High

Adversarial ticket injection

A malicious customer could embed instructions in ticket text that override the agent's routing behavior, potentially routing tickets to wrong queues or leaking routing rules.

Add prompt injection detection layer before ticket text reaches agent context; implement adversarial eval suite (Blocker B1).

High

CRM write without circuit breaker

The CRM status-update tool has no circuit breaker. A runaway loop or cascading failure could write incorrect statuses to hundreds of tickets before detection.

Implement circuit breaker on CRM write tool: max 5 writes per session, pause after 3 consecutive errors (Blocker B2).

Medium

Load capacity gap

Peak production load (80 concurrent tickets) is 8x the tested load (10). Performance characteristics at production scale are unknown.

Complete full load test at 80 concurrent tickets before expanding beyond canary; set auto-scaling thresholds.

Medium

Context drift on long ticket threads

Ticket threads with 20+ messages approach context window limits. Agent behavior at window boundary not tested; may produce inconsistent routing decisions.

Implement context summarization for threads exceeding 15 messages; add 3 long-thread eval fixtures.

Medium

Model version drift

Agent is pinned to claude-3-5-sonnet-20241022. If the model is updated or deprecated, routing accuracy may regress silently without re-evaluation.

Pin model version in config; add model version to eval metadata; set 90-day re-eval reminder when model is updated.

Open Blockers

Blocker

B1: Adversarial eval suite not implemented — prompt injection detection not validated. Agent may misroute tickets with embedded instructions. Blocks production launch.

Owner: ML Engineer (Jane D.) · Target: 2026-05-23

Blocker

B2: CRM write tool circuit breaker absent. Runaway writes possible under error cascade. Blocks production launch for all tickets that trigger CRM updates.

Owner: Backend Engineer (Mark L.) · Target: 2026-05-20

Rollout Recommendation

Deploy to canary at 5% traffic (read-only ticket classification only — disable CRM write tool in canary config) while blockers B1 and B2 are resolved. Monitor error rate, latency p95, and HITL escalation rate during canary. Once B1 and B2 are cleared and load test passes, expand to 25% then full traffic in 1-week increments. Do not enable CRM write tool at any traffic tier until B2 (circuit breaker) is implemented and tested.