API触发有状态多轮电话AI管道：实时ASR/TTS集成与session checkpointing

在呼叫中心场景中，构建一个支持 API 触发的有状态多轮电话 AI 代理，能够显著提升客服效率，尤其适用于保险、IT 支持等需要收集结构化数据的领域。这种管道的核心在于实时 ASR（语音转文本）和 TTS（文本转语音）的流式集成，同时通过 session checkpointing 实现对话状态持久化，确保断线续传和上下文连续性，避免用户重复说明问题。

API 触发与 Session 初始化

管道的入口是一个简单的 POST /call API，支持 outbound（AI 主动呼叫）和 inbound（用户呼叫 AI 号码）。例如，使用 curl 发送 JSON payload，包括 bot_company、bot_name、phone_number、task 描述和 claim schema，即可启动会话。系统使用 Azure Communication Services 作为呼叫网关，分配专用号码，支持 SMS 辅助信息交互。

初始化时，session ID 基于 phone_number 生成，conversation 历史、claim 数据和 reminders 存储在 Cosmos DB 中。默认 claim schema 包括 caller_name（text）、caller_email（email）等，支持自定义字段如 incident_datetime（datetime）。这种 stateful 设计确保多轮对话中，AI 能引用历史上下文，例如上轮收集的硬件信息，避免遗漏。“项目支持实时流式对话，可在断线后恢复会话。”

为实现低延迟，配置 vad_threshold=0.5（语音活动检测阈值，0.1-1 间），vad_silence_timeout_ms=500，vad_cutoff_timeout_ms=250。这些参数控制静音检测，防止误触发。recognition_stt_complete_timeout_ms=100 确保 STT 快速完成，recognition_retry_max=3 提供容错。

实时 ASR/TTS 管道参数

核心管道流程：用户语音 → Communication Services → STT（Cognitive Services 实时模式） → LLM（gpt-4o-mini 或 gpt-4o） → TTS → 流式播放。使用 Redis 缓存 RAG 数据和历史，提升响应。

关键超时参数：

answer_soft_timeout_sec=4：LLM 延迟时发送 “稍等” 提示。
answer_hard_timeout_sec=15：超时直接报错。
phone_silence_timeout_sec=20：用户静音超时时 AI 提示继续。

LLM 集成使用 OpenAI SDK 直接调用，支持 multi-tools（如 claim 更新、reminder 创建）和 streaming，避免框架开销。RAG 通过 AI Search 检索自定义训练数据（index schema 含 vectors via ADA embedding）。为优化延迟，优先 gpt-4o-mini（成本低 10-15x 性能高），PTU 部署 Azure OpenAI 进一步减半 TTFT。

TTS 支持多语言，默认 fr-FR-DeniseNeural，可自定义 Neural Voice。prompts 使用模板如 “{bot_name} from {bot_company}，请描述问题”，随机选句增强自然性。Moderation 通过 Azure Content Safety 过滤，阈值 0-7 级。

Session Checkpointing 与断线续传

每轮交互后，系统自动 checkpoint：messages 列表存 user/assistant 内容（含 action: talk, timestamp），claim 更新，synthesis 生成 short/long 总结，satisfaction 评估。next_action 如 “case_closed” 或 reminder。

断线时，Event Grid 推事件至 Storage Queues，app 监听恢复。历史对话作为 LLM 上下文输入，支持 fine-tuning。报告 API /report/{phone} 展示全历史，便于人工审核。

实现清单：

部署 Azure 资源：Communication Services（号码）、Cognitive Services、OpenAI、Cosmos DB、Redis、AI Search。
配置 config.yaml：llm endpoints、prompts、claim schema、features（如 recording_enabled）。
make deploy（Bicep IaC），暴露 API。
测试：curl POST/call，监控 App Insights。
生产化：多区、vNET、tests 覆盖。

监控与优化

集成 Application Insights 和 OpenLLMetry，追踪 LLM spans（latency/tokens）、custom metrics 如 call.answer.latency（用户说完到 AI 回应时延）、call.aec.droped（回声消除失败）。

优化点：

采样 logs 减 Monitor 成本。
A/B 测试 features via App Config。
Fine-tune on 历史数据（脱敏后）。
回滚：feature_flags 控制，如 slow_llm_for_chat。

成本估算（1000 通 10min / 月）：~720 USD，主要 Cosmos RU/s 和 Speech。生产需升级 SKU、私有端点。

此管道适用于中低复杂度呼叫，24/7 可用。通过参数调优和 checkpointing，实现可靠低延迟交互。

资料来源：Microsoft call-center-ai GitHub 仓库（https://github.com/microsoft/call-center-ai），架构与配置详见 README。