by sickn33
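Building natural-sounding voice agents requires understanding latency budgets and conversational dynamics. This skill provides proven speech-to-speech and pipeline architecture patterns for production systems that handle millions of calls.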
1. Open the Claude chat interface
2. Click the "📋 Copy" button below
3. Paste into the Claude chat box and send
4. Type "Use the voice-agents skill" to get started
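=== voice-agents skill ===
Author: sickn33
Description: Building natural-sounding voice agents requires understanding latency budgets and conversational dynamics. This skill provides proven speech-to-speech and pipeline architecture patterns for production systems that handle millions of calls.
Usage:
1. Invoke the skill: "Use the voice-agents skill"
2. Provide relevant information: supply the parameters the skill requires
3. View the result: the skill returns the processed result
Example: "Use the voice-agents skill to help me analyze this code"
This method works for all Claude users and requires no additional tools.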
coding
safe
You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency - every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.
Your core insight: Two architectures exist. Speech-to-speech (S2S) models like the OpenAI Realtime API preserve emotion and achieve the lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Mos
Direct audio-to-audio processing for lowest latency
Separate STT → LLM → TTS for maximum control
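As a rough illustration of the pipeline pattern, the sketch below chains three stages and records how long each one takes. The class name `VoicePipeline` and the `fake_*` stage functions are hypothetical stand-ins, not part of any real STT, LLM, or TTS library; in a real system each stage would wrap a provider call.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VoicePipeline:
    """Hypothetical STT -> LLM -> TTS pipeline that times each stage."""
    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]
    timings_ms: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name, fn, arg):
        # Record wall-clock time spent in one stage, in milliseconds.
        start = time.perf_counter()
        result = fn(arg)
        self.timings_ms[name] = (time.perf_counter() - start) * 1000.0
        return result

    def run_turn(self, audio_in: bytes) -> bytes:
        # Each stage is a separate, swappable step: this is where the
        # control comes from, and also where the added latency comes from.
        text = self._timed("stt", self.stt, audio_in)
        reply = self._timed("llm", self.llm, text)
        return self._timed("tts", self.tts, reply)


# Placeholder stages (assumptions for illustration only).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
audio_out = pipeline.run_turn(b"hello")
```

After a turn, `pipeline.timings_ms` holds one entry per stage, which makes it easy to see which component dominates the total.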
Detect when user starts/stops speaking
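One simple way to detect speech start and stop is an energy threshold with a short "hangover" so brief pauses do not end the turn. The sketch below is a minimal energy-based detector, not the semantic VAD a production system might use; the function names, the threshold, and the frame counts are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def detect_speech(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.01,      # assumed energy threshold
    start_frames: int = 2,        # loud frames needed to declare speech
    hangover_frames: int = 3,     # quiet frames needed to declare silence
) -> List[Tuple[str, int]]:
    """Return ("start", i) / ("stop", i) events with frame indices."""
    events: List[Tuple[str, int]] = []
    speaking = False
    loud_run = 0
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            loud_run += 1
            quiet_run = 0
        else:
            quiet_run += 1
            loud_run = 0
        if not speaking and loud_run >= start_frames:
            speaking = True
            # Report the first frame of the loud run as the start.
            events.append(("start", i - start_frames + 1))
        elif speaking and quiet_run >= hangover_frames:
            speaking = False
            # Report the first frame of the quiet run as the stop.
            events.append(("stop", i - hangover_frames + 1))
    return events
```

Raising `hangover_frames` makes the detector more tolerant of mid-sentence pauses at the cost of a slower end-of-turn decision, which feeds directly into the latency budget.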
| Issue | Severity | Solution |
|---|---|---|
| Issue | critical | Measure and budget latency for each component |
| Issue | high | Target jitter metrics |
| Issue | high | Use semantic VAD |
| Issue | high | Implement barge-in detection |
| Issue | medium | Constrain response length in prompts |
| Issue | medium | Prompt for spoken format |
| Issue | medium | Implement noise handling |
| Issue | medium | Mitigate STT errors |
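The first two solutions in the table (budgeting latency per component and tracking jitter) can be sketched as below. The budget numbers in `BUDGET_MS` are hypothetical placeholders rather than recommended targets, and the helper names are assumptions made for illustration.

```python
import math
import statistics
from typing import Dict, List

# Hypothetical per-component budget in milliseconds (not from the source).
BUDGET_MS: Dict[str, float] = {"stt": 200.0, "llm": 400.0, "tts": 150.0}


def over_budget(
    measured_ms: Dict[str, float], budget_ms: Dict[str, float]
) -> Dict[str, float]:
    """Return how many ms each component exceeds its budget by."""
    return {
        name: measured_ms[name] - limit
        for name, limit in budget_ms.items()
        if measured_ms.get(name, 0.0) > limit
    }


def jitter_stats(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize the spread of turn latencies: mean, stdev, and p95."""
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p95": ordered[p95_index],
    }


violations = over_budget({"stt": 180.0, "llm": 520.0, "tts": 100.0}, BUDGET_MS)
stats = jitter_stats([310.0, 295.0, 480.0, 305.0, 300.0])
```

Tracking the p95 and standard deviation alongside the mean matters because a conversation can feel awkward when latency is inconsistent, even if the average is acceptable.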
Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend