Voice Agent
Enable real-time voice conversations with natural speech recognition and synthesis
Turn your Support BV chatbot into a voice assistant that holds real-time spoken conversations with users. Voice Agent mode provides natural, human-like voice interactions powered by AI speech recognition and synthesis.
Voice Features
Support BV offers two voice input methods:
- Waveform icon: Activates Voice Agent mode, which provides real-time AI voice conversations with spoken responses
- Mic icon: Activates Dictate mode, which provides speech-to-text input for typing messages (no AI voice response)
Voice Agent mode requires microphone permissions and uses additional credits per conversation turn.
Quick Setup
Access Voice Settings
Navigate to the Support BV > Voice Settings tab in your chatbot settings.
Select Voice Model
Choose between cost-effective or premium voice models based on your needs.
Choose Voice Personality
Select from 10 unique voices with different tones and characteristics.
Enable Voice Agent Mode
Users can click the waveform icon in the chatbot interface to activate Voice Agent mode and start real-time voice conversations.
Voice Model Selection
Choose the voice model that balances quality and cost for your use case:
GPT-4o Realtime Mini
Best for: Most use cases, cost-conscious deployments
- Cost: 5 credits per conversation turn
- Performance: Fast, efficient voice processing
- Quality: High-quality speech recognition and natural voice synthesis
- Ideal for: Customer support, general inquiries, high-volume interactions
About 67% cheaper than the premium model (5 vs. 15 credits per turn) while maintaining excellent voice quality and responsiveness.
GPT-4o Realtime
Best for: Premium experiences, complex conversations
- Cost: 15 credits per conversation turn
- Performance: Enhanced natural language understanding
- Quality: Superior voice quality with advanced prosody
- Ideal for: Technical support, complex troubleshooting, executive assistance
Premium voice model with the highest quality speech synthesis and most sophisticated language understanding.
Credits Usage
Voice conversations consume credits per turn. A turn is one question-and-answer exchange: the user's speech input plus the AI's voice response. Credits are deducted from your workspace's monthly quota.
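As a rough illustration of the per-turn pricing, the sketch below estimates how many credits a conversation would use. Only the 5 and 15 credit costs come from this page; the model keys and the `estimateVoiceCredits` function are hypothetical names chosen for the example, not part of any Support BV API.

```typescript
// Per-turn credit costs as listed on this page. The keys are illustrative
// labels, not official identifiers.
const CREDITS_PER_TURN = {
  "realtime-mini": 5,
  "realtime": 15,
} as const;

type VoiceModel = keyof typeof CREDITS_PER_TURN;

// Estimate the credits consumed by a voice conversation.
// One turn = one user utterance plus the AI's spoken reply.
function estimateVoiceCredits(model: VoiceModel, turns: number): number {
  if (!Number.isInteger(turns) || turns < 0) {
    throw new RangeError("turns must be a non-negative integer");
  }
  return CREDITS_PER_TURN[model] * turns;
}

// Example: a 12-turn conversation on each model.
const miniCost = estimateVoiceCredits("realtime-mini", 12); // 60
const premiumCost = estimateVoiceCredits("realtime", 12); // 180
console.log(`Mini: ${miniCost} credits, Premium: ${premiumCost} credits`);
```

Multiplying by your expected monthly conversation volume gives a first estimate to compare against your workspace quota.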
Voice Selection
Choose from 10 unique voice personalities to match your brand and audience preferences:
Available Voices
| Voice | Gender | Tone | Best For |
|---|---|---|---|
| Alloy | Neutral | Neutral and balanced | Professional, versatile applications |
| Echo | Male | Warm and friendly | Customer service, welcoming interactions |
| Shimmer | Female | Soft and gentle | Calming support, healthcare, wellness |
| Ash | Neutral | Clear and articulate | Technical support, precise instructions |
| Ballad | Female | Smooth and melodic | Storytelling, content delivery |
| Coral | Female | Bright and energetic | Sales, upbeat engagement |
| Sage | Male | Calm and wise | Educational content, advisory roles |
| Verse | Neutral | Expressive and dynamic | Creative applications, entertainment |
| Marin | Female | Ocean-inspired calm | Meditation, relaxation services |
| Cedar | Male | Natural and grounded | Outdoor brands, authentic communication |
Choosing the Right Voice
Consider your brand personality:
- Professional services: Alloy, Ash, or Sage for clear, authoritative communication
- Customer support: Echo or Shimmer for friendly, approachable interactions
- Healthcare/Wellness: Marin or Shimmer for calming, reassuring tones
- Sales/Marketing: Coral or Ballad for engaging, energetic delivery
- Technical content: Ash or Cedar for clear, precise articulation
Test different voices with your actual content to find the best match for your audience and use case.
How Voice Agent Works
Real-Time Conversation Flow
User Activates Voice Agent Mode
The user clicks the waveform icon in the chatbot interface to enable Voice Agent mode for real-time AI voice conversations.
Microphone Permission
The browser requests microphone access. The user grants permission to begin the voice interaction.
Voice Agent Connects
Voice Agent establishes a real-time connection with the AI voice service. The status indicator shows "Connecting", then "Ready".
Conversation Begins
- Listening: Voice Agent actively listens to user speech (green indicator)
- Thinking: AI processes speech and generates response (amber indicator)
- Speaking: Voice Agent delivers spoken response (violet indicator)
Continuous Interaction
The conversation continues with natural back-and-forth until the user exits voice mode or closes the chat.
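One way to picture the flow above is as a small state machine. The sketch below is illustrative only: the status names mirror the ones on this page, but the event names and the `nextStatus` function are assumptions made for the example and do not describe Voice Agent's internal implementation.

```typescript
type VoiceStatus =
  | "connecting" | "ready" | "listening" | "thinking" | "speaking" | "error";

// Hypothetical events that move the conversation from one status to the next.
type VoiceEvent =
  | "connected" | "speechStarted" | "speechEnded"
  | "responseStarted" | "responseEnded" | "failed";

// Allowed transitions for each status; anything not listed is ignored.
const TRANSITIONS: Record<VoiceStatus, Partial<Record<VoiceEvent, VoiceStatus>>> = {
  connecting: { connected: "ready" },
  ready: { speechStarted: "listening" },
  listening: { speechEnded: "thinking" },
  thinking: { responseStarted: "speaking" },
  speaking: { responseEnded: "listening" }, // loop back for the next turn
  error: {},
};

function nextStatus(current: VoiceStatus, event: VoiceEvent): VoiceStatus {
  if (event === "failed") return "error"; // a failure is possible from any status
  return TRANSITIONS[current][event] ?? current;
}

// Walk through one full turn, starting from the initial connection.
const events: VoiceEvent[] = [
  "connected", "speechStarted", "speechEnded", "responseStarted", "responseEnded",
];
let status: VoiceStatus = "connecting";
const trace: VoiceStatus[] = [status];
for (const event of events) {
  status = nextStatus(status, event);
  trace.push(status);
}
console.log(trace.join(" -> "));
// prints "connecting -> ready -> listening -> thinking -> speaking -> listening"
```

After the spoken response ends, the sketch returns to listening, which matches the continuous back-and-forth described above.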
Status Indicators
Voice Agent provides real-time visual feedback:
| Status | Color | Meaning |
|---|---|---|
| Connecting | Gray | Establishing connection to voice service |
| Ready | Green | Connected and ready for conversation |
| Listening | Green | Actively capturing user speech |
| Thinking | Amber | Processing speech and generating response |
| Speaking | Violet | Delivering AI voice response |
| Error | Red | Connection issue or error occurred |
The circular waveform visualizer responds to audio levels, providing engaging visual feedback during conversations.
Voice Agent vs Dictate Mode
Support BV provides two distinct voice input methods to suit different user needs:
Voice Agent Mode (Waveform Icon)
Full AI voice conversation with spoken responses
- Click the waveform icon to activate
- Real-time two-way voice conversation with AI
- AI listens to your speech AND responds with voice
- Uses advanced voice models (GPT-4o Realtime or GPT-4o Realtime Mini)
- Consumes 5 or 15 credits per conversation turn, depending on the selected model
- Provides visual status indicators (Listening, Thinking, Speaking)
- Includes waveform visualizer for audio feedback
- Perfect for hands-free conversations and accessibility
Dictate Mode (Mic Icon 🎤)
Speech-to-text input only (no AI voice response)
- Click the mic icon to activate
- Converts your speech to text in the input field
- AI responds with text only (no voice output)
- Uses browser's built-in Web Speech API
- No additional credits consumed (standard message credits only)
- Red pulsing icon indicates active recording
- Useful for faster typing or hands-free message input
- May work offline in browsers that support on-device speech recognition
Choose Voice Agent mode when you want natural spoken conversations with AI voice responses. Choose Dictate mode when you just want to speak your message instead of typing, but prefer text-based responses.
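For readers curious how dictation through the Web Speech API generally works, here is a minimal sketch. It is not Support BV's implementation: `startDictation` and the local interfaces are hypothetical, declared only so the example compiles without DOM typings. The properties used (`lang`, `interimResults`, `onresult`, `onerror`, `start()`) are standard members of the browser's `SpeechRecognition` object.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface Recognizer {
  lang: string;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}
type RecognizerCtor = new () => Recognizer;

// Start speech-to-text and pass the recognized text to `onText`.
// Returns null when the browser offers no recognizer.
function startDictation(
  Ctor: RecognizerCtor | undefined,
  onText: (text: string) => void,
  lang = "en-US",
): Recognizer | null {
  if (!Ctor) return null;
  const recognizer = new Ctor();
  recognizer.lang = lang;
  recognizer.interimResults = false; // only report final results
  recognizer.onresult = (event) => {
    // Take the top alternative of each result and join them into one string.
    const text = Array.from(event.results, (r) => r[0]?.transcript ?? "").join(" ");
    onText(text.trim());
  };
  recognizer.onerror = (event) => console.warn("Dictation error:", event.error);
  recognizer.start();
  return recognizer;
}

// In a browser, the constructor would typically be looked up like this:
//   const Ctor = window.SpeechRecognition ?? window.webkitSpeechRecognition;
//   startDictation(Ctor, (text) => { inputField.value = text; });
// where `inputField` is a hypothetical reference to the chat input element.
```

Passing the constructor in as a parameter keeps the function usable in browsers that expose only the `webkit`-prefixed version and makes the missing-support case explicit.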
Key Features
Natural Speech Recognition
- Understands natural spoken language with high accuracy
- Handles accents, speech patterns, and conversational flow
- Processes speech in real time with minimal delay
Human-Like Voice Synthesis
- Natural-sounding voices with proper intonation and prosody
- Emotionally appropriate responses matching conversation context
- Smooth, professional delivery without robotic artifacts
Hands-Free Interaction
- Perfect for users who prefer speaking over typing
- Accessibility feature for users with mobility or vision challenges
- Multitasking support: users can speak while doing other activities
Visual Feedback
- Circular waveform visualizer shows real-time audio levels
- Status indicators provide clear conversation state
- Glassmorphic UI with smooth animations and modern design
- Color-coded states for intuitive understanding
Seamless Integration
- Works with all your existing Support BV training data
- Maintains conversation context and memory
- Follows your configured personality and AI settings
- Integrates with Action Map workflows
Requirements
Browser Support
Voice Agent works in modern browsers with Web Audio API and MediaRecorder support:
- Chrome/Edge: Version 80+
- Safari: Version 14+
- Firefox: Version 76+
- Mobile browsers: iOS Safari 14.5+, Chrome Mobile
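If you want to check for these capabilities yourself before showing a voice button, feature detection is usually more reliable than parsing version numbers. The sketch below is an assumption-laden example: `missingVoiceFeatures` and `BrowserGlobals` are hypothetical names, and the `getUserMedia` check is added here because microphone access is required, not because this page lists it as a browser requirement.

```typescript
// The subset of browser globals inspected by this sketch.
interface BrowserGlobals {
  AudioContext?: unknown;
  webkitAudioContext?: unknown; // prefixed name in older Safari versions
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Return the names of required features that the environment lacks.
// An empty array means every check passed.
function missingVoiceFeatures(env: BrowserGlobals): string[] {
  const missing: string[] = [];
  if (!env.AudioContext && !env.webkitAudioContext) {
    missing.push("Web Audio API");
  }
  if (!env.MediaRecorder) {
    missing.push("MediaRecorder");
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  return missing;
}

// In a browser this would be called with the global object, for example:
//   const missing = missingVoiceFeatures(window as unknown as BrowserGlobals);
//   if (missing.length > 0) { /* hide the voice button or show a notice */ }
```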
Microphone Access
Users must grant microphone permissions when activating voice mode:
- Browser displays permission prompt on first use
- User clicks "Allow" to enable microphone access
- Permission is remembered for future sessions
Privacy Note: Audio is processed securely through encrypted connections. No voice data is stored permanently.
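The permission prompt described above is triggered by the browser's standard `getUserMedia` call. The following sketch shows one way to request the microphone and distinguish a denied prompt from a missing device. `requestMicrophone`, `MediaDevicesLike`, and `MicResult` are hypothetical names for this example, and nothing here describes how Voice Agent itself handles permissions. The error names `NotAllowedError` and `NotFoundError` are the ones browsers use when `getUserMedia` rejects.

```typescript
// Structural stand-in for navigator.mediaDevices so the sketch has no DOM dependency.
interface MediaDevicesLike<S> {
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
}

type MicResult<S> =
  | { ok: true; stream: S }
  | { ok: false; reason: "denied" | "no-device" | "unknown" };

// Ask for microphone access and translate failures into a simple reason code.
async function requestMicrophone<S>(
  devices: MediaDevicesLike<S>,
): Promise<MicResult<S>> {
  try {
    const stream = await devices.getUserMedia({ audio: true });
    return { ok: true, stream };
  } catch (err) {
    // Read the error name defensively; rejections are usually DOMException objects.
    const name =
      typeof err === "object" && err !== null && "name" in err
        ? String((err as { name: unknown }).name)
        : "";
    if (name === "NotAllowedError") return { ok: false, reason: "denied" };
    if (name === "NotFoundError") return { ok: false, reason: "no-device" };
    return { ok: false, reason: "unknown" };
  }
}

// In a browser:
//   const result = await requestMicrophone(navigator.mediaDevices);
//   if (!result.ok && result.reason === "denied") { /* explain how to re-enable access */ }
```

Returning a reason code rather than rethrowing makes it straightforward to show the user a specific message for each failure.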
Internet Connection
Voice Agent requires a stable internet connection for real-time processing:
- Minimum: 1 Mbps upload speed
- Recommended: 3+ Mbps for optimal quality
- Latency: Lower latency improves conversation flow
Best Practices
Choose Appropriate Voice
Match voice selection to your use case:
- Professional contexts: Choose clear, neutral voices (Alloy, Ash)
- Friendly support: Use warm, welcoming voices (Echo, Shimmer)
- Specialized contexts: Select voices that fit your brand personality
Test Thoroughly
Before deploying voice mode:
- Test with different accents and speaking speeds
- Verify responses are appropriate when spoken aloud
- Check that voice personality matches written personality
- Test on various devices and browsers
Monitor Voice Usage
Track voice conversation performance:
- Review voice conversation transcripts in Chats section
- Monitor credit usage for voice interactions
- Identify common voice use cases and optimize
- Gather user feedback on voice experience
Use Cases
Customer Support
Hands-free troubleshooting: Users can describe issues while working on their device or product.
Multi-step guidance: Voice Agent can walk users through complex procedures step-by-step.
Quick status checks: A user asks "What's the status of my order?" and gets an instant spoken response.
Accessibility
Vision impairment support: Screen reader users can have natural voice conversations.
Mobility challenges: Users with difficulty typing can speak their questions.
Dyslexia/reading difficulties: Voice mode removes reading/writing barriers.
Mobile Users
On-the-go support: Users can get help while driving (hands-free), walking, or multitasking.
Faster than typing: Speaking is often quicker than mobile keyboard input.
Better experience: More natural interaction on small screens.
Technical Support
Complex troubleshooting: Users can describe technical issues in detail verbally.
Real-time guidance: Voice Agent can provide step-by-step technical instructions.
Diagnostic conversations: Natural back-and-forth to identify and resolve issues.
Healthcare & Wellness
Appointment scheduling: Voice-based booking and confirmation.
Symptom discussions: Patients can describe symptoms naturally.
Medication reminders: Friendly voice reminders and confirmations.
Enterprise & B2B
Executive assistance: Voice-based scheduling and information retrieval.
Internal support: Employees can ask HR/IT questions hands-free.
Voice-activated help desk: Quick access to company information.
Troubleshooting
Advanced Configuration
Voice + Personality Settings
Voice Agent respects your personality configuration:
- Tone: Voice delivery matches configured tone (professional, friendly, casual)
- Response style: Follows answer strategy (direct, conversational, guided)
- Brand voice: Maintains brand personality in spoken responses
- Custom instructions: Applies any custom personality prompts
Monitoring Voice Conversations
Track voice interactions in the Chats section:
- View conversation transcripts (speech-to-text)
- See which users prefer voice mode
- Analyze voice conversation patterns
- Monitor voice-specific issues or errors
- Export voice conversation data