BrandVox AI Documentation

Voice Agent

Enable real-time voice conversations with natural speech recognition and synthesis

Transform your Support BV into an intelligent voice assistant that can have real-time spoken conversations with users. Voice Agent mode provides natural, human-like voice interactions powered by advanced AI speech recognition and synthesis.

Voice Features

Support BV offers two voice input methods:

  • Waveform icon: Activates Voice Agent mode - Real-time AI voice conversations with spoken responses
  • Mic icon: Activates Dictate mode - Speech-to-text input for typing messages (no AI voice response)

Voice Agent mode requires microphone permissions and uses additional credits per conversation turn.

Quick Setup

Access Voice Settings

Navigate to the Support BV > Voice Settings tab in your chatbot settings.

Select Voice Model

Choose between cost-effective or premium voice models based on your needs.

Choose Voice Personality

Select from 10 unique voices with different tones and characteristics.

Enable Voice Agent Mode

Users can click the waveform icon in the chatbot interface to activate Voice Agent mode and start real-time voice conversations.

Voice Model Selection

Choose the voice model that balances quality and cost for your use case:

GPT-4o Realtime Mini

Best for: Most use cases, cost-conscious deployments

  • Cost: 5 credits per conversation turn
  • Performance: Fast, efficient voice processing
  • Quality: High-quality speech recognition and natural voice synthesis
  • Ideal for: Customer support, general inquiries, high-volume interactions

About two-thirds cheaper than the premium model (5 vs. 15 credits per turn) while maintaining excellent voice quality and responsiveness.

GPT-4o Realtime

Best for: Premium experiences, complex conversations

  • Cost: 15 credits per conversation turn
  • Performance: Enhanced natural language understanding
  • Quality: Superior voice quality with advanced prosody
  • Ideal for: Technical support, complex troubleshooting, executive assistance

Premium voice model with the highest quality speech synthesis and most sophisticated language understanding.

Credits Usage

Voice conversations consume credits per turn, where a turn is one question-and-answer exchange: the user's speech input plus the AI's voice response. Credits are deducted from your workspace's monthly quota.
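
To make the per-turn pricing concrete, here is a minimal sketch of a credit estimate. The per-turn costs come from the model descriptions above; the dictionary keys and function names are hypothetical and are not part of any BrandVox API.

```python
# Credits per conversation turn, taken from the model descriptions above.
# The keys are hypothetical identifiers used only in this sketch.
CREDITS_PER_TURN = {
    "gpt-4o-realtime-mini": 5,
    "gpt-4o-realtime": 15,
}


def estimate_credits(model: str, turns: int) -> int:
    """Return the credits consumed by `turns` question-and-answer exchanges."""
    if turns < 0:
        raise ValueError("turns must be non-negative")
    return CREDITS_PER_TURN[model] * turns


def turns_within_quota(model: str, quota: int) -> int:
    """Return how many full turns fit into a credit quota."""
    if quota < 0:
        raise ValueError("quota must be non-negative")
    return quota // CREDITS_PER_TURN[model]


if __name__ == "__main__":
    print(estimate_credits("gpt-4o-realtime-mini", 40))  # 200
    print(estimate_credits("gpt-4o-realtime", 40))       # 600
    print(turns_within_quota("gpt-4o-realtime", 1000))   # 66
```

For the same 40-turn conversation, the Mini model uses 200 credits and the premium model uses 600, so the choice of model scales your voice spend by a factor of three.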

Voice Selection

Choose from 10 unique voice personalities to match your brand and audience preferences:

Available Voices

Voice   | Gender  | Tone                  | Best For
Alloy   | Neutral | Neutral and balanced  | Professional, versatile applications
Echo    | Male    | Warm and friendly     | Customer service, welcoming interactions
Shimmer | Female  | Soft and gentle       | Calming support, healthcare, wellness
Ash     | Neutral | Clear and articulate  | Technical support, precise instructions
Ballad  | Female  | Smooth and melodic    | Storytelling, content delivery
Coral   | Female  | Bright and energetic  | Sales, upbeat engagement
Sage    | Male    | Calm and wise         | Educational content, advisory roles
Verse   | Neutral | Expressive and dynamic | Creative applications, entertainment
Marin   | Female  | Ocean-inspired calm   | Meditation, relaxation services
Cedar   | Male    | Natural and grounded  | Outdoor brands, authentic communication

Choosing the Right Voice

Consider your brand personality:

  • Professional services: Alloy, Ash, or Sage for clear, authoritative communication
  • Customer support: Echo or Shimmer for friendly, approachable interactions
  • Healthcare/Wellness: Marin or Shimmer for calming, reassuring tones
  • Sales/Marketing: Coral or Ballad for engaging, energetic delivery
  • Technical content: Ash or Cedar for clear, precise articulation

Test different voices with your actual content to find the best match for your audience and use case.
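
If you want to shortlist voices programmatically before testing them, the voice list and brand recommendations above can be captured as plain data. This is only an illustrative sketch: the `Voice` type, the `RECOMMENDED` mapping, and the `shortlist` function are hypothetical names and do not correspond to a documented BrandVox interface.

```python
from typing import List, NamedTuple, Optional


class Voice(NamedTuple):
    name: str
    gender: str
    tone: str
    best_for: str


# Copied from the Available Voices list above.
VOICES = [
    Voice("Alloy", "Neutral", "Neutral and balanced", "Professional, versatile applications"),
    Voice("Echo", "Male", "Warm and friendly", "Customer service, welcoming interactions"),
    Voice("Shimmer", "Female", "Soft and gentle", "Calming support, healthcare, wellness"),
    Voice("Ash", "Neutral", "Clear and articulate", "Technical support, precise instructions"),
    Voice("Ballad", "Female", "Smooth and melodic", "Storytelling, content delivery"),
    Voice("Coral", "Female", "Bright and energetic", "Sales, upbeat engagement"),
    Voice("Sage", "Male", "Calm and wise", "Educational content, advisory roles"),
    Voice("Verse", "Neutral", "Expressive and dynamic", "Creative applications, entertainment"),
    Voice("Marin", "Female", "Ocean-inspired calm", "Meditation, relaxation services"),
    Voice("Cedar", "Male", "Natural and grounded", "Outdoor brands, authentic communication"),
]

# Recommendations by brand category, copied from "Choosing the Right Voice".
RECOMMENDED = {
    "professional services": ["Alloy", "Ash", "Sage"],
    "customer support": ["Echo", "Shimmer"],
    "healthcare/wellness": ["Marin", "Shimmer"],
    "sales/marketing": ["Coral", "Ballad"],
    "technical content": ["Ash", "Cedar"],
}


def shortlist(category: str, gender: Optional[str] = None) -> List[Voice]:
    """Return the recommended voices for a category, optionally filtered by gender."""
    by_name = {voice.name: voice for voice in VOICES}
    picks = [by_name[name] for name in RECOMMENDED[category.lower()]]
    if gender is not None:
        picks = [v for v in picks if v.gender.lower() == gender.lower()]
    return picks


if __name__ == "__main__":
    for voice in shortlist("Professional services", gender="neutral"):
        print(voice.name, "-", voice.tone)
```

A shortlist like this narrows the options only; the final choice should still come from listening to each candidate with your own content.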

How Voice Agent Works

Real-Time Conversation Flow

User Activates Voice Agent Mode

The user clicks the waveform icon in the chatbot interface to enable Voice Agent mode for real-time AI voice conversations.

Microphone Permission

The browser requests microphone access, and the user grants permission to begin the voice interaction.

Voice Agent Connects

Voice Agent establishes a real-time connection with the AI voice service. The status indicator shows "Connecting", then "Ready".

Conversation Begins

  • Listening: Voice Agent actively listens to user speech (green indicator)
  • Thinking: AI processes speech and generates response (amber indicator)
  • Speaking: Voice Agent delivers spoken response (violet indicator)

Continuous Interaction

The conversation continues with natural back-and-forth until the user exits voice mode or closes the chat.

Status Indicators

Voice Agent provides real-time visual feedback:

Status     | Color  | Meaning
Connecting | Gray   | Establishing connection to voice service
Ready      | Green  | Connected and ready for conversation
Listening  | Green  | Actively capturing user speech
Thinking   | Amber  | Processing speech and generating response
Speaking   | Violet | Delivering AI voice response
Error      | Red    | Connection issue or error occurred

The circular waveform visualizer responds to audio levels, providing engaging visual feedback during conversations.
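
The statuses and colors above can be modeled as a small state machine, which may help if you build your own monitoring or test tooling around the widget. The status names and colors come from this page. The allowed transitions are an assumption inferred from the conversation flow, not a documented specification, and every identifier below is hypothetical.

```python
from enum import Enum


class Status(Enum):
    CONNECTING = "Connecting"
    READY = "Ready"
    LISTENING = "Listening"
    THINKING = "Thinking"
    SPEAKING = "Speaking"
    ERROR = "Error"


# Indicator colors, copied from the status list above.
COLOR = {
    Status.CONNECTING: "gray",
    Status.READY: "green",
    Status.LISTENING: "green",
    Status.THINKING: "amber",
    Status.SPEAKING: "violet",
    Status.ERROR: "red",
}

# Assumed transitions, inferred from the conversation flow (not a documented spec).
NEXT = {
    Status.CONNECTING: {Status.READY},
    Status.READY: {Status.LISTENING},
    Status.LISTENING: {Status.THINKING},
    Status.THINKING: {Status.SPEAKING},
    Status.SPEAKING: {Status.LISTENING},
    Status.ERROR: {Status.CONNECTING},  # assumption: recovery means reconnecting
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if moving from `current` to `target` fits the assumed flow."""
    if target is Status.ERROR:
        return True  # assumption: an error can occur in any state
    return target in NEXT[current]


if __name__ == "__main__":
    state = Status.CONNECTING
    for target in (Status.READY, Status.LISTENING, Status.THINKING, Status.SPEAKING):
        assert can_transition(state, target)
        state = target
        print(state.value, "->", COLOR[state])
```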

Voice Agent vs Dictate Mode

Support BV provides two distinct voice input methods to suit different user needs:

Voice Agent Mode (Waveform Icon)

Full AI voice conversation with spoken responses

  • Click the waveform icon to activate
  • Real-time two-way voice conversation with AI
  • AI listens to your speech AND responds with voice
  • Uses advanced voice models (GPT-4o Realtime or GPT-4o Realtime Mini)
  • Consumes 5 or 15 credits per conversation turn, depending on the voice model
  • Provides visual status indicators (Listening, Thinking, Speaking)
  • Includes waveform visualizer for audio feedback
  • Perfect for hands-free conversations and accessibility

Dictate Mode (Mic Icon 🎤)

Speech-to-text input only (no AI voice response)

  • Click the mic icon to activate
  • Converts your speech to text in the input field
  • AI responds with text only (no voice output)
  • Uses browser's built-in Web Speech API
  • No additional credits consumed (standard message credits only)
  • Red pulsing icon indicates active recording
  • Useful for faster typing or hands-free message input
  • Offline use depends on the browser: some browsers send audio to a server-based recognition service and need a connection

Choose Voice Agent mode when you want natural spoken conversations with AI voice responses. Choose Dictate mode when you just want to speak your message instead of typing, but prefer text-based responses.

Key Features

Natural Speech Recognition

  • Understands natural spoken language with high accuracy
  • Handles accents, speech patterns, and conversational flow
  • Processes speech in real time with minimal delay

Human-Like Voice Synthesis

  • Natural-sounding voices with proper intonation and prosody
  • Emotionally appropriate responses matching conversation context
  • Smooth, professional delivery without robotic artifacts

Hands-Free Interaction

  • Perfect for users who prefer speaking over typing
  • Accessibility feature for users with mobility or vision challenges
  • Multitasking support - users can speak while doing other activities

Visual Feedback

  • Circular waveform visualizer shows real-time audio levels
  • Status indicators provide clear conversation state
  • Glassmorphic UI with smooth animations and modern design
  • Color-coded states for intuitive understanding

Seamless Integration

  • Works with all your existing Support BV training data
  • Maintains conversation context and memory
  • Follows your configured personality and AI settings
  • Integrates with Action Map workflows

Requirements

Browser Support

Voice Agent works in modern browsers with Web Audio API and MediaRecorder support:

  • Chrome/Edge: Version 80+
  • Safari: Version 14+
  • Firefox: Version 76+
  • Mobile browsers: iOS Safari 14.5+, Chrome Mobile
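
As an illustration of how these minimums could be checked, for example when triaging support tickets that include a browser version, here is a minimal sketch. The version numbers come from the list above; the function names and browser keys are hypothetical. Chrome Mobile is left out because no minimum version is stated for it.

```python
from typing import Tuple

# Minimum versions from the Browser Support list above.
# Chrome Mobile is omitted because no minimum version is given for it.
MIN_VERSION = {
    "chrome": (80,),
    "edge": (80,),
    "safari": (14,),
    "firefox": (76,),
    "ios safari": (14, 5),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '14.5.1' into (14, 5, 1).

    Raises ValueError if any part is not a plain integer.
    """
    return tuple(int(part) for part in text.split("."))


def is_supported(browser: str, version: str) -> bool:
    """Return True if the version meets the listed minimum.

    Browsers that are not in the list return False in this sketch.
    """
    minimum = MIN_VERSION.get(browser.lower())
    if minimum is None:
        return False
    return parse_version(version) >= minimum


if __name__ == "__main__":
    print(is_supported("Chrome", "79.0.3945"))   # False
    print(is_supported("Firefox", "76.0"))       # True
    print(is_supported("iOS Safari", "14.4.1"))  # False
```

Comparing versions as tuples of integers avoids the usual string-comparison pitfall, where "9" would sort after "80".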

Microphone Access

Users must grant microphone permissions when activating voice mode:

  1. Browser displays permission prompt on first use
  2. User clicks "Allow" to enable microphone access
  3. Permission is remembered for future sessions

Privacy Note: Audio is processed securely through encrypted connections. No voice data is stored permanently.

Internet Connection

Voice Agent requires a stable internet connection for real-time processing:

  • Minimum: 1 Mbps upload speed
  • Recommended: 3+ Mbps for optimal quality
  • Latency: Lower latency improves conversation flow
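
The thresholds above can be expressed as a simple classification, for instance to flag slow connections in a pre-call check. This sketch assumes both figures refer to upload speed (the page states this only for the minimum); the function name and the returned labels are hypothetical.

```python
MIN_UPLOAD_MBPS = 1.0          # "Minimum: 1 Mbps upload speed"
RECOMMENDED_UPLOAD_MBPS = 3.0  # "Recommended: 3+ Mbps" (assumed to be upload speed)


def connection_quality(upload_mbps: float) -> str:
    """Classify a measured upload speed against the documented thresholds."""
    if upload_mbps < 0:
        raise ValueError("upload speed cannot be negative")
    if upload_mbps < MIN_UPLOAD_MBPS:
        return "insufficient"
    if upload_mbps < RECOMMENDED_UPLOAD_MBPS:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    for speed in (0.5, 1.0, 2.9, 3.0):
        print(speed, "Mbps ->", connection_quality(speed))
```

Bandwidth alone does not capture latency, so a connection labeled "recommended" here can still feel sluggish if round-trip times are high.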

Best Practices

Choose Appropriate Voice

Match voice selection to your use case:

  • Professional contexts: Choose clear, neutral voices (Alloy, Ash)
  • Friendly support: Use warm, welcoming voices (Echo, Shimmer)
  • Specialized contexts: Select voices that fit your brand personality

Test Thoroughly

Before deploying voice mode:

  • Test with different accents and speaking speeds
  • Verify responses are appropriate when spoken aloud
  • Check that voice personality matches written personality
  • Test on various devices and browsers

Monitor Voice Usage

Track voice conversation performance:

  • Review voice conversation transcripts in the Chats section
  • Monitor credit usage for voice interactions
  • Identify common voice use cases and optimize
  • Gather user feedback on voice experience

Use Cases

Customer Support

Hands-free troubleshooting: Users can describe issues while working on their device or product.

Multi-step guidance: Voice Agent can walk users through complex procedures step-by-step.

Quick status checks: "What's the status of my order?" - instant spoken response.

Accessibility

Vision impairment support: Screen reader users can have natural voice conversations.

Mobility challenges: Users with difficulty typing can speak their questions.

Dyslexia/reading difficulties: Voice mode removes reading/writing barriers.

Mobile Users

On-the-go support: Users can get help while driving (hands-free), walking, or multitasking.

Faster than typing: Speaking is often quicker than mobile keyboard input.

Better experience: More natural interaction on small screens.

Technical Support

Complex troubleshooting: Users can describe technical issues in detail verbally.

Real-time guidance: Voice Agent can provide step-by-step technical instructions.

Diagnostic conversations: Natural back-and-forth to identify and resolve issues.

Healthcare & Wellness

Appointment scheduling: Voice-based booking and confirmation.

Symptom discussions: Patients can describe symptoms naturally.

Medication reminders: Friendly voice reminders and confirmations.

Enterprise & B2B

Executive assistance: Voice-based scheduling, information retrieval.

Internal support: Employees can ask HR/IT questions hands-free.

Voice-activated help desk: Quick access to company information.

Advanced Configuration

Voice + Personality Settings

Voice Agent respects your personality configuration:

  • Tone: Voice delivery matches configured tone (professional, friendly, casual)
  • Response style: Follows answer strategy (direct, conversational, guided)
  • Brand voice: Maintains brand personality in spoken responses
  • Custom instructions: Applies any custom personality prompts

Monitoring Voice Conversations

Track voice interactions in the Chats section:

  • View conversation transcripts (speech-to-text)
  • See which users prefer voice mode
  • Analyze voice conversation patterns
  • Monitor voice-specific issues or errors
  • Export voice conversation data
