Vapi

Voice-agent infrastructure. Build a phone agent in a weekend.

FREEMIUM · API · Web

Production voice-agent platform: telephony, STT, LLM, TTS, and interrupt handling are stitched together, so you call an endpoint and get a working phone agent. Models are pluggable at every layer.

Model support

Multi-model

Claude
GPT
Gemini
ElevenLabs
Deepgram

Compose your own STT + LLM + TTS stack.
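As a rough illustration of what a pluggable STT + LLM + TTS stack means, here is a minimal sketch in plain Python. It does not use Vapi's API: the VoiceStack class, the three interfaces, and the stub components (EchoSTT, UppercaseLLM, BytesTTS) are all hypothetical names invented for this example. It only shows the shape of the idea, which is that each layer sits behind a small interface and can be swapped without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical interfaces, one per layer of the stack.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoiceStack:
    """One conversational turn: caller audio in, agent audio out."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def run_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt.transcribe(caller_audio)  # speech -> text
        answer = self.llm.reply(transcript)             # text -> response text
        return self.tts.synthesize(answer)              # response text -> speech


# Stand-in components so the sketch runs without any vendor SDK.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


stack = VoiceStack(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
agent_audio = stack.run_turn(b"hello")
```

In a real deployment each stub would wrap a provider client (for example a Deepgram transcriber or an ElevenLabs voice), and the platform would also handle telephony and interruptions, which this sketch leaves out.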

Tags

#voice-agents
#telephony
#phone
#real-time

Related in Voice

ElevenLabs
Voice · FREEMIUM · Vetted
Frontier TTS, voice cloning, and dubbing. Industry default.
Hosted speech synthesis at near-human quality: TTS, voice cloning, multilingual dubbing, and conversational voice agents. The default choice when you need a voice that sounds like a person, not a robot.
- tts
- voice-cloning
- dubbing
- multilingual

Cartesia
Voice · FREEMIUM
Low-latency streaming TTS. Sub-100ms first audio.
Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.
- tts
- streaming
- low-latency
- real-time

Deepgram
Voice · FREEMIUM
Production speech-to-text. The STT default for many companies.
End-to-end speech recognition platform: real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.
- stt
- transcription
- streaming
- diarization
