Vapi

Voice agent infrastructure. Build a phone agent in a weekend.

FREEMIUM · API · Web

Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.

Model support

Multi-model

  • Claude
  • GPT
  • Gemini
  • ElevenLabs
  • Deepgram

Compose your own STT + LLM + TTS stack.
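
As a rough illustration of what "pluggable models at every layer" can mean, here is a minimal sketch of one conversational turn passing through interchangeable STT, LLM, and TTS stages. Every name in it (`VoiceStack`, `handle_turn`, the `fake_*` functions) is hypothetical and chosen for illustration; none of it is Vapi's actual API, and telephony and interrupt handling are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; these are assumptions, not Vapi's API.
SpeechToText = Callable[[bytes], str]   # caller audio -> transcript
LanguageModel = Callable[[str], str]    # transcript -> reply text
TextToSpeech = Callable[[str], bytes]   # reply text -> agent audio


@dataclass
class VoiceStack:
    """One conversational turn: STT, then LLM, then TTS."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt(caller_audio)   # speech -> text
        reply = self.llm(transcript)          # text -> response text
        return self.tts(reply)                # response text -> speech


# Stand-in stages so the sketch runs without any vendor SDK.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


stack = VoiceStack(stt=fake_stt, llm=fake_llm, tts=fake_tts)
agent_audio = stack.handle_turn(b"hello")

# Swapping a single layer leaves the other two untouched.
shouting = VoiceStack(stt=fake_stt, llm=lambda t: t.upper(), tts=fake_tts)
```

In a real deployment each stand-in would wrap a provider client (for example a Deepgram transcriber, a Claude or GPT model, an ElevenLabs voice), and the stages would stream rather than run one after another.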

Where it runs

  • API
  • Web

Tags

  • #voice-agents
  • #telephony
  • #phone
  • #real-time

Related in Voice

  • ElevenLabs
    Voice · FREEMIUM · Vetted

    Frontier TTS, voice cloning, and dubbing. Industry default.

    Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.

    • tts
    • voice-cloning
    • dubbing
    • multilingual
  • Cartesia
    Voice · FREEMIUM

    Low-latency streaming TTS. Sub-100ms first audio.

    Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.

    • tts
    • streaming
    • low-latency
    • real-time
  • Deepgram
    Voice · FREEMIUM

    Production speech-to-text. The STT default for many companies.

    End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.

    • stt
    • transcription
    • streaming
    • diarization