Text-to-Speech WebSocket Streaming

Overview

The Telnyx Text-to-Speech (TTS) WebSocket API provides real-time audio synthesis from text input. This streaming endpoint allows you to send text and receive synthesized audio incrementally, enabling low-latency voice generation for real-time applications.

Video Demos

Watch these demonstrations to see Telnyx Text-to-Speech in action:

Convert text to speech in REAL TIME | Python | TTS websocket streaming

Telnyx Text-to-Speech API Use-case Demo

Telnyx TTS Audio Reader

WebSocket Endpoint

Connection URL

wss://api.telnyx.com/v2/text-to-speech/speech?voice={voice_id}

Query Parameters

Parameter	Type	Required	Description
`voice`	string	Yes	Voice identifier (e.g., `Telnyx.NaturalHD.astra`)
`inactivity_timeout`	integer	No	How long the WebSocket stays open without receiving a message (default: 20 seconds)
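As a minimal sketch, the connection URL with both query parameters can be assembled with the standard library. The helper name `build_tts_url` and the 60-second value are illustrative assumptions, not part of the Telnyx API.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "wss://api.telnyx.com/v2/text-to-speech/speech"

def build_tts_url(voice: str, inactivity_timeout: Optional[int] = None) -> str:
    """Return the WebSocket URL with the documented query parameters."""
    params = {"voice": voice}
    # Only add the optional parameter when the caller sets it,
    # so the server default (20 seconds) applies otherwise.
    if inactivity_timeout is not None:
        params["inactivity_timeout"] = inactivity_timeout
    return f"{BASE_URL}?{urlencode(params)}"

print(build_tts_url("Telnyx.NaturalHD.astra", inactivity_timeout=60))
```

Using `urlencode` keeps the query string valid if a voice identifier ever contains characters that need escaping.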

Authentication

Include your Telnyx API token as an Authorization header in the connection request:

Authorization: Bearer YOUR_TELNYX_TOKEN
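One way to avoid hard-coding the token is to read it from the environment. This is a sketch only: the function name `auth_headers` and the environment variable `TELNYX_API_KEY` are hypothetical names chosen for illustration.

```python
import os
from typing import Dict, Optional

def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header for the WebSocket handshake."""
    # TELNYX_API_KEY is a hypothetical environment variable name.
    token = token or os.environ.get("TELNYX_API_KEY")
    if not token:
        raise ValueError("No Telnyx API token provided")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument when opening the connection.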

Example Connection

import websockets

url = "wss://api.telnyx.com/v2/text-to-speech/speech?voice=Telnyx.NaturalHD.astra"
headers = {
    "Authorization": "Bearer YOUR_TELNYX_TOKEN"
}

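# Run this inside an async function; top-level await is not valid in a regular script.
# websockets 14.0 and later name this keyword additional_headers; older releases use extra_headers.
websocket = await websockets.connect(url, extra_headers=headers)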

Available Voices

Telnyx offers high-quality text-to-speech voices across multiple models, languages, and voice types, including AWS Polly Text-to-Speech services and Azure AI Speech. Use the interactive explorer below to browse and filter Telnyx voices by model, language, and gender characteristics.

For AWS Polly voices, see the complete AWS Polly voice list. For Azure AI Speech voices, explore the Azure AI Speech voice gallery.


Connection Flow

The TTS WebSocket follows this lifecycle:

1. Connect - Establish a WebSocket connection with authentication.
2. Initialize - Send an initialization frame containing a single space character.
3. Send Text - Send one or more text frames to synthesize.
4. Receive Audio - Receive audio frames with base64-encoded mp3 data.
5. Stop - Send an empty text frame to signal completion.
6. Close - The connection closes after processing completes.

Flow Diagram

Client                          Server
  |                               |
  |------- Connect -------------->|
  |<------ Connected -------------|
  |                               |
  |------- Init Frame ----------->|
  |       {"text": " "}           |
  |                               |
  |------- Text Frame ----------->|
  |       {"text": "Hello"}       |
  |                               |
  |<------ Audio Frame -----------|
  |       {"audio": "base64..."}  |
  |<------ Audio Frame -----------|
  |       {"audio": "base64..."}  |
  |                               |
  |------- Stop Frame ----------->|
  |       {"text": ""}            |
  |                               |
  |<------ Close -----------------|
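The outbound half of the flow above can be sketched as a function that returns the JSON messages in the order the client sends them. The name `build_frame_sequence` is a hypothetical helper. Skipping empty or whitespace-only strings is an assumption made here so that a text item is never mistaken for the initialization or stop frame.

```python
import json
from typing import Iterable, List

def build_frame_sequence(texts: Iterable[str]) -> List[str]:
    """Return the serialized frames for one session: init, text..., stop."""
    frames = [{"text": " "}]  # initialization frame: a single space
    # Skip blank items so they cannot collide with the init or stop frame.
    frames += [{"text": t} for t in texts if t.strip()]
    frames.append({"text": ""})  # stop frame: an empty string
    return [json.dumps(frame) for frame in frames]

for message in build_frame_sequence(["Hello"]):
    print(message)
```

Each returned string can be passed to `websocket.send()` in order.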

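Frame Types

Outbound Frames (Client → Server)

1. Initialization Frame

Purpose: Begin the session

Format:

{
  "text": " "
}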

Example:

import json

init_frame = {"text": " "}
await websocket.send(json.dumps(init_frame))

Notes:

Must be sent first after connection.
Contains a single space character.
Required to begin the session.

2. Text Frame

Purpose: Send text content to be synthesized into speech

Format:

{
  "text": "Your text content here"
}

Example:

text_frame = {"text": "Hello, this is a test of the Telnyx TTS service."}
await websocket.send(json.dumps(text_frame))

Multiple Text Frames:

import asyncio

# You can send multiple text frames sequentially
frames = [
    {"text": "First sentence."},
    {"text": "Second sentence."},
    {"text": "Third sentence."}
]

for frame in frames:
    await websocket.send(json.dumps(frame))
    await asyncio.sleep(0.5)  # Brief pause between frames

Notes:

Can send multiple text frames in one session.
Each frame is processed and synthesized separately.
Audio is returned incrementally for each text frame.
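Because each frame is synthesized separately, longer text is often split at sentence boundaries before sending. The following is a rough sketch: `text_to_frames` is a hypothetical helper, and the regular expression is a simple heuristic that splits after `.`, `!`, or `?`, so it will mis-split abbreviations such as "e.g.".

```python
import json
import re
from typing import List

def text_to_frames(text: str) -> List[str]:
    """Split text into sentences and serialize each one as a text frame."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces so no frame is mistaken for the stop frame.
    return [json.dumps({"text": s}) for s in sentences if s]

for frame in text_to_frames("First sentence. Second sentence! Third?"):
    print(frame)
```

Whether sentence-sized frames give the best latency or prosody for a given voice is not stated in this guide, so the chunk size is worth testing.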

3. Stop Frame

Purpose: Signal completion of text input and end the session

Format:

{
  "text": ""
}

Example:

stop_frame = {"text": ""}
await websocket.send(json.dumps(stop_frame))

Notes:

Contains an empty string.
Signals the server to finish processing.
Should be sent after all text frames.
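The three outbound frame types share one JSON shape and differ only in the value of `text`. A small classifier, sketched below, can help when logging or debugging a session. The function name `frame_kind` is hypothetical, and the sketch classifies by content only; it does not check the position of a frame in the session.

```python
import json

def frame_kind(raw: str) -> str:
    """Classify a serialized outbound frame as 'init', 'text', or 'stop'."""
    text = json.loads(raw)["text"]
    if text == "":
        return "stop"  # empty string signals completion
    if text == " ":
        return "init"  # single space begins the session
    return "text"      # anything else is content to synthesize

print(frame_kind('{"text": " "}'))
print(frame_kind('{"text": "Hello"}'))
print(frame_kind('{"text": ""}'))
```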

Inbound Frames (Server → Client)

The server sends JSON text messages containing synthesized audio data.

Audio Frame

Purpose: Deliver synthesized audio data

Format:

{
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA="
}

Processing Audio:

import base64

async for message in websocket:
    data = json.loads(message)

    if "audio" in data:
        # Decode base64 audio
        audio_bytes = base64.b64decode(data["audio"])

        # Save or process audio
        with open("output.mp3", "ab") as f:
            f.write(audio_bytes)

Audio Specifications:

Property	Value
Format	mp3
Sample Rate	16 kHz
Bit Depth	16-bit
Channels	Mono (1)
Encoding	Base64

Notes:

Multiple audio frames may be received for a single text input.
Each audio chunk is a complete mp3 file with headers.
Chunks should be concatenated in the order received.
Use append mode when saving to file to preserve all audio.
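As an alternative to appending to a file, the chunks can be collected in memory and joined in arrival order. This is a sketch under assumptions: `decode_audio_frame` and `collect_audio` are hypothetical helper names, and returning `None` for messages without an `audio` value is a defensive choice, since this guide only documents audio frames.

```python
import base64
import json
from typing import Iterable, Optional

def decode_audio_frame(message: str) -> Optional[bytes]:
    """Return the decoded audio bytes, or None if the message has no audio."""
    data = json.loads(message)
    audio = data.get("audio")
    if not audio:
        return None
    return base64.b64decode(audio)

def collect_audio(messages: Iterable[str]) -> bytes:
    """Concatenate decoded chunks in the order they were received."""
    chunks = (decode_audio_frame(m) for m in messages)
    return b"".join(c for c in chunks if c)

# Usage with a locally built stand-in for a server message:
sample = json.dumps({"audio": base64.b64encode(b"abc").decode("ascii")})
print(collect_audio([sample, sample]))
```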

Complete Example

Here's a complete example showing all frame types in sequence:

import asyncio
import base64
import json

import websockets

async def tts_example():
    # 1. Connect to WebSocket
    url = "wss://api.telnyx.com/v2/text-to-speech/speech?voice=Telnyx.NaturalHD.astra"
    headers = {
        "Authorization": "Bearer YOUR_TELNYX_TOKEN"
    }

    # websockets 14.0 and later name this keyword additional_headers instead of extra_headers
    async with websockets.connect(url, extra_headers=headers) as ws:
        print("Connected to TTS WebSocket")

        # 2. Send initialization frame
        init_frame = {"text": " "}
        await ws.send(json.dumps(init_frame))
        print("Sent: Initialization frame")

        # 3. Send text frame
        text_frame = {"text": "Hello, welcome to Telnyx Text-to-Speech streaming."}
        await ws.send(json.dumps(text_frame))
        print("Sent: Text frame")

        # 4. Send stop frame once, after all text frames. This assumes the server
        #    finishes synthesizing pending text before it closes the connection.
        stop_frame = {"text": ""}
        await ws.send(json.dumps(stop_frame))
        print("Sent: Stop frame")

        # 5. Receive audio frames until the server closes the connection
        audio_count = 0
        async for message in ws:
            data = json.loads(message)

            # Skip any message without audio data
            if data.get("audio"):
                audio_count += 1
                audio_bytes = base64.b64decode(data["audio"])

                # Append audio chunks to file
                with open("output.mp3", "ab") as f:
                    f.write(audio_bytes)

                print(f"Received: Audio frame #{audio_count} ({len(audio_bytes)} bytes)")

        print("Connection closed")

asyncio.run(tts_example())

Expected Output (frame count and sizes will vary):

Connected to TTS WebSocket
Sent: Initialization frame
Sent: Text frame
Sent: Stop frame
Received: Audio frame #1 (8192 bytes)
Received: Audio frame #2 (6144 bytes)
Received: Audio frame #3 (4096 bytes)
Connection closed

Configuration Summary

Required Configuration

# WebSocket URL
ENDPOINT = "wss://api.telnyx.com/v2/text-to-speech/speech"
VOICE_ID = "Telnyx.NaturalHD.astra"
URL = f"{ENDPOINT}?voice={VOICE_ID}"

# Authentication Header
TELNYX_TOKEN = "YOUR_TELNYX_TOKEN"  # Replace with your API token
HEADERS = {
    "Authorization": f"Bearer {TELNYX_TOKEN}"
}

Message Sequence

# 1. Initialization
{"text": " "}

# 2. Text to synthesize (can send multiple)
{"text": "Your text here"}

# 3. Stop signal
{"text": ""}

Demo Project

A complete Python implementation is available under the link.

Troubleshooting

Issue	Solution
Connection fails	Verify token format: `Bearer YOUR_TOKEN`
No audio received	Ensure the initialization frame is sent first
Audio is garbled	Check base64 decoding and file append mode
Empty audio file	Confirm text frame contains valid content
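When diagnosing a session that produces no audio, it can help to bound how long the client waits for each message instead of blocking until the connection closes. The sketch below wraps a single receive in a timeout. `recv_with_timeout` is a hypothetical helper, the 5-second default is an arbitrary choice, and `ws` is assumed to expose an awaitable `recv()` method, as connections from the `websockets` library do.

```python
import asyncio
from typing import Any, Optional

async def recv_with_timeout(ws: Any, timeout: float = 5.0) -> Optional[str]:
    """Wait for one message; return None if nothing arrives within timeout seconds."""
    try:
        return await asyncio.wait_for(ws.recv(), timeout)
    except asyncio.TimeoutError:
        return None
```

A `None` result after the text frame was sent suggests checking the frame order and the token before looking at audio decoding.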
