Speech-to-Text with Voice API and TeXML
Introduction
In this tutorial, we will cover how to get a speech-to-text transcription of your calls using Voice API and TeXML.
Before starting, please ensure your Voice API or TeXML application is correctly configured.
Voice API
Transcription can be enabled for a Voice API call by sending a request to the dedicated transcription_start endpoint:
Note: Replace YOUR_API_KEY with your API key and {call_control_id} with the call control ID of the call you want to transcribe.
curl -i -X POST \
  'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "language": "en",
    "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
    "command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
    "transcription_engine": "B"
  }'
Telnyx offers two speech-to-text engines for turning the call audio into a transcription. Choose one with the transcription_engine parameter:
- A (default): Google speech-to-text engine, which offers additional features such as interim results.
- B: Telnyx's in-house speech-to-text engine, with significantly better transcription accuracy and lower latency than engine A.
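If you call the endpoint from code rather than curl, the same request can be built with Python's standard library. This is a minimal sketch, not the Telnyx SDK: the build_transcription_start_request helper is hypothetical, and restricting transcription_engine to "A" or "B" simply mirrors the two engines listed above. The function only builds the request; sending it is left to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.telnyx.com/v2"


def build_transcription_start_request(
    call_control_id: str,
    api_key: str,
    language: str = "en",
    engine: str = "A",
    client_state: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a transcription_start request (hypothetical helper)."""
    # Only the two engines described in this tutorial are accepted.
    if engine not in ("A", "B"):
        raise ValueError(f"transcription_engine must be 'A' or 'B', got {engine!r}")

    body = {"language": language, "transcription_engine": engine}
    if client_state is not None:
        body["client_state"] = client_state

    # Percent-encode anything unsafe in a path segment, but keep the ':' in IDs like "v2:...".
    path_id = urllib.parse.quote(call_control_id, safe=":")
    return urllib.request.Request(
        f"{API_BASE}/calls/{path_id}/actions/transcription_start",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Usage, assuming your key is stored in a TELNYX_API_KEY environment variable (an assumption, not a Telnyx convention): urllib.request.urlopen(build_transcription_start_request(call_control_id, os.environ["TELNYX_API_KEY"], engine="B")). Validating the engine locally surfaces a typo before the request is sent.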
Results are delivered as call.transcription events to the webhook URL configured for your Voice API application:
{
  "data": {
    "record_type": "event",
    "event_type": "call.transcription",
    "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
    "occurred_at": "2018-02-02T22:25:27.521992Z",
    "payload": {
      "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
      "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
      "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
      "client_state": null,
      "connection_id": "1240401930086254526",
      "transcription_data": {
        "confidence": 0.977219,
        "is_final": true,
        "transcript": "hello this is a test speech"
      }
    }
  }
}
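A webhook handler usually needs only the final transcript text from these events. The sketch below assumes the webhook body has already been parsed into a dict (for example with json.loads); the extract_final_transcript function is a hypothetical name. It returns the transcript for final call.transcription events and None for anything else, including interim results (is_final set to false).

```python
import json
from typing import Optional


def extract_final_transcript(event: dict) -> Optional[str]:
    """Return the transcript of a final call.transcription event, else None."""
    data = event.get("data", {})
    if data.get("event_type") != "call.transcription":
        return None  # Some other Voice API event; ignore it here.
    transcription = data.get("payload", {}).get("transcription_data", {})
    if not transcription.get("is_final", False):
        return None  # Interim result: the text may still change.
    return transcription.get("transcript")


if __name__ == "__main__":
    raw = """{"data": {"event_type": "call.transcription",
              "payload": {"transcription_data": {"confidence": 0.977219,
              "is_final": true, "transcript": "hello this is a test speech"}}}}"""
    print(extract_final_transcript(json.loads(raw)))
```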
TeXML
You can enable transcription on your TeXML calls by including a <Transcription> verb, nested inside <Start>, in the TeXML instructions:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="B" />
  </Start>
</Response>
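If your application returns TeXML from code, generating the document with an XML library avoids mistakes such as typographic quotes around attribute values. A sketch using Python's standard xml.etree.ElementTree; the build_transcription_texml function and its defaults are illustrative assumptions, while the element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_transcription_texml(callback_url: str = "/transcription",
                              language: str = "en",
                              engine: str = "B") -> str:
    """Return a TeXML document that starts transcription on the call (hypothetical helper)."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    # <Transcription> is nested in <Start>, matching the example above.
    ET.SubElement(start, "Transcription", {
        "language": language,
        "transcriptionCallback": callback_url,
        "transcriptionEngine": engine,
    })
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_transcription_texml())
```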
The transcription results are sent to the transcriptionCallback URL in the following format:
{
  "AccountSid": "6d547b4f-993a-4e87-b95c-2d9460b3824b",
  "CallSid": "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
  "CallSidLegacy": "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
  "Confidence": "0.9822598695755005",
  "ConnectionId": "1614262910593271041",
  "From": "+18727726007",
  "IsFinal": "true",
  "To": "+48664087895",
  "Transcript": "let's hear some music"
}
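In this callback every value is a string, so IsFinal and Confidence need converting before they can be used as a boolean and a number. A minimal sketch, assuming your web framework has already decoded the callback parameters into a dict of strings; the TranscriptionResult dataclass and parse_texml_transcription function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TranscriptionResult:
    call_sid: str
    transcript: str
    confidence: float
    is_final: bool


def parse_texml_transcription(params: Mapping[str, str]) -> TranscriptionResult:
    """Convert string-valued TeXML transcription callback fields into typed values."""
    return TranscriptionResult(
        call_sid=params["CallSid"],
        transcript=params.get("Transcript", ""),
        # Confidence arrives as a decimal string, e.g. "0.98".
        confidence=float(params.get("Confidence", "0")),
        # IsFinal arrives as the string "true" or "false".
        is_final=params.get("IsFinal", "false").lower() == "true",
    )
```

Treating a missing IsFinal as false is a conservative choice in this sketch: an unexpected payload is handled like an interim result rather than stored as final text.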