Realtime Transcript
Stream live meeting transcripts with speaker identification as the meeting happens.
Overview
When `realtimeTranscript` is enabled on a bot, the bot streams audio to Gladia's Live API during the meeting and delivers labeled transcript chunks in realtime via WebSocket. Speaker names are automatically matched to transcript segments using the bot's participant detection.
Enabling Realtime Transcript
Set `realtimeTranscript: true` when creating a bot:
```shell
curl -X POST https://botapi.syntrimeet.com/api/v1/bots \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "botDisplayName": "SyntriBot",
    "meetingInfo": {
      "platform": "google",
      "meetingUrl": "https://meet.google.com/abc-defg-hij"
    },
    "realtimeTranscript": true
  }'
```
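The same request can be sketched in Python using only the standard library. The endpoint, headers, and field names mirror the curl example above; the helper names (`build_create_bot_payload`, `create_bot`) are illustrative, not part of an official SDK:

```python
import json
from urllib import request

API_KEY = "YOUR_API_KEY"  # replace with your real key

def build_create_bot_payload(display_name: str, meeting_url: str, platform: str = "google") -> dict:
    """Build the same JSON body shown in the curl example."""
    return {
        "botDisplayName": display_name,
        "meetingInfo": {"platform": platform, "meetingUrl": meeting_url},
        "realtimeTranscript": True,
    }

def create_bot(payload: dict) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = request.Request(
        "https://botapi.syntrimeet.com/api/v1/bots",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage:
# bot = create_bot(build_create_bot_payload("SyntriBot", "https://meet.google.com/abc-defg-hij"))
```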
Subscribing to Live Transcript
Connect to the WebSocket endpoint to receive transcript chunks in realtime.
Endpoint
```
wss://botapi.syntrimeet.com/api/v1/bots/{botId}/transcript/live?token=JWT_TOKEN
```
Authentication
| Method | How |
|---|---|
| JWT Token | Pass as query parameter: `?token=eyJ...` |
| API Key | Send in header: `X-API-Key: sk_...` |
JavaScript Example
```javascript
const botId = 134;
const token = "your-jwt-token";

const ws = new WebSocket(
  `wss://botapi.syntrimeet.com/api/v1/bots/${botId}/transcript/live?token=${token}`
);

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case "transcript":
      console.log(
        `[${message.data.speaker.name || "Speaker " + message.data.speaker.id}]: ${message.data.transcript}`
      );
      break;
    case "speaker_update":
      console.log("Speaker mappings updated:", message.data.mappings);
      break;
    case "stream_ended":
      console.log("Meeting ended:", message.data.reason);
      ws.close();
      break;
  }
};

ws.onclose = () => console.log("Disconnected");
ws.onerror = (err) => console.error("WebSocket error:", err);
```
Python Example
```python
import asyncio
import json

import websockets

async def listen_transcript(bot_id: int, token: str):
    uri = f"wss://botapi.syntrimeet.com/api/v1/bots/{bot_id}/transcript/live?token={token}"
    async with websockets.connect(uri) as ws:
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "transcript":
                chunk = data["data"]
                speaker = chunk["speaker"]["name"] or f"Speaker {chunk['speaker']['id']}"
                print(f"[{speaker}]: {chunk['transcript']}")
            elif data["type"] == "stream_ended":
                print(f"Stream ended: {data['data']['reason']}")
                break

asyncio.run(listen_transcript(bot_id=134, token="your-jwt-token"))
```
Message Types
transcript
Delivered for each transcript segment. Includes both interim (partial) and final results.
```json
{
  "type": "transcript",
  "data": {
    "transcript": "Hello everyone, welcome to the meeting",
    "speaker": {
      "id": 0,
      "name": "Shashank Shahare",
      "confidence": 0.85
    },
    "isFinal": true,
    "timestamp": 12.5,
    "words": [
      { "word": "Hello", "start": 12.5, "end": 12.8, "confidence": 0.98 },
      { "word": "everyone", "start": 12.9, "end": 13.3, "confidence": 0.95 }
    ]
  }
}
```
| Field | Type | Description |
|---|---|---|
| `transcript` | string | The transcribed text |
| `speaker.id` | number | Speaker ID (0, 1, 2...) |
| `speaker.name` | string \| null | Matched participant name, or `null` if unknown |
| `speaker.confidence` | number | Confidence of speaker name match (0.0 - 1.0) |
| `isFinal` | boolean | `true` for final results, `false` for interim/partial |
| `timestamp` | number | Timestamp in seconds from start of recording |
| `words` | array | Word-level timing and confidence (optional) |
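As a sketch of consuming these fields, the small helpers below (hypothetical names, not part of the API) resolve a display label for the speaker and keep only final segments:

```python
def speaker_label(speaker: dict) -> str:
    # Fall back to a generic label while speaker.name is still null/unmatched.
    return speaker.get("name") or f"Speaker {speaker['id']}"

def final_lines(messages: list[dict]):
    """Yield '[Name]: text' lines for final transcript segments, skipping interim ones."""
    for msg in messages:
        if msg.get("type") == "transcript" and msg["data"].get("isFinal"):
            yield f"[{speaker_label(msg['data']['speaker'])}]: {msg['data']['transcript']}"
```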
speaker_update
Sent when speaker name mappings change.
```json
{
  "type": "speaker_update",
  "data": {
    "mappings": {
      "0": { "name": "Shashank Shahare", "confidence": 0.85 },
      "1": { "name": "Lucky", "confidence": 0.72 }
    }
  }
}
```
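One way to apply these updates, assuming you buffer earlier chunks client-side (the `relabel` helper is illustrative, not provided by the API):

```python
def relabel(chunks: list[dict], mappings: dict) -> list[dict]:
    """Return copies of buffered transcript chunks with speaker names refreshed
    from the latest speaker_update mappings; originals are left untouched."""
    updated = []
    for chunk in chunks:
        speaker_id = str(chunk["speaker"]["id"])  # mapping keys are strings
        if speaker_id in mappings:
            chunk = {**chunk, "speaker": {**chunk["speaker"], **mappings[speaker_id]}}
        updated.append(chunk)
    return updated
```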
stream_ended
Sent when the bot leaves the meeting. The WebSocket connection will close after this message.
```json
{
  "type": "stream_ended",
  "data": {
    "reason": "meeting_ended"
  }
}
```
Latency
| Component | Latency |
|---|---|
| Audio capture to Gladia | ~100ms |
| Gladia processing | ~200-300ms |
| Server relay to client | ~50ms |
| Total end-to-end | ~300-500ms |
Limitations
- No replay: If you connect after the meeting starts, you only receive transcript from that point forward
- No persistence: Realtime chunks are not stored. Use the post-meeting transcript API for the full record
- Cost: Gladia Live API charges per audio-second streamed. Only enabled when `realtimeTranscript: true`
- Speaker names: Accuracy improves over the first 30-60 seconds as the confidence scoring stabilizes
Best Practices
- Handle interim results: Filter on `isFinal: true` if you only want complete sentences
- Use speaker updates: Listen for `speaker_update` events to retroactively relabel earlier interim results
- Reconnect gracefully: If the WebSocket disconnects, reconnect and you'll resume receiving new chunks
- Close on stream_ended: When you receive `stream_ended`, the meeting is over; close your connection
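The reconnect advice above is commonly implemented with jittered exponential backoff. A minimal sketch (the delay parameters are illustrative defaults, not SyntriMeet requirements):

```python
import random

def backoff_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0):
    """Yield an endless sequence of jittered reconnect delays: each draw is
    uniform in [0, current], and current doubles up to the cap."""
    delay = base
    while True:
        yield random.uniform(0, delay)
        delay = min(cap, delay * factor)

# Usage sketch:
# for delay in backoff_delays():
#     time.sleep(delay)
#     # ...attempt to reopen the WebSocket; break out of the loop on success
```

Jitter spreads out reconnect attempts so that many clients dropped at once do not all retry in lockstep.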