Realtime Transcript
Stream live meeting transcripts with speaker identification as the meeting happens.
Overview
When `realtimeTranscript` is enabled on a bot, the bot streams audio to Gladia's Live API during the meeting and delivers labeled transcript chunks in realtime via WebSocket. Speaker names are automatically matched to transcript segments using the bot's participant detection.
Enabling Realtime Transcript
Set `realtimeTranscript: true` when creating a bot:
```shell
curl -X POST https://botapi.syntrimeet.com/api/v1/bots \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "botDisplayName": "SyntriBot",
    "meetingInfo": {
      "platform": "google",
      "meetingUrl": "https://meet.google.com/abc-defg-hij"
    },
    "realtimeTranscript": true
  }'
```
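The same request can be sketched in Python using only the standard library. The endpoint, headers, and field names mirror the curl example above; the helper names (`build_create_bot_payload`, `create_bot`) are illustrative, not part of an official SDK:

```python
import json
from urllib import request

API_KEY = "YOUR_API_KEY"  # replace with your real key

def build_create_bot_payload(display_name: str, meeting_url: str, platform: str = "google") -> dict:
    """Build the same JSON body shown in the curl example."""
    return {
        "botDisplayName": display_name,
        "meetingInfo": {"platform": platform, "meetingUrl": meeting_url},
        "realtimeTranscript": True,
    }

def create_bot(payload: dict) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = request.Request(
        "https://botapi.syntrimeet.com/api/v1/bots",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage:
# bot = create_bot(build_create_bot_payload("SyntriBot", "https://meet.google.com/abc-defg-hij"))
```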
Subscribing to Live Transcript
Connect to the WebSocket endpoint to receive transcript chunks in realtime.
Endpoint
```
wss://botapi.syntrimeet.com/api/v1/bots/{botId}/transcript/live?token=JWT_TOKEN
```
Authentication
| Method | How |
|---|---|
| JWT Token | Pass as query parameter: `?token=eyJ...` |
| API Key | Send in header: `X-API-Key: sk_...` |
JavaScript Example
```javascript
const botId = 134;
const token = "your-jwt-token";

const ws = new WebSocket(
  `wss://botapi.syntrimeet.com/api/v1/bots/${botId}/transcript/live?token=${token}`
);

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case "transcript":
      console.log(
        `[${message.data.speaker.name || "Speaker " + message.data.speaker.id}]: ${message.data.transcript}`
      );
      break;
    case "speaker_update":
      console.log("Speaker mappings updated:", message.data.mappings);
      break;
    case "stream_ended":
      console.log("Meeting ended:", message.data.reason);
      ws.close();
      break;
  }
};

ws.onclose = () => console.log("Disconnected");
ws.onerror = (err) => console.error("WebSocket error:", err);
```
Python Example
```python
import asyncio
import json

import websockets

async def listen_transcript(bot_id: int, token: str):
    uri = f"wss://botapi.syntrimeet.com/api/v1/bots/{bot_id}/transcript/live?token={token}"
    async with websockets.connect(uri) as ws:
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "transcript":
                chunk = data["data"]
                speaker = chunk["speaker"]["name"] or f"Speaker {chunk['speaker']['id']}"
                print(f"[{speaker}]: {chunk['transcript']}")
            elif data["type"] == "stream_ended":
                print(f"Stream ended: {data['data']['reason']}")
                break

asyncio.run(listen_transcript(bot_id=134, token="your-jwt-token"))
```
Message Types
transcript
Delivered for each transcript segment. Includes both interim (partial) and final results.
```json
{
  "type": "transcript",
  "data": {
    "transcript": "Hello everyone, welcome to the meeting",
    "speaker": {
      "id": 0,
      "name": "Shashank Shahare",
      "confidence": 0.85
    },
    "isFinal": true,
    "timestamp": 12.5,
    "words": [
      { "word": "Hello", "start": 12.5, "end": 12.8, "confidence": 0.98 },
      { "word": "everyone", "start": 12.9, "end": 13.3, "confidence": 0.95 }
    ]
  }
}
```
| Field | Type | Description |
|---|---|---|
| `transcript` | string | The transcribed text |
| `speaker.id` | number | Speaker ID (0, 1, 2...) |
| `speaker.name` | string \| null | Matched participant name, or `null` if unknown |
| `speaker.confidence` | number | Confidence of speaker name match (0.0 - 1.0) |
| `isFinal` | boolean | `true` for final results, `false` for interim/partial |
| `timestamp` | number | Timestamp in seconds from start of recording |
| `words` | array | Word-level timing and confidence (optional) |
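As a sketch of consuming these fields, the small helpers below (hypothetical names, not part of the API) resolve a display label for the speaker and keep only final segments:

```python
def speaker_label(speaker: dict) -> str:
    # Fall back to a generic label while speaker.name is still null/unmatched.
    return speaker.get("name") or f"Speaker {speaker['id']}"

def final_lines(messages: list[dict]):
    """Yield '[Name]: text' lines for final transcript segments, skipping interim ones."""
    for msg in messages:
        if msg.get("type") == "transcript" and msg["data"].get("isFinal"):
            yield f"[{speaker_label(msg['data']['speaker'])}]: {msg['data']['transcript']}"
```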
speaker_update
Sent when speaker name mappings change.
```json
{
  "type": "speaker_update",
  "data": {
    "mappings": {
      "0": { "name": "Shashank Shahare", "confidence": 0.85 },
      "1": { "name": "Lucky", "confidence": 0.72 }
    }
  }
}
```
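One way to apply these updates, assuming you buffer earlier chunks client-side (the `relabel` helper is illustrative, not provided by the API):

```python
def relabel(chunks: list[dict], mappings: dict) -> list[dict]:
    """Return copies of buffered transcript chunks with speaker names refreshed
    from the latest speaker_update mappings; originals are left untouched."""
    updated = []
    for chunk in chunks:
        speaker_id = str(chunk["speaker"]["id"])  # mapping keys are strings
        if speaker_id in mappings:
            chunk = {**chunk, "speaker": {**chunk["speaker"], **mappings[speaker_id]}}
        updated.append(chunk)
    return updated
```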
stream_ended
Sent when the bot leaves the meeting. The WebSocket connection will close after this message.
```json
{
  "type": "stream_ended",
  "data": {
    "reason": "meeting_ended"
  }
}
```
Latency
| Component | Latency |
|---|---|
| Audio capture to Gladia | ~100ms |
| Gladia processing | ~200-300ms |
| Server relay to client | ~50ms |
| Total end-to-end | ~300-500ms |
Limitations
- No replay: If you connect after the meeting starts, you only receive transcript from that point forward
- No persistence: Realtime chunks are not stored. Use the post-meeting transcript API for the full record
- Cost: Gladia Live API charges per audio-second streamed. Only enabled when `realtimeTranscript: true`
- Speaker names: Accuracy improves over the first 30-60 seconds as the confidence scoring stabilizes
Best Practices
- Handle interim results: Filter on `isFinal: true` if you only want complete sentences
- Use speaker updates: Listen for `speaker_update` events to retroactively relabel earlier interim results
- Reconnect gracefully: If the WebSocket disconnects, reconnect and you'll resume receiving new chunks
- Close on stream_ended: When you receive `stream_ended`, the meeting is over; close your connection
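The reconnect advice above is commonly implemented with jittered exponential backoff. A minimal sketch (the delay parameters are illustrative defaults, not SyntriMeet requirements):

```python
import random

def backoff_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0):
    """Yield an endless sequence of jittered reconnect delays: each draw is
    uniform in [0, current], and current doubles up to the cap."""
    delay = base
    while True:
        yield random.uniform(0, delay)
        delay = min(cap, delay * factor)

# Usage sketch:
# for delay in backoff_delays():
#     time.sleep(delay)
#     # ...attempt to reopen the WebSocket; break out of the loop on success
```

Jitter spreads out reconnect attempts so that many clients dropped at once do not all retry in lockstep.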