Start a session
POST /api/agents/agent-session/
This is the endpoint the widget calls when the user clicks Start Call. It mints a LiveKit JWT, dispatches the voice agent worker to the room, and returns everything the browser needs to connect.
You can call this endpoint from your own backend to start sessions programmatically (e.g. for native app integrations or server-initiated calls).
Request
All fields except agent are optional. Any field you provide overrides the character’s default for this session only.
curl -X POST https://api.oshara.ai/api/agents/agent-session/ \
-H "Content-Type: application/json" \
-H "Origin: https://yoursite.com" \
-d '{
"agent": "support-bot",
"language": "en",
"greeting": "Hi! How can I help you today?",
"metadata": { "user_id": "u_123", "plan": "pro" }
}'
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| agent | string | ✓ | Character slug (e.g. "support-bot"). |
| language | string | | BCP-47 language code. Overrides character default for STT and TTS. |
| system_prompt | string | | Override the character’s system prompt for this session only. |
| greeting | string | | Override the agent’s opening greeting. |
| voice_model | string | | Named voice model (e.g. "chatterbox-en-v1"). |
| reference_audio_url | string | | URL of a WAV/MP3 file for voice cloning reference audio. |
| mcp_headers | object | | Key-value headers merged into all MCP server calls for this session. |
| origin_url | string | | The page URL that initiated the call (auto-sent by the widget; used for logging). |
| metadata | object | | Arbitrary key-value pairs passed through to the agent and stored with the session. |
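As a rough illustration of calling this endpoint from your own backend, the sketch below builds the same request as the curl example using the global `fetch` available in Node 18+. The helper names `buildSessionRequest` and `startSession` are hypothetical, and sending an explicit `Origin` header from server code is an assumption carried over from the curl example rather than documented behavior.

```javascript
// Hypothetical helper names; the URL and field names come from the curl example above.
const SESSION_URL = "https://api.oshara.ai/api/agents/agent-session/";

// Build the fetch() arguments without sending anything, so the payload can be inspected.
function buildSessionRequest(agent, overrides = {}, origin = "https://yoursite.com") {
  if (typeof agent !== "string" || agent.length === 0) {
    throw new TypeError("agent (character slug) is required");
  }
  return {
    url: SESSION_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Browsers set Origin themselves; a backend caller sends it explicitly, as curl does.
        Origin: origin,
      },
      // Only fields that are present are sent; omitted fields keep the character defaults.
      body: JSON.stringify({ agent, ...overrides }),
    },
  };
}

// Send the request and return the parsed JSON response (token, livekit_url, ...).
// fetchImpl is injectable so the function can be exercised with a stub.
async function startSession(agent, overrides, origin, fetchImpl = fetch) {
  const { url, init } = buildSessionRequest(agent, overrides, origin);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`agent-session request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes it easy to log or unit-test the payload before any session is started.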
Response
{
"token": "eyJhbGci...",
"livekit_url": "wss://audio-inference.oshara.ai",
"room_name": "oshara-voice-support-bot-user123-abc12",
"participant_identity": "user-123-abc12",
"session_id": "sess_a1b2c3",
"expires_in_seconds": 3600,
"character_slug": "support-bot",
"system_prompt": "You are a helpful support agent...",
"greeting": "Hi! How can I help you today?"
}
| Field | Description |
|---|---|
| token | LiveKit JWT. Pass this to livekit-client’s Room.connect(). |
| livekit_url | WebSocket URL of the LiveKit SFU. |
| room_name | Unique room identifier for this session. |
| participant_identity | The identity assigned to this browser participant. |
| session_id | Oshara session ID. Use this to retrieve form responses and logs. |
| expires_in_seconds | Token TTL in seconds (default 3600). |
| character_slug | Confirmed character slug. |
| system_prompt | The effective system prompt (character default or override). |
| greeting | The effective greeting the agent will speak. |
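Because the token carries a TTL, a client that holds on to a response before connecting may want to check whether the token is still usable. The sketch below is one way to do that, assuming you record the time at which the response arrived. `tokenExpiryMs` and `needsRefresh` are hypothetical helpers, and the 30-second safety margin is an arbitrary choice, not an Oshara recommendation.

```javascript
// Hypothetical helpers; only expires_in_seconds comes from the response above.

// Absolute expiry time in epoch milliseconds, given when the response was received.
function tokenExpiryMs(response, receivedAtMs) {
  const ttl = Number(response.expires_in_seconds);
  if (!Number.isFinite(ttl) || ttl <= 0) {
    throw new RangeError("expires_in_seconds must be a positive number");
  }
  return receivedAtMs + ttl * 1000;
}

// True when the token is expired or within marginMs of expiring.
function needsRefresh(response, receivedAtMs, nowMs = Date.now(), marginMs = 30_000) {
  return nowMs >= tokenExpiryMs(response, receivedAtMs) - marginMs;
}
```

If `needsRefresh` returns true, the simplest option is to request a new session rather than attempt to connect with the stale token.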
Errors
| Status | Meaning |
|---|---|
| 403 Forbidden | Origin not in the character’s allowed_origins list. |
| 404 Not Found | Character slug not found or inactive. |
| 429 Too Many Requests | Billing limit reached for concurrent sessions. |
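A caller will usually want to turn these status codes into something actionable. The following sketch maps each documented status to its meaning. The `describeSessionError` name and the `retryable` flag are assumptions about how a client might react, not part of the API.

```javascript
// Status codes and meanings are taken from the errors table above;
// the retryable flag is an assumption about how a client might react.
const SESSION_ERRORS = {
  403: { reason: "Origin not in the character's allowed_origins list", retryable: false },
  404: { reason: "Character slug not found or inactive", retryable: false },
  // A concurrency limit may clear once another session ends, so a later retry can succeed.
  429: { reason: "Billing limit reached for concurrent sessions", retryable: true },
};

// Look up a documented status, falling back to a generic entry for anything else.
function describeSessionError(status) {
  return SESSION_ERRORS[status] ?? { reason: `Unexpected HTTP ${status}`, retryable: false };
}
```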
Connecting with the LiveKit SDK
Once you have a token, connect using the official LiveKit client:
import { Room } from "livekit-client";
// `response` is the parsed JSON body returned by POST /api/agents/agent-session/.
const room = new Room();
await room.connect(response.livekit_url, response.token);
The widget handles this automatically. The pattern above is for custom native-app or server-side integrations.
Passing user context to the agent
Use the metadata field to pass arbitrary context about the current user. The voice agent receives this metadata and can reference it in tool calls and its reasoning:
{
"agent": "support-bot",
"metadata": {
"user_id": "u_42",
"account_tier": "enterprise",
"previous_tickets": 3
}
}
The agent worker embeds metadata in its internal context; individual tool calls can include metadata fields as arguments if your tools are configured to accept them.
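In practice the metadata usually comes from a user record in your own application. The sketch below shows one way to assemble the request body from such a record. The `sessionBodyForUser` helper and the record fields `id`, `tier`, and `ticketCount` are invented for illustration, and the metadata keys simply mirror the example above.

```javascript
// Hypothetical helper: turn an application user record into a request body.
// The user field names (id, tier, ticketCount) are invented for illustration.
function sessionBodyForUser(agent, user) {
  const metadata = {
    user_id: user.id,
    account_tier: user.tier,
    previous_tickets: user.ticketCount,
  };
  // Drop keys whose value is undefined or null so only known context is sent.
  for (const key of Object.keys(metadata)) {
    if (metadata[key] == null) delete metadata[key];
  }
  return { agent, metadata };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the POST body.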