Oshara Docs

Start a session


POST /api/agents/agent-session/

This is the endpoint the widget calls when the user clicks Start Call. It mints a LiveKit JWT, dispatches the voice agent worker to the room, and returns everything the browser needs to connect.

You can call this endpoint from your own backend to start sessions programmatically (e.g. for native app integrations or server-initiated calls).

Request

All fields except `agent` are optional. Override fields such as `language`, `system_prompt`, and `greeting` replace the character’s default for this session only; the character’s saved configuration is not changed.


curl -X POST https://api.oshara.ai/api/agents/agent-session/ \
  -H "Content-Type: application/json" \
  -H "Origin: https://yoursite.com" \
  -d '{
    "agent": "support-bot",
    "language": "en",
    "greeting": "Hi! How can I help you today?",
    "metadata": { "user_id": "u_123", "plan": "pro" }
  }'

Request body

Field	Type	Required	Description
`agent`	string	✓	Character slug (e.g. `"support-bot"`).
`language`	string		BCP-47 language code. Overrides character default for STT and TTS.
`system_prompt`	string		Override the character’s system prompt for this session only.
`greeting`	string		Override the agent’s opening greeting.
`voice_model`	string		Named voice model (e.g. `"chatterbox-en-v1"`).
`reference_audio_url`	string		URL of a WAV or MP3 file used as reference audio for voice cloning.
`mcp_headers`	object		Key-value headers merged into all MCP server calls for this session.
`origin_url`	string		The page URL that initiated the call (auto-sent by the widget; used for logging).
`metadata`	object		Arbitrary key-value pairs passed through to the agent and stored with the session.
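
For callers that prefer typed code over curl, the request body above can be sketched as a TypeScript interface plus a small builder. This is a minimal sketch, not an official client: `StartSessionRequest`, `PreparedRequest`, and `buildSessionRequest` are hypothetical names, and typing `mcp_headers` as string-to-string pairs is an assumption based on the table's description.

```typescript
// Request body fields, mirroring the table above. Only `agent` is required.
interface StartSessionRequest {
  agent: string;
  language?: string;
  system_prompt?: string;
  greeting?: string;
  voice_model?: string;
  reference_audio_url?: string;
  mcp_headers?: Record<string, string>; // assumed to be string-to-string pairs
  origin_url?: string;
  metadata?: Record<string, unknown>;
}

// Plain description of an HTTP call; structurally compatible with fetch(url, init).
interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Endpoint URL as shown in the curl example.
const SESSION_URL = "https://api.oshara.ai/api/agents/agent-session/";

// Hypothetical helper: prepares the POST without sending it.
// `origin` should be a value listed in the character's allowed_origins.
function buildSessionRequest(body: StartSessionRequest, origin: string): PreparedRequest {
  return {
    url: SESSION_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Origin: origin },
      // JSON.stringify omits keys whose value is undefined, so unset
      // overrides are simply not sent.
      body: JSON.stringify(body),
    },
  };
}
```

From a backend, the result can be passed on as `fetch(req.url, req.init)`. In a browser the `Origin` header is set by the browser itself and cannot be overridden from script, so the explicit header only matters for server-side callers.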

Response


{
  "token": "eyJhbGci...",
  "livekit_url": "wss://audio-inference.oshara.ai",
  "room_name": "oshara-voice-support-bot-user123-abc12",
  "participant_identity": "user-123-abc12",
  "session_id": "sess_a1b2c3",
  "expires_in_seconds": 3600,
  "character_slug": "support-bot",
  "system_prompt": "You are a helpful support agent...",
  "greeting": "Hi! How can I help you today?"
}

Field	Description
`token`	LiveKit JWT. Pass this as the second argument to `room.connect()` on a `livekit-client` `Room` instance.
`livekit_url`	WebSocket URL of the LiveKit SFU.
`room_name`	Unique room identifier for this session.
`participant_identity`	The identity assigned to this browser participant.
`session_id`	Oshara session ID. Use this to retrieve form responses and logs.
`expires_in_seconds`	Token TTL in seconds (default 3600).
`character_slug`	Confirmed character slug.
`system_prompt`	The effective system prompt (character default or override).
`greeting`	The effective greeting the agent will speak.
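
Because the response drives the connection step, it can help to validate it before use. The sketch below types the fields from the table and checks them at runtime; `SessionResponse`, `parseSessionResponse`, and `tokenExpiresAt` are hypothetical names. It also assumes the TTL is counted from roughly when the response is received, which this page does not state.

```typescript
// Response fields, mirroring the table above.
interface SessionResponse {
  token: string;
  livekit_url: string;
  room_name: string;
  participant_identity: string;
  session_id: string;
  expires_in_seconds: number;
  character_slug: string;
  system_prompt: string;
  greeting: string;
}

const STRING_FIELDS = [
  "token", "livekit_url", "room_name", "participant_identity",
  "session_id", "character_slug", "system_prompt", "greeting",
] as const;

// Hypothetical helper: throws if a documented field is missing or mistyped.
function parseSessionResponse(data: unknown): SessionResponse {
  if (typeof data !== "object" || data === null) {
    throw new Error("Session response is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  for (const field of STRING_FIELDS) {
    if (typeof obj[field] !== "string") {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  if (typeof obj["expires_in_seconds"] !== "number") {
    throw new Error("Missing or non-numeric field: expires_in_seconds");
  }
  return obj as unknown as SessionResponse;
}

// Assumption: the token TTL starts at about the time the response arrives.
function tokenExpiresAt(res: SessionResponse, receivedAtMs: number): Date {
  return new Date(receivedAtMs + res.expires_in_seconds * 1000);
}
```

A caller might run `parseSessionResponse(await res.json())` and then use `tokenExpiresAt(session, Date.now())` to decide when a fresh session is needed.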

Errors

Status	Meaning
`403 Forbidden`	Origin not in the character’s `allowed_origins` list.
`404 Not Found`	Character slug not found or inactive.
`429 Too Many Requests`	Billing limit reached for concurrent sessions.
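
One way to handle these statuses in client code is to map each one to a named error kind. The sketch below does that; the kind names and the `retryable` flag are illustrative assumptions, not part of the API. In particular, treating 429 as retryable assumes a slot frees up once another session ends, which this page does not guarantee.

```typescript
type SessionErrorKind =
  | "origin_not_allowed"    // 403
  | "agent_not_found"       // 404
  | "session_limit_reached" // 429
  | "unexpected";           // anything else

interface SessionError {
  kind: SessionErrorKind;
  retryable: boolean;
}

// Hypothetical helper: classify a non-2xx status from the session endpoint.
function classifySessionError(status: number): SessionError {
  switch (status) {
    case 403:
      return { kind: "origin_not_allowed", retryable: false };
    case 404:
      return { kind: "agent_not_found", retryable: false };
    case 429:
      // Assumption: retrying later may succeed once a concurrent session ends.
      return { kind: "session_limit_reached", retryable: true };
    default:
      // Assumption: server-side (5xx) failures are worth retrying.
      return { kind: "unexpected", retryable: status >= 500 };
  }
}
```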

Connecting with the LiveKit SDK

Once you have a token, connect using the official LiveKit client:


import { Room } from "livekit-client";

// `response` is the parsed JSON body returned by POST /api/agents/agent-session/.
const room = new Room();
await room.connect(response.livekit_url, response.token);

The widget handles this automatically. The pattern above is for custom native-app or server-side integrations.
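
For a custom integration, the two steps (request a session, then join the room) can be combined. The sketch below keeps both dependencies injectable so it runs without a network or the SDK installed. `RoomLike`, `FetchLike`, and `startAndConnect` are hypothetical names; `RoomLike` only assumes the `connect(url, token)` call shown above.

```typescript
// Minimal shapes so the sketch type-checks without importing livekit-client.
interface RoomLike {
  connect(url: string, token: string): Promise<void>;
}

interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

// Hypothetical helper: request a session for `agent`, join the room,
// and return the Oshara session ID for later lookups.
async function startAndConnect(
  fetchFn: FetchLike,
  room: RoomLike,
  agent: string,
): Promise<string> {
  const res = await fetchFn("https://api.oshara.ai/api/agents/agent-session/", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ agent }),
  });
  if (!res.ok) {
    throw new Error(`Session request failed with status ${res.status}`);
  }
  // Only the fields needed for connecting are read here.
  const session = (await res.json()) as {
    livekit_url: string;
    token: string;
    session_id: string;
  };
  await room.connect(session.livekit_url, session.token);
  return session.session_id;
}
```

In an application this would likely be called as `startAndConnect(fetch, new Room(), "support-bot")`, with `Room` imported from `livekit-client`.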

Passing user context to the agent

Use the metadata field to pass arbitrary context about the current user. The voice agent receives this metadata and can reference it in tool calls and its reasoning:


{
  "agent": "support-bot",
  "metadata": {
    "user_id": "u_42",
    "account_tier": "enterprise",
    "previous_tickets": 3
  }
}

The agent worker embeds metadata in its internal context; individual tool calls can include metadata fields as arguments if your tools are configured to accept them.
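
A backend that already holds a user record can derive the metadata object from it. The sketch below shows one way to do that; `AppUser` and `sessionBodyForUser` are hypothetical, and the metadata keys simply reuse those from the JSON example above. Since the body is sent as JSON, metadata values need to be JSON-serializable.

```typescript
// Hypothetical application-side user record; not part of the Oshara API.
interface AppUser {
  id: string;
  tier?: string;
  previousTickets?: number;
}

interface SessionBody {
  agent: string;
  metadata: Record<string, string | number>;
}

// Builds a request body whose metadata carries the user context.
// Fields that are undefined on the user are left out of the metadata.
function sessionBodyForUser(agent: string, user: AppUser): SessionBody {
  const metadata: Record<string, string | number> = { user_id: user.id };
  if (user.tier !== undefined) {
    metadata["account_tier"] = user.tier;
  }
  if (user.previousTickets !== undefined) {
    metadata["previous_tickets"] = user.previousTickets;
  }
  return { agent, metadata };
}
```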