What is the Knowledge Base?
The Knowledge Base (KB) is a document store with semantic search. You upload PDFs, Word documents, or plain text files; Oshara automatically chunks them and creates vector embeddings. During a call, the agent can search the KB using natural language queries and synthesise answers from the retrieved passages.
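The chunk, embed, and search steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Oshara's implementation. Whitespace-separated words stand in for tokens, although real tokenizers count differently. Tiny hand-written vectors stand in for text-embedding-ada-002 embeddings, which have 1,536 dimensions. Ranking uses cosine similarity, the measure a pgvector cosine index orders by. The helper names `chunk_text`, `cosine_similarity`, and `top_k` are hypothetical.

```python
import math
from typing import Sequence


def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Assumption: one whitespace-separated word ~ one token. Real tokenizers
    (such as the one behind text-embedding-ada-002) count differently.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> list[int]:
    """Return indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    chunks = chunk_text("word " * 1200)
    print(len(chunks))  # 3 (500 + 500 + 200 words)
    # Toy 3-dimensional "embeddings" standing in for real ones.
    vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
    print(top_k([1.0, 0.1, 0.0], vecs, k=2))  # [0, 2]
```

In the real pipeline, the query is embedded with the same model as the chunks, and the database performs the similarity ranking rather than application code.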
Pipeline
Upload document (PDF / DOCX / TXT)
│
▼
Text extraction
│
▼
Chunking (≈500 tokens per chunk)
│
▼
Embedding (OpenAI text-embedding-ada-002)
│
▼
pgvector storage (cosine similarity index)
│
▼
Agent calls kb tool → semantic search → top-k passages → LLM answer
Document statuses
| Status | Meaning |
|---|---|
| `PENDING` | Document uploaded, awaiting processing. |
| `PROCESSED` | Chunks extracted and embedded. Ready for search. |
| `FAILED` | Processing error (unsupported format, corrupt file, etc.). |
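Processing happens after upload, so a client typically polls until a document leaves `PENDING`. The sketch below assumes a hypothetical `fetch_status(document_id)` callable that you would implement against the Documents API. The function name, the string document ID, and the interval and timeout defaults are illustrative assumptions, not part of Oshara's API.

```python
import time
from enum import Enum
from typing import Callable


class DocumentStatus(str, Enum):
    PENDING = "PENDING"      # uploaded, awaiting processing
    PROCESSED = "PROCESSED"  # chunked and embedded, ready for search
    FAILED = "FAILED"        # processing error


def wait_until_ready(
    document_id: str,
    fetch_status: Callable[[str], str],  # hypothetical wrapper around the Documents API
    poll_interval: float = 2.0,
    timeout: float = 120.0,
) -> DocumentStatus:
    """Poll until the document is PROCESSED or FAILED; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = DocumentStatus(fetch_status(document_id))
        if status is not DocumentStatus.PENDING:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} still PENDING after {timeout}s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated responses in place of real API calls.
    responses = iter(["PENDING", "PENDING", "PROCESSED"])
    result = wait_until_ready("doc-123", lambda _id: next(responses), poll_interval=0.01)
    print(result.value)  # PROCESSED
```

Treating `FAILED` as a terminal state, rather than retrying, matches the table: a corrupt or unsupported file will not become searchable by waiting.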
Attaching the KB to an agent
Documents are attached per character. Upload documents to a character’s slug, and they become available automatically to any `kb` tool defined on that character.
See:
- Documents API: upload and list documents
- Query API: run a semantic search programmatically
- Knowledge Base Tools: how to define a `kb` tool on your character
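The per-character scoping can be pictured as one document set per slug. A search from a character's `kb` tool only ever sees that slug's chunks. The in-memory model below is a hypothetical illustration. `KnowledgeBase`, `add_document`, and `search` are not Oshara APIs. The naive word-overlap score stands in for the embedding similarity the real service uses. Only the `metadata.source` field of the result shape comes from this page, and the `text` key is an assumption.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical in-memory model: documents are scoped per character slug."""

    def __init__(self) -> None:
        # slug -> list of (filename, chunk text)
        self._chunks: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_document(self, slug: str, filename: str, chunks: list[str]) -> None:
        """Attach a document's chunks to one character."""
        for chunk in chunks:
            self._chunks[slug].append((filename, chunk))

    def search(self, slug: str, query: str, k: int = 3) -> list[dict]:
        """Rank only this character's chunks (naive word overlap, not embeddings)."""
        terms = set(query.lower().split())

        def score(item: tuple[str, str]) -> int:
            return len(terms & set(item[1].lower().split()))

        # .get() avoids creating an empty entry for unknown slugs.
        matches = [item for item in self._chunks.get(slug, []) if score(item) > 0]
        matches.sort(key=score, reverse=True)
        return [
            {"text": text, "metadata": {"source": filename}}
            for filename, text in matches[:k]
        ]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.add_document("support-bot", "refund-policy.pdf", ["Refund requests are handled by billing."])
    kb.add_document("sales-bot", "pricing.pdf", ["Refund requests go to support."])
    for hit in kb.search("support-bot", "refund requests"):
        print(hit["metadata"]["source"])  # refund-policy.pdf
```

Even though both characters hold a matching chunk, the `support-bot` search returns only its own document, and the filename travels with each hit so the LLM can cite it.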
Best practices
| Tip | Details |
|---|---|
| One topic per file | Splitting content by topic (e.g. refund-policy.pdf, shipping-faq.pdf) improves retrieval precision. |
| Descriptive filenames | The filename is returned in search results as metadata.source — the LLM uses it to cite sources. |
| Clean text | PDFs with complex layouts, scanned images, or lots of tables may extract poorly. Use TXT or DOCX for structured content. |
| Remove boilerplate | Legal disclaimers and headers repeated across many pages consume embedding budget and dilute results. |
| Size limit | Individual files up to 50 MB. No limit on total documents per character. |
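Two of these constraints can be checked client-side before uploading. The sketch below rejects files over the 50 MB limit and extensions other than the formats named in the pipeline (PDF, DOCX, TXT). Several parts are assumptions. `check_upload` is a hypothetical helper. "50 MB" is read as 50 × 1024 × 1024 bytes, because the docs do not say whether MB is decimal or binary. Matching on the file extension may not reflect how the service actually detects formats. The server remains the authority, and a file that passes this check can still end up `FAILED`.

```python
from pathlib import Path

# Formats named in the pipeline; extension matching is an assumption.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
# Assumption: "50 MB" interpreted as binary megabytes.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems: list[str] = []
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {p.suffix or '(none)'!r}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(f"file is {p.stat().st_size} bytes; limit is {MAX_BYTES}")
    return problems
```

Returning a list instead of raising on the first problem lets a batch-upload script report every issue with a file in one pass.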