Knowledge Base Overview

What is the Knowledge Base?

The Knowledge Base (KB) is a document store with semantic search. You upload PDFs, Word documents, or plain text files; Oshara extracts the text, splits it into chunks, and creates a vector embedding for each chunk. During a call, the agent can search the KB with natural language queries and synthesise answers from the retrieved passages.

Pipeline


Upload document (PDF / DOCX / TXT)
        │
        ▼
  Text extraction
        │
        ▼
  Chunking (≈500 tokens per chunk)
        │
        ▼
  Embedding (OpenAI text-embedding-ada-002)
        │
        ▼
  pgvector storage (cosine similarity index)
        │
        ▼
Agent calls kb tool → semantic search → top-k passages → LLM answer
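
The chunking and similarity-search stages can be illustrated with a small, self-contained sketch. It is not Oshara's implementation: whitespace-separated words stand in for tokens, a bag-of-words count vector stands in for the embedding model, and an in-memory list stands in for pgvector. The names `chunk_text`, `toy_embed`, `cosine`, and `top_k` are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter

CHUNK_TOKENS = 500  # approximate chunk size from the pipeline (words used as a proxy for tokens)


def chunk_text(text: str, size: int = CHUNK_TOKENS) -> list[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str) -> Counter:
    """Assumption: a bag-of-words count vector stands in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best match first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In the real pipeline the retrieved passages are then handed to the LLM, which writes the answer; the sketch stops at retrieval.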

Document statuses

Status	Meaning
`PENDING`	Document uploaded, awaiting processing.
`PROCESSED`	Chunks extracted and embedded. Ready for search.
`FAILED`	Processing error (unsupported format, corrupt file, etc.).
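
Because a document is only searchable once it reaches `PROCESSED`, client code will often wait for it to leave `PENDING`. The sketch below shows one way to do that. The `fetch_status` callable is a placeholder for whatever call returns the document's current status string (the Documents API is not described on this page), and the timeout and polling interval are arbitrary assumptions.

```python
import time
from enum import Enum
from typing import Callable


class DocumentStatus(str, Enum):
    PENDING = "PENDING"      # uploaded, awaiting processing
    PROCESSED = "PROCESSED"  # chunked and embedded, ready for search
    FAILED = "FAILED"        # processing error


def wait_until_processed(
    fetch_status: Callable[[], str],  # hypothetical: returns the status string
    timeout_s: float = 60.0,          # assumed default, not an Oshara value
    interval_s: float = 2.0,          # assumed default, not an Oshara value
    sleep: Callable[[float], None] = time.sleep,
) -> DocumentStatus:
    """Poll until the document leaves PENDING or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = DocumentStatus(fetch_status())
        if status is not DocumentStatus.PENDING:
            return status  # PROCESSED or FAILED
        if time.monotonic() >= deadline:
            raise TimeoutError("document still PENDING after timeout")
        sleep(interval_s)
```

A caller would typically treat `FAILED` as a signal to check the file format or re-export the document before uploading it again.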

Attaching the KB to an agent

Documents are attached per character. Upload a document under a character’s slug and, once its status is `PROCESSED`, it is automatically available to every kb tool defined on that character.

See:

Documents API — upload and list documents
Query API — run a semantic search programmatically
Knowledge Base Tools — how to define a kb tool on your character

Best practices

Tip	Details
One topic per file	Splitting content by topic (e.g. `refund-policy.pdf`, `shipping-faq.pdf`) improves retrieval precision.
Descriptive filenames	The filename is returned in search results as `metadata.source`, and the LLM uses it to cite sources.
Clean text	PDFs with complex layouts, scanned images, or lots of tables may extract poorly. Use TXT or DOCX for structured content.
Remove boilerplate	Legal disclaimers and headers repeated across many pages consume embedding budget and dilute results.
Size limit	Each file can be up to 50 MB. There is no limit on the number of documents per character.
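
Some of these tips can be checked locally before uploading. The following is a minimal sketch with hypothetical helper names: `upload_problems` checks the file extension and the size limit (assuming "50 MB" means 50 × 1024 × 1024 bytes, which this page does not specify), and `strip_repeated_lines` drops lines that recur on most pages, such as repeated headers or disclaimers. The 50% threshold is an arbitrary choice for illustration.

```python
from collections import Counter
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024           # assumption: "50 MB" interpreted as 50 MiB
SUPPORTED = {".pdf", ".docx", ".txt"}  # formats named in the pipeline


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the file would likely be rejected (empty if none)."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 50 MB")
    return problems


def strip_repeated_lines(pages: list[str], min_share: float = 0.5) -> list[str]:
    """Remove lines that appear on more than `min_share` of the pages."""
    if len(pages) < 2:
        return list(pages)  # nothing can be "repeated" on a single page
    counts: Counter = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    limit = min_share * len(pages)
    repeated = {line for line, n in counts.items() if n > limit}
    return [
        "\n".join(l for l in page.splitlines() if l.strip() not in repeated)
        for page in pages
    ]
```

For example, `upload_problems("notes.md", 10)` reports an unsupported format, while a three-page document whose pages all begin with the same confidentiality banner comes back from `strip_repeated_lines` without that banner.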