Vectra indexes the unglamorous guts of a business — customer threads, supplier emails, SOPs, decision logs — into a hybrid retrieval system that explains its answers.
Last quarter's supplier dispute resolution. The decision log that explains why we dropped a vendor. The SOP that says how to handle a wholesale order over $20K with a tight deadline.
All written down. None of it findable when it matters. Vectra fixes that — not by replacing your tools, but by making everything you've already written queryable.
Markdown headings stay attached to their text. Code blocks are wrapped before any split, then restored after — splitters never cut through a function. Frontmatter is parsed out and lifted into structured metadata (case IDs, supplier names, SKUs, dates) so those fields are queryable independently of the body text.
LangChain MarkdownHeaderTextSplitter · RecursiveCharacterTextSplitter
974 chunks from 200 docs · ~1,400 char target · 200 char overlap
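A sketch of that pass, assuming the LangChain splitters named above; the fence-masking and frontmatter handling here are illustrative simplifications, not Vectra's exact code:

```python
import re
import frontmatter  # python-frontmatter: parses YAML frontmatter into a dict
from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

def chunk_document(raw_md: str) -> list[dict]:
    # 1. Lift frontmatter (case IDs, supplier names, SKUs, dates) into metadata.
    post = frontmatter.loads(raw_md)
    doc_meta, body = dict(post.metadata), post.content

    # 2. Mask fenced code blocks so no splitter can cut through a function.
    fences = re.findall(r"```.*?```", body, flags=re.DOTALL)
    for i, fence in enumerate(fences):
        body = body.replace(fence, f"@@CODE{i}@@", 1)

    # 3. Split on headings first, so each chunk keeps its heading context...
    by_heading = MarkdownHeaderTextSplitter(
        headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
    ).split_text(body)

    # 4. ...then split oversized sections toward the ~1,400 char target.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1400, chunk_overlap=200)
    chunks = splitter.split_documents(by_heading)

    # 5. Restore the masked code blocks inside each chunk.
    out = []
    for chunk in chunks:
        text = chunk.page_content
        for i, fence in enumerate(fences):
            text = text.replace(f"@@CODE{i}@@", fence)
        out.append({"content": text, "metadata": {**doc_meta, **chunk.metadata}})
    return out
```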
Summaries name concrete entities — supplier, SKU, customer, dollar amount — so retrieval has dense anchors. The four hypothetical questions function as HyDE: they extend the chunk's findability past its literal wording. Content type (policy · procedure · decision · escalation · incident · product · runbook · communication) becomes a hard filter.
gpt-4o-mini · response_format: json_object · temperature 0.2
8 chunks per call · 4 parallel workers

We embed the content, the summary, and the joined hypothetical questions separately. At query time, the query embedding fans out across all three to find chunks that match by raw text, by abstracted meaning, or by anticipated question.
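A minimal sketch of that triple embedding, assuming the OpenAI SDK; the chunk keys and output field names are illustrative:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "text-embedding-3-small"  # 1536 dimensions

def embed_chunk(chunk: dict) -> dict:
    """Embed content, summary, and joined questions as three separate vectors."""
    texts = [
        chunk["content"],
        chunk["summary"],
        " ".join(chunk["questions"]),  # the four hypothetical questions, joined
    ]
    resp = client.embeddings.create(model=MODEL, input=texts)
    content_vec, summary_vec, questions_vec = (d.embedding for d in resp.data)
    return {
        **chunk,
        "embedding_content": content_vec,
        "embedding_summary": summary_vec,
        "embedding_questions": questions_vec,
    }
```

The same model embeds the incoming query once; that single vector is what fans out across the three columns.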
text-embedding-3-small · 1536d
pgvector HNSW · m=16, ef_construction=64 · cosine
3 × 1536 floats per chunk · ~36KB before compression

Each query runs through five rankers in parallel: three semantic searches (over the content, summary, and question vectors), an English full-text search via tsvector, and a trigram similarity check for typos and partial names, like "Marisol" matching "Marisol Ceramics". Reciprocal Rank Fusion merges the five rankings into a single ordering.
hybrid_search_vectra() · Postgres SQL function
weights: semantic 1.0 · FTS 1.2 · trigram 0.5
filters: section · content_type · product_tags · supplier_tags
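Called from Python, that might look like the following. Only the function's name is documented here, so its parameter list and the connection string are assumptions:

```python
import psycopg

def hybrid_search(query_text: str, query_vec: list[float], content_type: str | None = None):
    # pgvector accepts its text literal format, e.g. '[0.1,0.2,...]'
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    with psycopg.connect("dbname=vectra") as conn:  # connection string assumed
        return conn.execute(
            "SELECT * FROM hybrid_search_vectra(%s, %s::vector, %s)",
            (query_text, vec_literal, content_type),  # hypothetical signature
        ).fetchall()
```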
The synthesizer is given the top-k chunks and a strict prompt: cite every claim, refuse to answer when context is insufficient, prefer newer sources when SOPs contradict decision logs. The answer references each source as [^N], hot-linked to the source card on the page.
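A sketch of that contract as a system prompt; the wording and helper below are illustrative, not Vectra's actual prompt, with model and temperature per the spec line that follows:

```python
SYNTH_SYSTEM_PROMPT = """\
Answer using ONLY the numbered sources below.
- Cite every claim with [^N], where N is the source number.
- If the sources do not answer the question, say so plainly. Do not guess.
- When an SOP contradicts a decision log, prefer the newer source.
"""

def build_messages(question: str, chunks: list[dict]) -> list[dict]:
    # Each source carries its type and date so the model can prefer newer ones.
    sources = "\n\n".join(
        f"[{i}] ({c['content_type']}, {c['updated_at']})\n{c['content']}"
        for i, c in enumerate(chunks, start=1)
    )
    return [
        {"role": "system", "content": SYNTH_SYSTEM_PROMPT},
        {"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
    ]
```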
gpt-4o-mini · temperature 0.2"If sources don't answer, say so plainly. Do not guess."Vector search alone misses exact strings, supplier names, case IDs. Full-text search alone misses paraphrase. Trigram alone is too noisy. Vectra runs all five in parallel and fuses them with Reciprocal Rank Fusion — a tiny, robust function that doesn't need calibration.
$$\text{score}(d) = \sum_{r} \frac{w_r}{60 + \text{rank}_r(d)}$$

RRF — Cormack et al., 2009
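The whole fusion step fits in a few lines. A Python sketch of the weighted variant above (Vectra presumably runs the equivalent inside the Postgres function):

```python
from collections import defaultdict

K = 60  # rank constant from Cormack et al., 2009

def rrf(rankings: dict[str, list[str]], weights: dict[str, float]) -> list[str]:
    """Fuse ranked lists of chunk IDs (best first) into one ordering."""
    scores: dict[str, float] = defaultdict(float)
    for ranker, ordering in rankings.items():
        for rank, chunk_id in enumerate(ordering, start=1):
            scores[chunk_id] += weights[ranker] / (K + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Assuming the 1.0 semantic weight applies to each of the three vector rankers:
# rrf(rankings, weights={"content": 1.0, "summary": 1.0, "questions": 1.0,
#                        "fts": 1.2, "trigram": 0.5})
```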
Today: parent/child edges within each document, so retrieval can climb up to the parent SOP or drill into a sub-section. Planned: precedent_for between similar past decisions, supersedes between old and new policy versions, implements_policy from a customer case to the SOP it followed, about_supplier from any chunk that names a vendor.
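One plausible shape for that edge set, sketched as DDL in a Python migration string; the table and column names are assumptions, not Vectra's actual schema:

```python
EDGES_DDL = """
CREATE TABLE IF NOT EXISTS chunk_edges (
    src        bigint NOT NULL,   -- chunk id
    dst        bigint NOT NULL,   -- chunk id (or a supplier id for about_supplier)
    edge_type  text   NOT NULL CHECK (edge_type IN (
        'parent_of',           -- live today: parent/child within one document
        'precedent_for',       -- planned: links similar past decisions
        'supersedes',          -- planned: new policy version -> old
        'implements_policy',   -- planned: customer case -> the SOP it followed
        'about_supplier'       -- planned: any chunk that names a vendor
    )),
    PRIMARY KEY (src, dst, edge_type)
);
"""
```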
The graph is what turns "find the chunk" into "find the lineage of this decision."
The corpus below is for a fictional brand, Verdant Home Goods. The retrieval, ranking, and synthesis are real — every answer comes from a chunk indexed in Postgres.
The hard part of an AI-assisted operations platform isn't the LLM. It's the contract between memory, workflows, and execution — deterministic where it has to be, probabilistic where it pays off, with every AI action traceable to the retrieved evidence that triggered it.
What you're looking at is the memory layer. The pipeline is domain-agnostic — swapping the operational system underneath is a config change, not a rewrite.