Operational intelligence

Operational memory,
made first-class.

Vectra indexes the unglamorous guts of a business — customer threads, supplier emails, SOPs, decision logs — into a hybrid retrieval system that explains its answers.

974 chunks indexed
1,536 vector dimensions
108 graph edges
<1s retrieval
The problem

The institutional memory of most businesses
lives in places nobody reads.

Last quarter's supplier dispute resolution. The decision log that explains why we dropped a vendor. The SOP that says how to handle a wholesale order over $20K with a tight deadline.

All written down. None of it findable when it matters. Vectra fixes that — not by replacing your tools, but by making everything you've already written queryable.

01 · Chunk · Header-aware splitting, code blocks protected
02 · Enrich · Summary, four hypothetical questions, classification
03 · Embed · Three vectors: content, summary, questions
04 · Retrieve · Hybrid: semantic + FTS + trigram, RRF-fused
05 · Synthesize · Grounded answer with inline citations
The pipeline

Five stages. No magic.

01 · Chunk

Header-aware splitting that protects what matters.

Markdown headings stay attached to their text. Code blocks are wrapped before any split, then restored after — splitters never cut through a function. Frontmatter is parsed out and lifted into structured metadata (case IDs, supplier names, SKUs, dates), so those fields are queryable independently of the body text.

Tools: LangChain MarkdownHeaderTextSplitter · RecursiveCharacterTextSplitter
Output: 974 chunks from 200 docs · ~1,400 char target · 200 char overlap
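
A minimal sketch of this stage, assuming the LangChain splitters named above. The code-block masking helpers are illustrative, not Vectra's actual implementation, and the simple placeholder scheme ignores edge cases like a placeholder straddling a chunk boundary:

```python
import re

from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

FENCE = re.compile(r"```.*?```", re.DOTALL)

def mask_code_blocks(text):
    """Swap fenced code blocks for placeholders so no splitter cuts through one."""
    blocks = []
    def stash(match):
        blocks.append(match.group(0))
        return f"__CODE_BLOCK_{len(blocks) - 1}__"
    return FENCE.sub(stash, text), blocks

def restore_code_blocks(text, blocks):
    """Put the original fenced blocks back after splitting."""
    for i, block in enumerate(blocks):
        text = text.replace(f"__CODE_BLOCK_{i}__", block)
    return text

def chunk_markdown(md):
    masked, blocks = mask_code_blocks(md)
    # Headings travel with their text as metadata on each Document.
    by_header = MarkdownHeaderTextSplitter(
        headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
    ).split_text(masked)
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1400, chunk_overlap=200
    ).split_documents(by_header)
    for chunk in chunks:
        chunk.page_content = restore_code_blocks(chunk.page_content, blocks)
    return chunks
```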
02 · Enrich

Every chunk gets three indices: a summary, four hypothetical questions, and a type.

Summaries name concrete entities — supplier, SKU, customer, dollar amount — so retrieval has dense anchors. The four hypothetical questions work like HyDE in reverse: they extend the chunk's findability beyond its literal wording. Content type (policy · procedure · decision · escalation · incident · product · runbook · communication) becomes a hard filter.

Model: gpt-4o-mini · response_format: json_object · temperature 0.2
Batch: 8 chunks per call · 4 parallel workers
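
A hedged sketch of one enrichment call. The model, response format, temperature, and batch size come from the spec above; the prompt wording and JSON field names are assumptions:

```python
import json

from openai import OpenAI

client = OpenAI()

PROMPT = """For each chunk, return JSON with:
  summary: one sentence naming concrete entities (supplier, SKU, customer, dollar amount),
  questions: four questions a user might ask that this chunk answers,
  content_type: one of policy|procedure|decision|escalation|incident|product|runbook|communication.
Return {"chunks": [...]} in input order."""

def enrich(batch):
    """Enrich up to 8 chunks in one call (spec: 8 per call, 4 parallel workers)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": json.dumps({"chunks": batch})},
        ],
    )
    return json.loads(resp.choices[0].message.content)["chunks"]
```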
03 · Embed

Three vectors per chunk, because users phrase questions three different ways.

We embed the content, the summary, and the joined hypothetical questions separately. At query time, the query embedding fans out across all three indexes to find chunks that match by raw text, by abstracted meaning, or by anticipated question.

Model: text-embedding-3-small · 1536d
Index: pgvector HNSW · m=16, ef_construction=64 · cosine
Storage: 3 × 1536 floats per chunk · ~36KB before compression
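
A sketch of the triple embedding, assuming the OpenAI embeddings client; the return-value field names and the table/column names in the index comment are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def embed_chunk(content, summary, questions):
    """One API call returns all three 1536-d vectors, in input order."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[content, summary, "\n".join(questions)],
    )
    c, s, q = (d.embedding for d in resp.data)
    return {"content_vec": c, "summary_vec": s, "questions_vec": q}

# Matching pgvector index, one per vector column, per the spec above:
# CREATE INDEX ON chunks USING hnsw (content_vec vector_cosine_ops)
#   WITH (m = 16, ef_construction = 64);
```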
04 · Retrieve

Vector similarity isn't enough. Hybrid fusion is.

Each query runs through five rankers in parallel: three semantic searches (over content, summary, question vectors), an English full-text search via tsvector, and a trigram similarity check for typos and partial names like "Marisol" matching "Marisol Ceramics". Reciprocal Rank Fusion merges the five rankings into a single ordering.

Function: hybrid_search_vectra() · Postgres SQL function
Weights: semantic 1.0 · FTS 1.2 · trigram 0.5
Filters: section · content_type · product_tags · supplier_tags
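
On the application side, the call into the SQL function might look like this. The argument list is an assumption — only hybrid_search_vectra() and the filter names appear in the spec above — and it assumes psycopg 3 with the pgvector adapter:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def search(conn, query, query_vec, content_type=None, limit=10):
    """Run the five-ranker hybrid search; fusion happens inside Postgres."""
    register_vector(conn)  # teaches psycopg to send numpy arrays as vectors
    return conn.execute(
        "SELECT * FROM hybrid_search_vectra(%s, %s, %s, %s)",
        (query, np.asarray(query_vec), content_type, limit),
    ).fetchall()
```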
05 · Synthesize

The model never invents. It can only cite what was retrieved.

The synthesizer is given the top-k chunks and a strict prompt: cite every claim, refuse to answer when context is insufficient, prefer newer sources when SOPs contradict decision logs. The answer references each source as [^N], hot-linked to the source card on the page.

Model: gpt-4o-mini · temperature 0.2
Guardrail: "If sources don't answer, say so plainly. Do not guess."
Hybrid retrieval

Five rankers,
one ordering.

Vector search alone misses exact strings, supplier names, case IDs. Full-text search alone misses paraphrase. Trigram alone is too noisy. Vectra runs all five in parallel and fuses them with Reciprocal Rank Fusion — a tiny, robust function that doesn't need calibration.

score(d) = Σ_r w_r / (60 + rank_r(d)) · RRF (Cormack et al., 2009)
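
The formula is small enough to render whole. A minimal Python version, using the per-ranker weights shown in the demo below; ranker names are illustrative, and the production fusion runs inside the SQL function:

```python
from collections import defaultdict

WEIGHTS = {
    "semantic_content": 1.0,
    "semantic_summary": 0.8,
    "semantic_questions": 0.9,
    "fts_english": 1.2,
    "trigram": 0.5,
}

def rrf(rankings, k=60):
    """rankings maps ranker name -> list of chunk ids, best first."""
    scores = defaultdict(float)
    for ranker, ranked in rankings.items():
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += WEIGHTS[ranker] / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```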

query "defective ceramics from Marisol"
semantic · contentw 1.0
semantic · summaryw 0.8
semantic · questionsw 0.9
FTS · Englishw 1.2
trigramw 0.5
RRF
1sup_001 · marisol cracked dinnerwareincident
2po_046 · vendor profile · marisolvendor_profile
3po_001 · marisol mug · internal notesproduct
Embedding space

All 974 chunks,
one constellation.

Each dot is a real chunk from the corpus, projected from 1,536-dimensional embedding space down to two principal components. Hover any dot to see what it is. Colors group by content type. Clusters here are the geometry of what your business knows — supplier disputes pulling toward one neighborhood, wholesale policy toward another, customer-thread escalations forming their own arc.

PCA explains roughly 8–12% of variance in the first two components on this corpus — enough to see clusters, not enough to capture all the structure. Real retrieval uses the full 1,536 dimensions.
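
The projection behind the constellation is plain PCA, sketched here with scikit-learn; the filename is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

vectors = np.load("content_vectors.npy")        # shape (974, 1536)
pca = PCA(n_components=2)
xy = pca.fit_transform(vectors)                 # shape (974, 2), one dot per chunk
print(pca.explained_variance_ratio_.sum())      # ~0.08-0.12 on this corpus
```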

Knowledge graph

Chunks are nodes.
Relationships are edges.

Today: parent/child edges within each document, so retrieval can climb up to the parent SOP or drill into a sub-section. Planned: precedent_for between similar past decisions, supersedes between old and new policy versions, implements_policy from a customer case to the SOP it followed, about_supplier from any chunk that names a vendor.

The graph is what turns "find the chunk" into "find the lineage of this decision."
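
A sketch of what the edge store and a parent-climb might look like. The table and column names are assumptions; the edge types come straight from the list above:

```python
# Illustrative edge taxonomy; 'parent_of' exists today, the rest are planned.
EDGE_TYPES = {
    "parent_of",          # today: structure within a document
    "precedent_for",      # planned: similar past decisions
    "supersedes",         # planned: old -> new policy versions
    "implements_policy",  # planned: customer case -> the SOP it followed
    "about_supplier",     # planned: any chunk that names a vendor
}

# Climb from a chunk up to its parent SOP (assumed chunk_edges table).
PARENT_WALK = """
WITH RECURSIVE lineage AS (
    SELECT parent_id, 1 AS depth
    FROM chunk_edges
    WHERE child_id = %s AND edge_type = 'parent_of'
  UNION ALL
    SELECT e.parent_id, l.depth + 1
    FROM chunk_edges e
    JOIN lineage l ON e.child_id = l.parent_id
    WHERE e.edge_type = 'parent_of'
)
SELECT parent_id, depth FROM lineage ORDER BY depth;
"""
```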

Live, not staged

Ask the institutional memory
of a business that doesn't exist.

The corpus below is for a fictional brand, Verdant Home Goods. The retrieval, ranking, and synthesis are real — every answer comes from a chunk indexed in Postgres.

The thesis

Operational memory
is infrastructure,
not a chatbot.

The hard part of an AI-assisted operations platform isn't the LLM. It's the contract between memory, workflows, and execution — deterministic where it has to be, probabilistic where it pays off, with every AI action traceable to the retrieved evidence that triggered it.

What you're looking at is the memory layer. The pipeline is domain-agnostic — swapping the operational system underneath is a config change, not a rewrite.