Operational intelligence

Operational memory,
made first-class.

Vectra indexes the unglamorous guts of a business — customer threads, supplier emails, SOPs, decision logs — into a hybrid retrieval system that explains its answers.

974 chunks indexed
1,536 vector dimensions
108 graph edges
<1s retrieval
The problem

The institutional memory of most businesses
lives in places nobody reads.

Last quarter's supplier dispute resolution. The decision log that explains why we dropped a vendor. The SOP that says how to handle a wholesale order over $20K with a tight deadline.

All written down. None of it findable when it matters. Vectra fixes that — not by replacing your tools, but by making everything you've already written queryable.

01 · Chunk · Header-aware splitting, code blocks protected
02 · Enrich · Summary, four hypothetical questions, classification
03 · Embed · Three vectors: content, summary, questions
04 · Retrieve · Hybrid: semantic + FTS + trigram, RRF-fused
05 · Synthesize · Grounded answer with inline citations
The pipeline

Five stages. No magic.

01 · Chunk

Header-aware splitting that protects what matters.

Markdown headings stay attached to their text. Code blocks are wrapped before any split, then restored after — splitters never cut through a function. Frontmatter is parsed out and lifted into structured metadata (case IDs, supplier names, SKUs, dates), so those fields are queryable independently of the body text.

Tools: LangChain MarkdownHeaderTextSplitter · RecursiveCharacterTextSplitter
Output: 974 chunks from 200 docs · ~1,400 char target · 200 char overlap
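
A minimal sketch of this stage, assuming the LangChain splitters named above. The code-block masking helpers are illustrative, not Vectra's actual implementation, and the simple placeholder scheme ignores edge cases like a placeholder straddling a chunk boundary:

```python
import re

from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

FENCE = re.compile(r"```.*?```", re.DOTALL)

def mask_code_blocks(text):
    """Swap fenced code blocks for placeholders so no splitter cuts through one."""
    blocks = []
    def stash(match):
        blocks.append(match.group(0))
        return f"__CODE_BLOCK_{len(blocks) - 1}__"
    return FENCE.sub(stash, text), blocks

def restore_code_blocks(text, blocks):
    """Put the original fenced blocks back after splitting."""
    for i, block in enumerate(blocks):
        text = text.replace(f"__CODE_BLOCK_{i}__", block)
    return text

def chunk_markdown(md):
    masked, blocks = mask_code_blocks(md)
    # Headings travel with their text as metadata on each Document.
    by_header = MarkdownHeaderTextSplitter(
        headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
    ).split_text(masked)
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1400, chunk_overlap=200
    ).split_documents(by_header)
    for chunk in chunks:
        chunk.page_content = restore_code_blocks(chunk.page_content, blocks)
    return chunks
```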
02 · Enrich

Every chunk gets three indices: a summary, four hypothetical questions, and a type.

Summaries name concrete entities — supplier, SKU, customer, dollar amount — so retrieval has dense anchors. The four hypothetical questions work like HyDE in reverse: they extend the chunk's findability beyond its literal wording. Content type (policy · procedure · decision · escalation · incident · product · runbook · communication) becomes a hard filter.

Model: gpt-4o-mini · response_format: json_object · temperature 0.2
Batch: 8 chunks per call · 4 parallel workers
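
A hedged sketch of one enrichment call. The model, response format, temperature, and batch size come from the spec above; the prompt wording and JSON field names are assumptions:

```python
import json

from openai import OpenAI

client = OpenAI()

PROMPT = """For each chunk, return JSON with:
  summary: one sentence naming concrete entities (supplier, SKU, customer, dollar amount),
  questions: four questions a user might ask that this chunk answers,
  content_type: one of policy|procedure|decision|escalation|incident|product|runbook|communication.
Return {"chunks": [...]} in input order."""

def enrich(batch):
    """Enrich up to 8 chunks in one call (spec: 8 per call, 4 parallel workers)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": json.dumps({"chunks": batch})},
        ],
    )
    return json.loads(resp.choices[0].message.content)["chunks"]
```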
03 · Embed

Three vectors per chunk, because users phrase questions three different ways.

We embed the content, the summary, and the joined hypothetical questions separately. At query time, the query embedding fans out across all three indexes to find chunks that match by raw text, by abstracted meaning, or by anticipated question.

Model: text-embedding-3-small · 1536d
Index: pgvector HNSW · m=16, ef_construction=64 · cosine
Storage: 3 × 1536 floats per chunk · ~36KB before compression
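
A sketch of the triple embedding, assuming the OpenAI embeddings client; the return-value field names and the table/column names in the index comment are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def embed_chunk(content, summary, questions):
    """One API call returns all three 1536-d vectors, in input order."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[content, summary, "\n".join(questions)],
    )
    c, s, q = (d.embedding for d in resp.data)
    return {"content_vec": c, "summary_vec": s, "questions_vec": q}

# Matching pgvector index, one per vector column, per the spec above:
# CREATE INDEX ON chunks USING hnsw (content_vec vector_cosine_ops)
#   WITH (m = 16, ef_construction = 64);
```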
04 · Retrieve

Vector similarity isn't enough. Hybrid fusion is.

Each query runs through five rankers in parallel: three semantic searches (over content, summary, question vectors), an English full-text search via tsvector, and a trigram similarity check for typos and partial names like "Marisol" matching "Marisol Ceramics". Reciprocal Rank Fusion merges the five rankings into a single ordering.

Function: hybrid_search_vectra() · Postgres SQL function
Weights: semantic 1.0 · FTS 1.2 · trigram 0.5
Filters: section · content_type · product_tags · supplier_tags
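
On the application side, the call into the SQL function might look like this. The argument list is an assumption — only hybrid_search_vectra() and the filter names appear in the spec above — and it assumes psycopg 3 with the pgvector adapter:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def search(conn, query, query_vec, content_type=None, limit=10):
    """Run the five-ranker hybrid search; fusion happens inside Postgres."""
    register_vector(conn)  # teaches psycopg to send numpy arrays as vectors
    return conn.execute(
        "SELECT * FROM hybrid_search_vectra(%s, %s, %s, %s)",
        (query, np.asarray(query_vec), content_type, limit),
    ).fetchall()
```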
05 · Synthesize

The model never invents. It can only cite what was retrieved.

The synthesizer is given the top-k chunks and a strict prompt: cite every claim, refuse to answer when context is insufficient, prefer newer sources when SOPs contradict decision logs. The answer references each source as [^N], hot-linked to the source card on the page.

Model: gpt-4o-mini · temperature 0.2
Guardrail: "If sources don't answer, say so plainly. Do not guess."
Hybrid retrieval

Five rankers,
one ordering.

Vector search alone misses exact strings, supplier names, case IDs. Full-text search alone misses paraphrase. Trigram alone is too noisy. Vectra runs all five in parallel and fuses them with Reciprocal Rank Fusion — a tiny, robust function that doesn't need calibration.

score(d) = Σ_r w_r / (60 + rank_r(d)) · RRF (Cormack et al., 2009)
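
The formula is small enough to render whole. A minimal Python version, using the per-ranker weights shown in the demo below; ranker names are illustrative, and the production fusion runs inside the SQL function:

```python
from collections import defaultdict

WEIGHTS = {
    "semantic_content": 1.0,
    "semantic_summary": 0.8,
    "semantic_questions": 0.9,
    "fts_english": 1.2,
    "trigram": 0.5,
}

def rrf(rankings, k=60):
    """rankings maps ranker name -> list of chunk ids, best first."""
    scores = defaultdict(float)
    for ranker, ranked in rankings.items():
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += WEIGHTS[ranker] / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```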

query "defective ceramics from Marisol"
semantic · contentw 1.0
semantic · summaryw 0.8
semantic · questionsw 0.9
FTS · Englishw 1.2
trigramw 0.5
RRF
1sup_001 · marisol cracked dinnerwareincident
2po_046 · vendor profile · marisolvendor_profile
3po_001 · marisol mug · internal notesproduct
Embedding space

All 974 chunks,
one constellation.

Each dot is a real chunk from the corpus, projected from 1,536-dimensional embedding space down to two principal components. Hover any dot to see what it is. Colors group by content type. Clusters here are the geometry of what your business knows — supplier disputes pulling toward one neighborhood, wholesale policy toward another, customer-thread escalations forming their own arc.

PCA explains roughly 8–12% of variance in the first two components on this corpus — enough to see clusters, not enough to capture all the structure. Real retrieval uses the full 1,536 dimensions.
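
The projection behind the constellation is plain PCA, sketched here with scikit-learn; the filename is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

vectors = np.load("content_vectors.npy")        # shape (974, 1536)
pca = PCA(n_components=2)
xy = pca.fit_transform(vectors)                 # shape (974, 2), one dot per chunk
print(pca.explained_variance_ratio_.sum())      # ~0.08-0.12 on this corpus
```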

Knowledge graph

Chunks are nodes.
Relationships are edges.

Today: parent/child edges within each document, so retrieval can climb up to the parent SOP or drill into a sub-section. Planned: precedent_for between similar past decisions, supersedes between old and new policy versions, implements_policy from a customer case to the SOP it followed, about_supplier from any chunk that names a vendor.

The graph is what turns "find the chunk" into "find the lineage of this decision."
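
A sketch of what the edge store and a parent-climb might look like. The table and column names are assumptions; the edge types come straight from the list above:

```python
# Illustrative edge taxonomy; 'parent_of' exists today, the rest are planned.
EDGE_TYPES = {
    "parent_of",          # today: structure within a document
    "precedent_for",      # planned: similar past decisions
    "supersedes",         # planned: old -> new policy versions
    "implements_policy",  # planned: customer case -> the SOP it followed
    "about_supplier",     # planned: any chunk that names a vendor
}

# Climb from a chunk up to its parent SOP (assumed chunk_edges table).
PARENT_WALK = """
WITH RECURSIVE lineage AS (
    SELECT parent_id, 1 AS depth
    FROM chunk_edges
    WHERE child_id = %s AND edge_type = 'parent_of'
  UNION ALL
    SELECT e.parent_id, l.depth + 1
    FROM chunk_edges e
    JOIN lineage l ON e.child_id = l.parent_id
    WHERE e.edge_type = 'parent_of'
)
SELECT parent_id, depth FROM lineage ORDER BY depth;
"""
```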

Live, not staged

Ask the institutional memory
of a business that doesn't exist.

The corpus below is for a fictional brand, Verdant Home Goods. The retrieval, ranking, and synthesis are real — every answer comes from a chunk indexed in Postgres.

The thesis

Operational memory
is infrastructure,
not a chatbot.

The hard part of an AI-assisted operations platform isn't the LLM. It's the contract between memory, workflows, and execution — deterministic where it has to be, probabilistic where it pays off, with every AI action traceable to the retrieved evidence that triggered it.

What you're looking at is the memory layer. The pipeline is domain-agnostic — swapping the operational system underneath is a config change, not a rewrite.