Open Source · AGPL-3.0 · Agent-Native

Activate Data Brain

Semantic Storage
Built for AI Agents

Agent-Native Semantic Storage — ingest, understand, remember, and audit. One unified engine from document to decision, with hybrid retrieval, reranking, and full causal-chain traceability.

Agent-Native: Built for AI Agents
PB-Scale: Born to Scale
<10ms: Query Latency
AGPL-3.0: Open Source

Defining a New Category

AI agents need more than a vector database. They need a storage engine that speaks their language — natively.

Agent-Native

Not retrofitted for agents — built from day one as the data layer agents think through. Semantic processing, memory, and audit are first-class primitives, not plugins.

Semantic Storage

Beyond vector search. Documents are parsed, chunked, embedded, and indexed in a unified semantic layer — queryable by meaning, not just keywords.

Born to Scale: PB to EB

Architected for petabyte-to-exabyte workloads from the ground up. Horizontal scaling, namespace isolation, and storage tiering are native, not afterthoughts.

One Engine. Complete Agent Data Stack.

Cortrix consolidates semantic ingestion, hybrid retrieval with reranking, interaction memory, and causal-chain audit into a single engine with REST, MCP, and SDK interfaces.

Semantic Processing Chain (SPC)

Ingest any document — PDF, Word, Markdown — through an automated pipeline: Docling parsing with OCR fallback, parent-child chunking, NER + summary enrichment, embedding, and indexing in one step.

Core
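To make the parent-child chunking step concrete, here is a minimal sketch. It is not Cortrix's implementation: the `Chunk` type, the `parent_child_chunks` helper, and the character-based sizes are all assumptions for illustration. The idea is that small child chunks are embedded and matched, while each child points back to a larger parent passage that supplies context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # None marks a parent chunk


def parent_child_chunks(text: str, parent_size: int = 400, child_size: int = 100) -> list:
    """Split text into fixed-size parents, then split each parent into children.

    Hypothetical helper: a real pipeline would more likely split on sentence
    or section boundaries than on raw character counts.
    """
    chunks = []
    for p, start in enumerate(range(0, len(text), parent_size)):
        parent_text = text[start:start + parent_size]
        parent_id = f"p{p}"
        chunks.append(Chunk(parent_id, parent_text))
        for c, offset in enumerate(range(0, len(parent_text), child_size)):
            child_text = parent_text[offset:offset + child_size]
            chunks.append(Chunk(f"{parent_id}.c{c}", child_text, parent_id))
    return chunks
```

At query time, a match on a child chunk can be resolved through `parent_id` to return the surrounding parent text.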

Hybrid Query + Reranker

Vector similarity (P-HNSW with WAL persistence) and BM25 keyword search are fused via Reciprocal Rank Fusion (RRF), then reranked with bge-reranker-v2-m3 for higher precision in the top results.

Core
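The fusion step can be illustrated with a short sketch of Reciprocal Rank Fusion (RRF). This is the standard formula, not code taken from Cortrix: the function name `rrf_fuse`, the sample chunk IDs, and the constant `k = 60` (a value commonly used for RRF) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Whether Cortrix uses k = 60 is not stated; it is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["c3", "c1", "c7"]  # hypothetical vector-search order
bm25_hits = ["c1", "c9", "c3"]    # hypothetical BM25 order
fused = rrf_fuse([vector_hits, bm25_hits])
```

Chunks that appear in both lists rise to the top because their reciprocal-rank contributions add up; a reranker would then rescore this fused shortlist.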

AI Interaction Memory

Persistent conversation memory with session management, LLM-based fact extraction, typed memory (fact / preference / event) with decay, per-user isolation, transparency APIs, and full audit trail.

Core
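One way to picture typed memory with decay is an exponential half-life per memory type. The sketch below is an assumption-laden illustration, not Cortrix's actual decay model: the `Memory` type, the `decayed_weight` function, and the half-life values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str        # "fact", "preference", or "event"
    text: str
    age_days: float  # time since the memory was stored or last reinforced


# Assumed half-lives in days, chosen only for illustration.
HALF_LIFE_DAYS = {"fact": 365.0, "preference": 90.0, "event": 14.0}


def decayed_weight(memory: Memory) -> float:
    """Exponential decay: the weight halves once per half-life."""
    return 0.5 ** (memory.age_days / HALF_LIFE_DAYS[memory.kind])
```

Under these assumed values, an event loses half its weight in two weeks while a fact stays close to full weight, so stale events fade from recall before durable facts do.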

Agent Observability & Self-Learning

Session, trace, and agent headers flow end-to-end. Retrieval feedback captures which chunks actually helped — every chunk accumulates a useful_ratio score that quietly reranks future results. Your storage gets smarter with every query, no retraining required.

Core
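The feedback loop can be sketched as a per-chunk counter whose ratio nudges future scores. Only the name `useful_ratio` comes from the text above; the `ChunkStats` type, the neutral prior of 0.5, and the `weight` parameter are assumptions, and the real scoring formula may differ.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    retrieved: int = 0  # times the chunk was returned to an agent
    useful: int = 0     # times feedback marked it as helpful

    @property
    def useful_ratio(self) -> float:
        # Assumed neutral prior of 0.5 until any feedback arrives.
        return self.useful / self.retrieved if self.retrieved else 0.5


def record_feedback(stats: ChunkStats, helped: bool) -> None:
    """Update the counters after an agent reports whether the chunk helped."""
    stats.retrieved += 1
    if helped:
        stats.useful += 1


def adjusted_score(base_score: float, stats: ChunkStats, weight: float = 0.2) -> float:
    """Scale the retrieval score up or down around the neutral ratio of 0.5."""
    return base_score * (1.0 + weight * (stats.useful_ratio - 0.5))
```

Because the adjustment is a simple multiplier on stored counters, relevance shifts with use without any model retraining.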

Causal-Chain Traceability

Open-source content-level provenance, not just call indexes. Every answer links back to the exact chunks, documents, and conversation turns that produced it — built-in citation tracing, retrieval attribution, and reasoning chain for debugging and trust.

Core
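Content-level provenance implies that each answer carries references to the chunks, documents, and conversation turns behind it. The record below is a hypothetical shape for such a trace; the `Citation` and `AnswerTrace` types and their field names are assumptions, not Cortrix's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    document_id: str
    chunk_id: str
    score: float  # retrieval or reranker score at answer time


@dataclass
class AnswerTrace:
    answer_id: str
    session_id: str
    turn_ids: list = field(default_factory=list)   # conversation turns used
    citations: list = field(default_factory=list)  # chunks that were cited

    def documents(self) -> list:
        """Distinct source documents, in first-cited order."""
        return list(dict.fromkeys(c.document_id for c in self.citations))
```

Storing the cited chunk IDs alongside the answer is what lets an auditor walk back from a decision to the exact passages that informed it.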

Namespaces + Cross-NS Query

Isolate data by project, team, or tenant with independent storage and indexes. Query across any subset of namespaces in parallel via scatter-gather, with reranker-unified ranking.

Core
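A scatter-gather query fans the same request out to each namespace in parallel and merges the partial results. The sketch below assumes a caller-supplied `search_one(namespace, query)` function and uses a plain score sort as a stand-in for the reranker; both are illustrative assumptions rather than Cortrix internals.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(namespaces, search_one, query, top_k=5):
    """Query each namespace in parallel, then merge into one ranked list.

    search_one(namespace, query) is a hypothetical callable returning
    a list of (chunk_id, score) pairs for a single namespace.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(namespaces))) as pool:
        per_namespace = list(pool.map(lambda ns: search_one(ns, query), namespaces))
    merged = [
        (ns, chunk_id, score)
        for ns, hits in zip(namespaces, per_namespace)
        for chunk_id, score in hits
    ]
    # Stand-in for the reranker: raw scores from separate indexes are not
    # strictly comparable, which is why a unified reranking pass is useful.
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged[:top_k]
```

Each returned hit keeps its namespace, so the caller can still tell which project, team, or tenant a result came from.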

Advanced RAG Techniques

Parent-Child chunking for precision + context, Contextual Retrieval for chunk disambiguation, RAG-Fusion multi-query, CRAG retrieval grading, and HyPE hypothetical-question indexing — built in, not bolted on.

Core

pgCortrix Extension

Semantic storage as a PostgreSQL extension. Bring agent-native capabilities directly into your existing Postgres infrastructure — no separate service needed.

Integration

MCP Server

A built-in Model Context Protocol (MCP) server exposes Cortrix capabilities to Claude Code, Cursor, and any MCP-compatible AI tool out of the box.

Integration

Built-in Embedding (bge-m3)

ONNX Runtime integration with the bge-m3 model for multilingual embeddings. No external embedding service needed.

Built-in

Connector Ecosystem

Plug into LangChain, Dify, RAGFlow, and any MCP/CLI/REST workflow. Connectors for HTTP upload, directory watch, DB import, and custom data sources.

Ecosystem

Python SDK + REST + Web UI

pip install cortrix for type-safe Python clients, a clean HTTP API for any language, and a built-in dashboard for document management, search, and AI chat.

Interface

Architecture

A vertically integrated engine — from document ingestion to hybrid retrieval with reranking to causal-chain audit — designed for PB-scale agent workloads.

[Figure: Cortrix architecture. Agent-native semantic storage engine with SPC pipeline, hybrid query engine with reranker, memory and self-learning, storage layer, and connectors.]

Docker or PG Extension

Deploy as a one-command docker-compose stack, a standalone server, or the pgCortrix PostgreSQL extension, whichever fits your stack.

Learns From Every Query

Retrieval feedback closes the loop: each chunk accumulates a useful_ratio based on what actually helped the agent, quietly lifting relevance over time — no fine-tuning, no extra pipeline.

Universal Connectors

Native integration with LangChain, Dify, RAGFlow, MCP, and CLI, plus HTTP upload and filesystem watchers.

Get Started in Minutes

From zero to semantic search in three steps.

1. Pull & Run

Terminal
docker pull cortrix/cortrix:latest
docker run -d -p 8080:8080 --name cortrix cortrix/cortrix:latest
2. Upload Documents

Terminal
curl -X POST http://localhost:8080/api/v1/documents/upload \
  -F "file=@your-document.pdf" \
  -F "namespace=default"
3. Semantic Search

Terminal
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How does authentication work?", "namespace": "default"}'

Or use the Python SDK:

Terminal
pip install cortrix

Python
from cortrix import Client
client = Client("http://localhost:8080")
client.documents.upload("your-document.pdf", namespace="default")
results = client.query("How does authentication work?", namespace="default")
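
For environments without the SDK, the same query can be issued from Python's standard library. This sketch reuses only the endpoint and JSON body from the curl example; the helper name `build_query_request` is an assumption, and the response schema is not documented here, so the result is simply printed.

```python
import json
import urllib.request


def build_query_request(base_url, query, namespace="default"):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"query": query, "namespace": namespace}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("http://localhost:8080", "How does authentication work?")

# To send it against a running Cortrix server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```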

Built for the Agents That Matter

Cortrix is the data backbone for autonomous AI agents — not coding assistants, but agents that run real business processes.

Autonomous AI Agents (OpenClaw & Beyond)

The next wave of AI agents — like OpenClaw — need persistent semantic memory and auditable decision trails. Cortrix provides both natively.

AI Employee Audit & Compliance

Full causal-chain traceability for AI workers. Every decision, every data source, every reasoning step — stored and retrievable at content level, not just index level.

Workflow Orchestration

Integrate with LangChain, Dify, RAGFlow, and any MCP/CLI-based workflow. Cortrix acts as the semantic layer your orchestrator reads and writes to.

Enterprise Knowledge Infrastructure

From PostgreSQL (pgCortrix) to a standalone engine, deploy semantic storage wherever your data lives. Change data capture (CDC) connectors keep everything in sync.

How Cortrix Compares

A unified engine vs. assembling pieces.

| Capability | Cortrix | Vector DB + RAG Framework |
| --- | --- | --- |
| Document Ingestion | Built-in SPC pipeline | Separate parser + chunker |
| Embedding | Built-in (bge-m3, ONNX) | External API call |
| Vector Search | P-HNSW, in-process | Separate vector DB |
| Keyword Search | FTS5 + BM25 | Often missing or separate |
| Hybrid Fusion | RRF built-in | Custom glue code |
| Reranker | bge-reranker-v2-m3 built-in | External service or missing |
| Cross-Namespace Query | Scatter-gather + unified ranking | Client-side orchestration |
| Advanced RAG | Parent-Child / Contextual / CRAG / HyPE | DIY or framework plugins |
| AI Memory | Typed memory with decay + LLM extraction | Not included |
| Causal-Chain Traceability | Content-level citation tracing | Index-level only (if any) |
| Agent Observability | Session / trace / feedback signals | Not included |
| Self-Learning Retrieval | useful_ratio feedback · no retraining | Not included |
| PostgreSQL Integration | pgCortrix extension | Separate service |
| Workflow Connectors | LangChain / Dify / RAGFlow / MCP | Framework-specific |
| MCP Server | Built-in | Not available |
| Scale Target | PB ~ EB native | GB ~ TB typical |
| Deployment | Single binary / Docker / PG ext | Multiple services |

Join the Community

Cortrix is open source and community-driven. We welcome contributions of all kinds.