The Sovereign Modular AI Stack

Real-time crawl in the Kompile RAG Console — live pipeline stages, graph extraction, embedding, and activity log
Kompile projects crawl your data and immediately compile it into structured knowledge your teams can act on.
Three pillars of compiling — models, knowledge, and applications — form a sovereign AI stack you fully own.
THREE PILLARS OF COMPILING
MODELS. KNOWLEDGE. APPLICATIONS.
Compile Models
Reduce costs by running models locally. Download, convert, optimize, and execute models on your own infrastructure — swap providers without changing a line of business logic.
Compile Knowledge
Crawl everything — documents, APIs, databases — and compile it into sorted ontologies and knowledge graphs your AI can reason over instantly.
Compile Applications
One unified modular stack. Build a single interface and CLI harness that works against any provider — no rewrites when you switch.
THE STARTING POINT
Kompile Projects
A project is the self-contained unit at the center of everything Kompile does. It crawls your data, compiles it into knowledge, and gives your AI a structured world to reason over.
1. Init
kompile init scaffolds a project with config, directory structure, default pipelines, and model assignments — ready to crawl in seconds.
2. Crawl & Compile
Point the project at your sources — Confluence, Jira, Slack, local files, databases, web — and Kompile automatically crawls, chunks, embeds, and indexes everything into sorted ontologies and knowledge graphs.
3. Act
Chat, query, or build agents against the compiled knowledge. Every project carries its own scoped config, model assignments, vector indexes, and knowledge graph — run multiple projects on the same machine.
kompile init → kompile crawl → kompile chat — from zero to answering questions against your compiled knowledge in three commands.
COMPILE MODELS
Optimize. Train. Deploy. On Your Hardware.
Download models from anywhere, compile them into optimized execution graphs, fine-tune on your proprietary data, and serve them locally — cutting inference costs while keeping full control.
Multi-GPU Automatic Scheduling
Kompile automatically routes workloads across your GPUs. Per-service device routing lets you pin embeddings, LLM inference, and vision models to specific devices, while the resource-aware scheduler handles memory reservation, priority preemption, and admission control.
Device Routing
Route embedding, LLM, VLM encoder, VLM decoder, ingest, and vector population workloads to specific CUDA devices. Auto-route vision models to the largest available GPU.
Dynamic Batching
Continuous batching with per-model priority queues, configurable batch sizes, and max queue delay — maximize throughput without sacrificing latency.
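The batching policy above can be sketched in a few lines. This is a minimal illustration, not Kompile's scheduler: the `Batcher` class, its field names, and the explicit `now` timestamps are all assumptions made so the example runs deterministically. A batch is released when either the size cap is reached or the oldest request has waited past the delay bound.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Batcher:
    """Hypothetical per-model batcher; names are illustrative, not Kompile's API."""
    max_batch_size: int
    max_queue_delay: float  # seconds the oldest request may wait
    _heap: list = field(default_factory=list)
    _seq: count = field(default_factory=count)

    def submit(self, request: str, priority: int, now: float) -> None:
        # Lower priority number is served first; seq keeps FIFO order within a priority.
        heapq.heappush(self._heap, (priority, next(self._seq), now, request))

    def poll(self, now: float) -> list[str]:
        """Return a batch if the size cap or the delay bound has been reached."""
        if not self._heap:
            return []
        oldest = min(arrived for _, _, arrived, _ in self._heap)
        full = len(self._heap) >= self.max_batch_size
        overdue = now - oldest >= self.max_queue_delay
        if not (full or overdue):
            return []
        size = min(self.max_batch_size, len(self._heap))
        return [heapq.heappop(self._heap)[3] for _ in range(size)]


batcher = Batcher(max_batch_size=2, max_queue_delay=0.05)
batcher.submit("low-priority", priority=5, now=0.00)
waiting = batcher.poll(now=0.01)   # neither full nor overdue, so nothing is released
batcher.submit("high-priority", priority=0, now=0.02)
batch = batcher.poll(now=0.02)     # size cap reached; higher priority comes out first
```

The delay bound is what keeps latency predictable under light load: a lone request is never held longer than `max_queue_delay` waiting for company.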
Memory Management
Reservation-based GPU memory pools with admission control, concurrent load limits, and KV cache management with prefix indexing and priority eviction.
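As a rough sketch of reservation-based admission control, assuming a hypothetical `GpuMemoryPool` class with sizes in MiB: memory is reserved before a model loads, and a load that would not fit is rejected up front. Concurrent load limits and KV cache eviction are deliberately left out of this illustration.

```python
class GpuMemoryPool:
    """Hypothetical reservation-based pool; not Kompile's implementation."""

    def __init__(self, capacity_mib: int):
        self.capacity_mib = capacity_mib
        self.reservations: dict[str, int] = {}

    @property
    def free_mib(self) -> int:
        return self.capacity_mib - sum(self.reservations.values())

    def admit(self, model: str, needed_mib: int) -> bool:
        """Reserve memory before loading; reject rather than risk running out mid-load."""
        if model in self.reservations:
            return True  # already holds a reservation
        if needed_mib > self.free_mib:
            return False
        self.reservations[model] = needed_mib
        return True

    def release(self, model: str) -> None:
        self.reservations.pop(model, None)


pool = GpuMemoryPool(capacity_mib=24_000)
llm_admitted = pool.admit("llm", 16_000)      # fits
vlm_admitted = pool.admit("vlm", 10_000)      # rejected: only 8,000 MiB remain
pool.release("llm")
vlm_retry = pool.admit("vlm", 10_000)         # fits once the LLM reservation is released
```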
RUNS ON YOUR HARDWARE
Graph Optimizations
Raw models are compiled through a multi-pass optimization pipeline that eliminates waste, fuses operations, and targets your specific hardware.
Cleanup & Simplification
Dead code elimination, constant folding, identity removal, and algebraic simplifications (add-zero, multiply-one, subtract-self, divide-one) strip unnecessary computation.
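To make these passes concrete, here is a small sketch over a toy expression tree. The tuple encoding `(op, lhs, rhs)` and the `simplify` function are assumptions for illustration only; Kompile's actual graph representation is not described here. The rewrite applies constant folding plus the four identities named above, working bottom-up.

```python
import operator

FOLD = {"add": operator.add, "sub": operator.sub,
        "mul": operator.mul, "div": operator.truediv}


def is_num(value):
    return isinstance(value, (int, float))


def simplify(expr):
    """Apply constant folding and the four algebraic identities bottom-up."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name or a numeric constant
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    if op == "add" and is_num(rhs) and rhs == 0:
        return lhs  # x + 0 -> x
    if op == "add" and is_num(lhs) and lhs == 0:
        return rhs  # 0 + x -> x
    if op == "mul" and is_num(rhs) and rhs == 1:
        return lhs  # x * 1 -> x
    if op == "mul" and is_num(lhs) and lhs == 1:
        return rhs  # 1 * x -> x
    if op == "sub" and lhs == rhs:
        return 0    # x - x -> 0 (this toy version ignores NaN and infinity)
    if op == "div" and is_num(rhs) and rhs == 1:
        return lhs  # x / 1 -> x
    if is_num(lhs) and is_num(rhs) and not (op == "div" and rhs == 0):
        return FOLD[op](lhs, rhs)  # constant folding
    return (op, lhs, rhs)


collapsed = simplify(("add", ("mul", "x", 1), ("sub", "y", "y")))
folded = simplify(("mul", ("add", 2, 3), "x"))
```

Here `collapsed` reduces to the bare variable `"x"`, and `folded` becomes `("mul", 5, "x")` because the constant subexpression is evaluated at compile time.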
Attention & Activation Fusion
Fuse Q*K*V attention patterns with causal masking, merge matmul+add into single ops, collapse Sigmoid*Mul gating into fused SwiGLU ops, and fuse RMSNorm patterns.
Hardware Targeting
cuDNN kernel selection, Triton GPU compilation with warp and stage tuning, and automatic quantization to INT8 or precision lowering to FP16 and BFloat16.
Performance Profiles
Choose from presets — Debug, Balanced, Max Performance, LLM Optimal — or compose your own pass pipeline for full control over the compilation.
Training & Fine-Tuning
Customize models on your own data without fragmented external tooling. Every method runs natively inside Kompile.
PEFT / Adapters
LoRA, QLoRA, AdaLoRA, DyLoRA, DoRA, IA3, Prompt Tuning, and Prefix Tuning — with native weight merging when you're ready to ship.
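Weight merging for LoRA follows the standard formula W' = W + (alpha / r) · B · A, where A has shape (r, d_in) and B has shape (d_out, r). The sketch below uses plain Python lists and hypothetical helper names (`matmul`, `merge_lora`) to show the arithmetic; it is not Kompile's merge routine.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Fold a LoRA adapter into the base weight: W + (alpha / rank) * B @ A."""
    rank = len(lora_a)              # A has shape (rank, d_in)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (d_out, rank) @ (rank, d_in) -> (d_out, d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


base = [[1.0, 0.0],
        [0.0, 1.0]]
adapter_a = [[1.0, 2.0]]    # rank 1, d_in 2
adapter_b = [[0.5], [0.0]]  # d_out 2, rank 1
merged = merge_lora(base, adapter_a, adapter_b, alpha=2.0)
```

After merging, the adapter adds no inference cost: the model runs with a single weight matrix of the original shape.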
Alignment
DPO, KTO, ORPO, PPO, and GRPO alignment methods with reward model support and streaming training logs.
Distillation
Teacher-student distillation with logit, feature, attention, and combined modes — compress large models into production-sized versions.
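The logit mode can be sketched with the classic temperature-scaled formulation: soften both distributions with a temperature T, take the KL divergence from teacher to student, and scale by T². Only this one mode is shown; the function names are assumptions, and the feature, attention, and combined modes are not covered.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
matched = logit_distillation_loss(teacher, [4.0, 1.0, 0.5])     # student agrees
mismatched = logit_distillation_loss(teacher, [0.5, 1.0, 4.0])  # student disagrees
```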
Registry & Air-Gapping
A proper model registry with full lifecycle management. Import from HuggingFace, package into .karch archives, and deploy to fully air-gapped environments.
.karch Archives
Self-contained model archives with manifests and SHA-256 checksums. Export, import, publish, and download via CLI or API. Move models across air-gapped boundaries with a single file.
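The checksum idea can be illustrated with Python's standard `hashlib`. The actual `.karch` layout is not specified here, so the manifest below (a plain mapping from file name to SHA-256 hex digest) and the function names are assumptions, and file contents are held in memory to keep the sketch self-contained.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the SHA-256 hex digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names that are missing or whose content no longer matches."""
    return sorted(
        name for name, expected in manifest.items()
        if name not in files or hashlib.sha256(files[name]).hexdigest() != expected
    )


archive = {"weights.bin": b"\x00\x01\x02", "config.json": b'{"layers": 2}'}
manifest = build_manifest(archive)
clean = verify(archive, manifest)        # nothing changed, so no failures

archive["weights.bin"] = b"\x00\x01\x03"  # simulate corruption in transit
tampered = verify(archive, manifest)
```

Verifying on import is what makes a single-file transfer across an air gap trustworthy: any bit flipped in transit shows up as a mismatch before the model is loaded.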
Model Lifecycle
Full promote, replace, convert, and delete workflows. Import from ONNX, TensorFlow, Keras, and GGUF/GGML formats. Support for LLaMA, Mistral, Mixtral, Phi, Qwen, Gemma, Falcon, and more.
IMPORT FROM FRAMEWORKS YOU TRUST
COMPILE KNOWLEDGE
Crawl Everything. Build Graphs. Reason Instantly.
Kompile projects crawl your data sources and compile them into structured ontologies and knowledge graphs your AI can reason over — no manual curation required.
RAG Pipeline
A full retrieval-augmented generation pipeline with pluggable stages. Embed, retrieve, rerank, and generate — each step swappable independently.
Query Transformers
HyDE (hypothetical document embeddings), multi-query generation, query expansion, compression, and step-back prompting — automatically reformulate queries for better retrieval.
Contextual Enrichment
LLM-based chunk enrichment adds surrounding context to each retrieved passage before generation, reducing hallucination and improving answer quality.
Guardrails
Built-in input guards (PII detection, prompt injection, toxicity, topic filtering) and output guards (hallucination detection, relevancy scoring, format enforcement).
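A pattern-based input guard for PII might look like the following sketch. The two regular expressions and the `redact_pii` function are illustrative assumptions that catch only simple email addresses and US Social Security number formats; they do not represent Kompile's detectors, and production PII detection needs far broader coverage.

```python
import re

# Hypothetical patterns for illustration; real detectors cover many more formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace matches with a placeholder and report which PII types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label.upper()}]", text)
    return text, found


safe_text, detected = redact_pii("Mail jane@example.com about 123-45-6789")
```

Here `safe_text` is `"Mail [EMAIL] about [US_SSN]"`, so the prompt can continue to the model without the raw identifiers.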
Evaluation Harness
Measure RAG quality with built-in evaluators, experiment tracking, eval suites, and dataset management — know when your pipeline is actually improving.
GraphRAG
Go beyond flat vector search. GraphRAG extracts entities and relationships from your documents, builds a graph, detects communities, and uses graph structure to answer questions that require reasoning across multiple sources.
Entity & Relation Extraction
Multi-agent extraction with LLM-based and pattern-based agents working in parallel to identify entities and relationships from unstructured text.
Community Detection
Louvain community detection, PageRank, betweenness centrality, and LLM-generated community summaries for hierarchical graph reasoning.
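Of the graph measures listed, PageRank is the simplest to show in full. The sketch below is a textbook power iteration over a small adjacency mapping, written for illustration rather than taken from Kompile; the toy graph and node names are assumptions, and every node must appear as a key.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to its list of out-links."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in graph[node]:
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Toy entity graph: both "a" and "b" point at "c", and "c" points back at "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
most_central = max(ranks, key=ranks.get)
```

In this toy graph `most_central` is `"c"`, the node that receives links from both of the others.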
Neo4j & Native Storage
Run against Neo4j for production graph queries with Cypher, or use the built-in adjacency matrix graph for embedded deployments.
Knowledge Graphs
Compile your entire data estate into typed, versioned knowledge graphs with entity resolution, schema enforcement, and graph embeddings.
Automated Construction
LLM-driven or manual graph building with concept extraction, entity resolution, and graph compaction. Named graphs, fact sheets, and schema enforcement modes keep your ontology clean.
Graph Embeddings
Native TransE and RotatE knowledge graph embedding models for link prediction and entity similarity — turn your graph into a queryable vector space.
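TransE scores a triple (head, relation, tail) by how closely head + relation lands on tail in vector space. The sketch below uses hand-picked two-dimensional vectors rather than trained embeddings, purely to show how link prediction falls out of that score; the entity names and helper functions are assumptions.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail; higher is better."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hand-picked toy vectors, not trained embeddings.
entities = {
    "paris": [1.0, 0.0], "france": [1.0, 1.0],
    "berlin": [3.0, 0.0], "germany": [3.0, 1.0],
}
capital_of = [0.0, 1.0]


def predict_tail(head_name, relation):
    """Link prediction: pick the entity that best completes (head, relation, ?)."""
    head = entities[head_name]
    return max(entities, key=lambda name: transe_score(head, relation, entities[name]))


predicted = predict_tail("berlin", capital_of)
```

With these vectors, `predicted` is `"germany"`, because berlin + capital_of lands exactly on that entity.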
Export & Interop
Export to CSV, JSON, JSON-LD, GraphML, Cypher, HTML, SVG, Wiki, and Obsidian vault. Merge and sync graphs across environments.
Data Crawlers
Crawl Confluence, Jira, Notion, Slack, Discord, Google Workspace, OneDrive, Reddit, email inboxes, and the web — all compiled into your graph automatically.
COMPILE APPLICATIONS
One Interface. Every Provider.
Build one application and one CLI harness against Kompile's unified interface. Swap LLM providers, vector stores, embedding models, and data sources without rewriting a single line of business logic.
LLM Providers
Every provider speaks the same interface. Switch from OpenAI to a local Ollama instance or a self-hosted vLLM server — your application code stays identical. Any OpenAI-compatible endpoint works as a drop-in backend.
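The swap-without-rewrite pattern can be sketched with a structural interface. Everything below is hypothetical: `ChatProvider`, the two canned backends, and `answer` are stand-ins that return fixed strings and make no network or model calls, and they are not Kompile's actual interface.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical unified interface; any object with this method qualifies."""

    def complete(self, prompt: str) -> str: ...


class CannedLocalProvider:
    """Stand-in for a local backend; returns a fixed string, no model call."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class CannedRemoteProvider:
    """Stand-in for a hosted backend; returns a fixed string, no network call."""

    def complete(self, prompt: str) -> str:
        return f"[remote] {prompt}"


def answer(provider: ChatProvider, question: str) -> str:
    # Business logic depends only on the interface, never on a concrete backend.
    return provider.complete(f"Q: {question}")


local_reply = answer(CannedLocalProvider(), "ping")
remote_reply = answer(CannedRemoteProvider(), "ping")
```

Because `answer` only knows about the interface, switching backends is a change at the call site or in configuration, not in the application logic.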
API PROVIDERS
CLI AGENT BACKENDS
Embeddings & Vector Stores
The same retrieval code works across all embedding and storage backends. Run fully local with Anserini and SameDiff, or connect to managed services — one API for all.
EMBEDDING MODELS
VECTOR STORES
Data Sources & Crawlers
Crawl your entire data estate through a unified ingest pipeline. Every source feeds into the same chunking, embedding, and indexing stages.
Orchestration & Compute Engines
Go beyond simple chains. Plug in visual workflow engines, business rule systems, or graph databases — all through the same Kompile interface.
Agent-to-Agent Protocol (A2A)
Kompile agents can communicate directly with each other via the A2A protocol, enabling multi-agent architectures where specialized agents coordinate without centralized orchestration.
JOIN THE WAITLIST
Kompile is currently available through Early Access only.
Join the waitlist to unlock the full potential of the Modular AI Stack on your own infrastructure.