THE APPLICATION LAYER

Kompile App

A modular application framework with 40+ swappable modules, 44+ MCP tools, and 20+ data connectors. Knowledge graphs with MEBN probabilistic reasoning, causal inference, 9 maintenance primitives, and full MCP tool access for AI agents — swap any provider without touching business logic.

Modular by Design

Every capability is a separate module behind a shared interface. Swap any layer by changing a config — your pipeline code, agent logic, and API endpoints stay identical.
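As a rough illustration of the swap-by-config idea, here is a minimal Python sketch. It is not Kompile's actual API: `VectorStore`, `REGISTRY`, `build_store`, and the `vector_store` config key are hypothetical names chosen for the example.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical shared interface; not Kompile's actual API."""

    def add(self, doc_id: str, text: str) -> None: ...
    def count(self) -> int: ...


class InMemoryStore:
    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def count(self) -> int:
        return len(self._docs)


class LoggingStore(InMemoryStore):
    """A second backend with the same interface, standing in for a real swap."""

    def __init__(self) -> None:
        super().__init__()
        self.log: list[str] = []

    def add(self, doc_id: str, text: str) -> None:
        self.log.append(f"add {doc_id}")
        super().add(doc_id, text)


# Hypothetical registry keyed by the value found in config.
REGISTRY = {"memory": InMemoryStore, "logging": LoggingStore}


def build_store(config: dict) -> VectorStore:
    return REGISTRY[config["vector_store"]]()


def ingest(store: VectorStore, docs: dict[str, str]) -> int:
    # Pipeline code depends only on the interface, never on a backend.
    for doc_id, text in docs.items():
        store.add(doc_id, text)
    return store.count()
```

Changing `{"vector_store": "memory"}` to `{"vector_store": "logging"}` swaps the backend while `ingest` stays untouched.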

40+

Swappable Modules

44+

MCP Tools

20+

Data Connectors

9

Export Formats

The Crawl-to-Graph Pipeline

An 8-phase pipeline that ingests data, classifies content, extracts entities, builds knowledge graphs, resolves duplicates, and indexes vectors — with adaptive memory-aware parallelism at every step.

1. Load

Pull data from 20+ sources (Confluence, Slack, web, S3, email, SQL) or load from local files. Cloud sources authenticate with OAuth2; web crawls respect robots.txt.

2. Classify & Route

Auto-classify content: text PDFs, image-based PDFs, mixed, tables, audio, email. Route each to its specialized pipeline (text, visual, tables, email).

3. Preprocess

Ordered preprocessor chain: language detection, translation, boilerplate removal, Unicode normalization, PII redaction, content-hash + SimHash dedup.
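The dedup step pairs an exact content hash with SimHash for near-duplicates. The sketch below shows one common way to compute both. The tokenizer, the MD5 token hash, and the 64-bit width are assumptions for illustration, not Kompile's documented choices.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """Exact-duplicate key: identical bytes give identical hashes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def simhash(text: str, bits: int = 64) -> int:
    """Near-duplicate fingerprint over lowercase word tokens."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two documents would be treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold; the threshold is a tuning decision.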

4. Chunk

Split into retrieval-sized passages with 5 strategies (sentence, recursive, markdown, token, table-aware). Register SNIPPET graph nodes with CONTAINS edges.
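A simplified sketch of the sentence strategy and the SNIPPET registration follows. The regex splitter stands in for a real sentence detector, and the node and edge shapes are hypothetical, not Kompile's schema.

```python
import re


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def register_snippets(doc_id: str, chunks: list[str]):
    """Return (nodes, edges): one SNIPPET node per chunk, linked by CONTAINS."""
    nodes = [
        {"id": f"{doc_id}#{i}", "type": "SNIPPET", "text": chunk}
        for i, chunk in enumerate(chunks)
    ]
    edges = [(doc_id, "CONTAINS", node["id"]) for node in nodes]
    return nodes, edges
```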

5. Extract Graph

Multi-agent LLM + pattern-based entity extraction with cost-balanced batch planning. Schema enforcement (None/Lenient/Strict). Adaptive memory-aware parallelism.

6. Resolve & Compute

Entity resolution via Levenshtein + embedding cosine + MEBN probabilistic scoring. Compute shared-entity and embedding-similarity edges across documents.
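The first two signals can be blended into a single match score, as in this sketch. The equal weighting and the function names are assumptions, and the MEBN probabilistic scoring step is left out.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive edit distance rescaled to [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_score(name_a, name_b, emb_a, emb_b, name_weight: float = 0.5) -> float:
    # Hypothetical blend; a real system would tune or learn the weight.
    return (name_weight * name_similarity(name_a, name_b)
            + (1 - name_weight) * cosine(emb_a, emb_b))
```

Pairs scoring above a chosen threshold would be merged into one entity.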

7. Index & Embed

Generate vector embeddings (BGE, Arctic, SPLADE++) and store in Anserini HNSW, pgvector, Vespa, or Chroma. Hybrid BM25 + dense retrieval ready.
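One simple way to combine BM25 and dense scores is a weighted sum after min-max normalization, sketched below. The fusion rule and the `vector_weight` parameter name are assumptions; the source does not state how Kompile fuses the two signals.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], dense: dict[str, float],
                vector_weight: float = 0.5) -> list[str]:
    """Return doc ids ordered by the fused score, best first."""
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    fused = {
        doc_id: (1 - vector_weight) * bm25_n.get(doc_id, 0.0)
        + vector_weight * dense_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(dense_n)
    }
    # Ties break on doc id so the ordering is deterministic.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))
```

Setting `vector_weight` to 0 gives pure keyword ranking and 1 gives pure dense ranking.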

8. Enrich

Post-crawl enrichment: deduplicate, prune, validate, normalize entities. Discover domain taxonomy via LLM, categorize entities, rebuild search indexes.

Knowledge Graphs, Reasoning & Agents

Kompile graphs are more than storage — they support probabilistic reasoning, causal inference, automated maintenance, and full MCP tool access for AI agents.

GraphRAG

Local ego-network, global community-level, and hybrid search modes with configurable vector weight. Three backends: JPA, Neo4j Cypher, or ND4J matrix. Louvain community detection with LLM-generated summaries.
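Local mode starts from an ego network: the nodes within a few hops of a seed entity. A minimal breadth-first sketch, with a hypothetical adjacency-list input:

```python
from collections import deque


def ego_network(adjacency: dict[str, list[str]], seed: str, radius: int = 1) -> set[str]:
    """Return every node within `radius` hops of `seed`, including the seed."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand past the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)
```

The resulting node set would then be scored against the query; that retrieval step is not shown here.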

Knowledge Graphs

Seven node levels, eleven edge types, named graph scoping, and multi-tenant fact sheet isolation. LLM or manual construction with schema enforcement (None/Lenient/Strict). Export to 9 formats including JSON-LD, GraphML, Cypher, and Obsidian.

Bayesian Networks

Multi-Entity Bayesian Networks (MEBN) with MFrag templates for EntityRelevance, CausalInfluence, InformationFlow, and RiskPropagation. Situation-specific grounding via BFS from seed nodes. Variable elimination for posterior inference.
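To make the posterior-inference step concrete, here is variable elimination on a toy three-node chain A -> B -> C with binary variables. The chain and all probabilities are invented for illustration; they are not MFrag templates from Kompile.

```python
def posterior_a_given_c(p_a: float, p_b_given_a: dict, p_c_given_b: dict,
                        c_observed: bool = True) -> float:
    """P(A=True | C=c_observed) on the chain A -> B -> C, eliminating B."""

    def bern(p_true: float, value: bool) -> float:
        return p_true if value else 1.0 - p_true

    unnormalized = {}
    for a in (True, False):
        # Sum out B: tau(a) = sum_b P(b | a) * P(c | b)
        tau = sum(
            bern(p_b_given_a[a], b) * bern(p_c_given_b[b], c_observed)
            for b in (True, False)
        )
        unnormalized[a] = bern(p_a, a) * tau
    return unnormalized[True] / (unnormalized[True] + unnormalized[False])


# Hypothetical numbers: prior P(A)=0.2, P(B|A)=0.9, P(B|not A)=0.1,
# P(C|B)=0.8, P(C|not B)=0.2.
posterior = posterior_a_given_c(
    0.2, {True: 0.9, False: 0.1}, {True: 0.8, False: 0.2}
)
```

With these numbers, observing C raises the belief in A from the 0.2 prior to 0.148 / 0.356, roughly 0.416.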

Causal Inference

Eight W3C PROV-DM causal edge types: Causes, Enables, Triggers, Contributes To, Prevents, Correlates With, Influences, Derived From. Temporal chain extraction, attribution paths, and counterfactual modeling.

Graph Maintenance

Nine automated primitives: TTL sweep, orphan cleanup, confidence pruning, component pruning, contradiction detection, provenance validation, entity re-resolution, stats refresh, and community rebuild. Full mutation audit with before/after snapshots and WebSocket broadcasting.
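Two of these primitives are simple enough to sketch. The node and edge shapes and the `expires_at` field are hypothetical, and the audit trail is omitted.

```python
def ttl_sweep(nodes: dict[str, dict], now: float) -> dict[str, dict]:
    """Keep nodes with no expiry or an expiry strictly after `now`."""
    return {
        node_id: node
        for node_id, node in nodes.items()
        if node.get("expires_at") is None or node["expires_at"] > now
    }


def orphan_cleanup(nodes: dict[str, dict], edges: list[tuple[str, str]]):
    """Drop edges that point at missing nodes, then drop unconnected nodes."""
    kept_edges = [(s, t) for s, t in edges if s in nodes and t in nodes]
    connected = {node_id for edge in kept_edges for node_id in edge}
    kept_nodes = {nid: n for nid, n in nodes.items() if nid in connected}
    return kept_nodes, kept_edges
```

Running the sweep before the cleanup matters: expiring a node can strand its neighbors, which the cleanup then removes.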

MCP Graph Tools

30+ MCP tool operations: node/edge CRUD, bulk create, merge nodes, BFS traversal, ego networks, hybrid search, shortest path, PageRank, centrality, Louvain communities, and LLM community summaries — all callable by AI agents.

Graph Embeddings

Native TransE and RotatE knowledge graph embeddings trained with margin ranking loss and self-adversarial negative sampling. Link prediction, entity similarity, and head/tail prediction backed by ND4J tensors.
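The core of TransE fits in a few lines: a triple (head, relation, tail) is plausible when head + relation lands near tail, and training pushes true triples at least a margin closer than corrupted ones. The sketch uses plain Python lists instead of ND4J tensors and leaves out RotatE and self-adversarial negative sampling.

```python
import math


def transe_distance(head: list[float], relation: list[float], tail: list[float]) -> float:
    """L2 distance ||h + r - t||; lower means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_ranking_loss(pos_distance: float, neg_distance: float,
                        margin: float = 1.0) -> float:
    """Zero once the true triple beats the corrupted one by at least `margin`."""
    return max(0.0, margin + pos_distance - neg_distance)
```

Link prediction then amounts to ranking candidate tails by `transe_distance` for a given head and relation.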

ReAct Agents & A2A

Reason-Act-Observe agents with pluggable interfaces. Agent-to-Agent (A2A) protocol at /.well-known/agent-card.json. KClaw browser-based agent runner with MCP injection and permission management.
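The Reason-Act-Observe loop can be sketched in a few lines. Here a scripted `policy` function stands in for the LLM, and the tool registry is a plain dict; both are hypothetical and unrelated to Kompile's pluggable interfaces.

```python
def react_loop(question: str, policy, tools: dict, max_steps: int = 5):
    """Run Reason -> Act -> Observe until the policy returns 'finish'."""
    trace = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)      # Reason
        if action == "finish":
            return arg, trace
        observation = tools[action](arg)                    # Act
        trace.append((thought, action, arg, observation))   # Observe
    return None, trace  # step budget exhausted without an answer


def scripted_policy(question: str, trace: list):
    """Stand-in for an LLM: call the tool once, then answer with its output."""
    if not trace:
        return ("I need the sum.", "add", "2,3")
    return ("I have the result.", "finish", trace[-1][3])


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}
answer, trace = react_loop("What is 2 + 3?", scripted_policy, tools)
```

The `max_steps` cap is the simplest safeguard against an agent that never decides to finish.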

Guardrails & Enforcer

Input guards (PII, injection, toxicity, topic filtering) and output guards (hallucination, relevancy, format). Enforcer wraps any agent with keyword, LLM, or subprocess judge modes for policy-governed execution.
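A toy version of an input guard plus a keyword-mode enforcer is shown below. The email-only PII pattern, the `[BLOCKED]` marker, and the function names are assumptions for illustration; production PII detection covers far more than email addresses.

```python
import re

# Deliberately simple email pattern; real PII guards cover many more types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_pii(text: str) -> str:
    """Input guard: mask email addresses before the agent sees them."""
    return EMAIL.sub("[EMAIL]", text)


def keyword_enforcer(agent, banned: list[str]):
    """Wrap an agent; block any reply containing a banned lowercase keyword."""

    def guarded(prompt: str) -> str:
        reply = agent(redact_pii(prompt))
        lowered = reply.lower()
        if any(word in lowered for word in banned):
            return "[BLOCKED]"
        return reply

    return guarded
```

An LLM or subprocess judge would replace the keyword check with a call to a model or an external program while keeping the same wrapper shape.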

Module Directory

Every layer of the stack is a swappable module. Pick what you need.

LLM Providers

Swap between hosted providers, local SameDiff models, or CLI-based agents. Every provider implements the same Spring AI interface — your code never changes.

OpenAI · Anthropic · Google Gemini · SameDiff Local (SmolLM, Phi-2) · CLI Agents (Claude, Codex, Gemini, Qwen)

Embedding Models

Dense embeddings (BGE, Arctic, E5), sparse neural retrieval (SPLADE++, uniCOIL), and cross-encoder rerankers (MiniLM, TinyBERT), all running natively via SameDiff.

OpenAI Embeddings · BGE / Arctic (SameDiff) · SPLADE++ Sparse · Sentence Transformers · PostgresML · Cross-Encoder Rerankers

Vector Stores

Hybrid BM25 keyword + dense HNSW vector retrieval with Anserini, or connect to pgvector, Vespa, or ChromaDB. One API, four backends.

Anserini / Lucene HNSW · PostgreSQL pgvector · Vespa · ChromaDB

Document Loaders

PDF with table extraction, Office docs, Excel with formula dependency graphs, email inboxes (IMAP/POP3/MBOX/PST/EML), audio transcription, and BFS web crawling.

PDF + Tables (Tabula) · Microsoft Office · Excel + Formula Graphs · Email (IMAP, POP3, MBOX, PST) · Audio (Whisper) · Apache Tika · Web Crawler

Data Sources

20+ connectors with OAuth2 authentication. Crawl SaaS tools, cloud storage, email, remote folders, and databases — all compiled into the graph automatically.

Confluence · Jira · Notion · Slack · Discord · Google Workspace · OneDrive · Reddit · Gmail · S3 / SFTP / SMB · SQL Databases

Text Chunkers

Five pluggable strategies. Table-aware chunking keeps each table atomic instead of splitting it across chunks. Sentence detection supports 30+ languages via OpenNLP.

Sentence (OpenNLP) · Recursive Character · Markdown-Aware · Token-Based · Table-Aware

Knowledge Graphs

Full knowledge graph stack with entity extraction, Bayesian reasoning (MEBN), causal inference, graph algorithms, TransE/RotatE embeddings, and export to JSON-LD, GraphML, Cypher, Obsidian, and more.

Neo4j (APOC) · Native JPA + ND4J Matrix · TransE / RotatE Embeddings · MEBN / Bayesian Networks · Causal Inference · 9 Export Formats

MCP Tools & Agents

Expose 44+ tools via MCP stdio/SSE to Claude Code, Codex, Gemini CLI, and Qwen. Graph CRUD, traversal, algorithms, and community detection — all callable by AI agents natively.

44+ MCP Tools · Graph Mutation Tools · Graph Traversal Tools · Graph Algorithm Tools · Agent-to-Agent (A2A) · ReAct Agent Framework · KClaw Agent Runner

Guardrails & Evaluation

Input guards (PII, injection, toxicity, topic filtering) and output guards (hallucination, relevancy, format). Enforcer wraps any agent with keyword, LLM, or subprocess judge modes.

PII Detection · Prompt Injection · Toxicity Filter · Hallucination Detection · Relevancy Scoring · Enforcer (keyword/LLM/subprocess)

Compute Graphs & Workflows

Orchestrate complex multi-step pipelines beyond simple chains. Plug in workflow engines, rule systems, or visual builders through the same Kompile interface.

Apache Camel

Enterprise integration patterns for routing, transformation, and mediation across 300+ connectors.

Drools

Business rules engine for decision logic that changes independently of application code.

n8n

Visual workflow automation for building complex multi-step data and AI pipelines.

Xircuits

Visual component-based workflows for ML pipelines with drag-and-drop composition.

Excel Workflows

Excel-driven workflows with formula dependency graphs extracted into executable compute graphs.
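Turning formulas into a compute graph comes down to extracting cell references and ordering cells so each one is evaluated after its inputs. A minimal sketch using Python's standard `graphlib` follows. The regex handles only simple references such as `A1`: ranges and sheet names are not covered, and a function name ending in digits such as `LOG10` would be misread as a cell.

```python
import re
from graphlib import TopologicalSorter

# Matches simple A1-style references only (assumption for this sketch).
CELL_REF = re.compile(r"\b[A-Z]+[0-9]+\b")


def dependency_graph(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell to the set of cells its formula reads."""
    return {cell: set(CELL_REF.findall(formula)) for cell, formula in formulas.items()}


def evaluation_order(formulas: dict[str, str]) -> list[str]:
    """Order cells so every cell comes after the cells it depends on."""
    return list(TopologicalSorter(dependency_graph(formulas)).static_order())


order = evaluation_order({"A1": "10", "B1": "=A1*2", "C1": "=A1+B1"})
```

`TopologicalSorter` raises `graphlib.CycleError` on circular references, which matches how a spreadsheet would flag them.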

Scripting

Script-based workflows for custom pipeline logic with Python, Groovy, or JavaScript steps.

Ready to compile your stack?

Get early access and start building on the modular AI platform.

Request Early Access