Soleri | Docs

Under the Hood


Your Soleri agent is an MCP tool server. It runs locally, stores everything in files on your machine, and exposes tools that your AI editor calls when needed. This page explains how each piece works so you can trust what’s happening.

When your AI editor starts, it launches your agent as a child process. The agent registers its tools (22 engine modules with 350+ operations) over the Model Context Protocol (MCP).

Your AI editor decides when to call these tools based on your conversation. When you ask “what do we know about error handling?”, your AI editor recognizes this as a knowledge question and calls the agent’s search_intelligent tool. The agent returns ranked results, and your AI editor uses them in its response.

The agent never acts on its own. It only responds to tool calls from your AI editor.

The vault is a SQLite database with FTS5 (full-text search) enabled. It stores knowledge entries (patterns, anti-patterns, rules, playbooks) as structured records.

Each entry has:

  • Type: pattern, anti-pattern, rule, playbook, workflow, principle, reference
  • Domain: the knowledge area (frontend, security, infrastructure, or your custom domains)
  • Severity: critical, warning, or suggestion
  • Tags: free-form labels for discovery
  • Description: the actual knowledge, in your words

The vault file lives in your agent’s data directory. It’s a regular SQLite file. You can inspect it with any SQLite client if you’re curious.
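The entry fields above can be sketched as a TypeScript type. This is an illustration of the shape, not the engine's actual schema; the field and column names may differ in the real SQLite tables:

```typescript
// Hypothetical shape of a vault entry; real column names may differ.
type EntryType =
  | "pattern" | "anti-pattern" | "rule" | "playbook"
  | "workflow" | "principle" | "reference";
type Severity = "critical" | "warning" | "suggestion";

interface VaultEntry {
  id: string;
  type: EntryType;
  domain: string;       // e.g. "frontend", "security", or a custom domain
  severity: Severity;
  tags: string[];       // free-form labels for discovery
  description: string;  // the actual knowledge, in your words
}

const entry: VaultEntry = {
  id: "jwt-storage",
  type: "pattern",
  domain: "security",
  severity: "critical",
  tags: ["auth", "jwt"],
  description: "Store JWTs in httpOnly cookies, never in localStorage.",
};
```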

When your AI editor calls the search tool, the agent doesn’t just do a keyword match. It combines six signals into a single relevance score:

  • TF-IDF text relevance: how well the query matches the entry’s text, weighted by term rarity
  • Severity weight: critical entries get a boost over suggestions
  • Recency: recently added or modified entries rank slightly higher
  • Tag overlap: entries whose tags match the query context score higher
  • Domain match: entries in the matching domain get a boost
  • Usage history: entries that have been useful before rank higher over time

This is why a critical security pattern about JWT storage outranks a suggestion about loading spinners when you search for “authentication”, even if both mention tokens.
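A toy sketch of how such a blend might behave. The weights, decay curve, and severity boosts here are invented for illustration; the real scorer uses FTS5/BM25 and learned weights (see Search Architecture):

```typescript
// Illustrative only: combine six signals into one relevance score.
interface Signals {
  textRelevance: number;  // TF-IDF match quality, 0..1
  severity: "critical" | "warning" | "suggestion";
  ageDays: number;        // days since last modification
  tagOverlap: number;     // fraction of query tags matched, 0..1
  domainMatch: boolean;
  usageScore: number;     // historical usefulness, 0..1
}

const SEVERITY_BOOST = { critical: 1.0, warning: 0.6, suggestion: 0.3 };

function relevance(s: Signals): number {
  return (
    0.45 * s.textRelevance +
    0.15 * SEVERITY_BOOST[s.severity] +
    0.10 * Math.exp(-s.ageDays / 90) +  // recency decays over ~3 months
    0.10 * s.tagOverlap +
    0.10 * (s.domainMatch ? 1 : 0) +
    0.10 * s.usageScore
  );
}

// A critical security pattern outranks a suggestion with a similar text match:
const jwtPattern = relevance({ textRelevance: 0.8, severity: "critical",
  ageDays: 30, tagOverlap: 0.5, domainMatch: true, usageScore: 0.7 });
const spinner = relevance({ textRelevance: 0.8, severity: "suggestion",
  ageDays: 10, tagOverlap: 0.2, domainMatch: false, usageScore: 0.1 });
console.assert(jwtPattern > spinner);
```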

For the full technical deep dive (FTS5 configuration, BM25 weights, hybrid vector search, adaptive weight learning, and federated tier search), see Search Architecture.

The brain sits on top of the vault. It tracks which patterns actually work and uses that information to improve recommendations.

Every vault entry has a strength score tracked by the brain. Strength increases when:

  • The pattern is found in a search and used in work
  • A plan that included this pattern completes successfully
  • You give positive feedback on a search result

Strength decreases when:

  • The pattern is found but dismissed
  • Plans that used the pattern have high drift (things didn’t go as planned)
  • The entry goes unused for a long period (decay)
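The bookkeeping above can be sketched like this. The event names and delta values are invented; the engine's actual update rules and rates are internal:

```typescript
// Illustrative strength bookkeeping; real update rules differ.
type StrengthEvent =
  | "used-in-work" | "plan-success" | "positive-feedback"  // increase
  | "dismissed" | "high-drift" | "decay-tick";             // decrease

const DELTA: Record<StrengthEvent, number> = {
  "used-in-work": +0.05,
  "plan-success": +0.10,
  "positive-feedback": +0.05,
  "dismissed": -0.05,
  "high-drift": -0.10,
  "decay-tick": -0.02,  // applied when an entry goes unused for a long period
};

function updateStrength(strength: number, event: StrengthEvent): number {
  // Clamp to [0, 1] so strength stays a valid score.
  return Math.min(1, Math.max(0, strength + DELTA[event]));
}

let s = 0.5;
s = updateStrength(s, "plan-success");
s = updateStrength(s, "positive-feedback");
s = updateStrength(s, "high-drift");  // ends near 0.55
```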

The brain maintains a TF-IDF (Term Frequency-Inverse Document Frequency) index across all vault entries. This is what makes search work well: it knows that “authentication” is more meaningful than “the” and ranks accordingly.

The index rebuilds automatically when you add entries. You can trigger a manual rebuild after bulk imports.
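The core of the IDF half is small enough to show directly. This is the standard textbook formulation, not the brain's actual code:

```typescript
// Minimal IDF sketch: rare terms like "jwt" score higher than
// ubiquitous ones like "the".
function buildIdf(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();  // document frequency per term
  for (const doc of docs) {
    for (const term of new Set(doc)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  const idf = new Map<string, number>();
  for (const [term, count] of df) {
    idf.set(term, Math.log(docs.length / count));
  }
  return idf;
}

const docs = [
  ["the", "authentication", "flow"],
  ["the", "loading", "spinner"],
  ["the", "jwt", "authentication", "pattern"],
];
const idf = buildIdf(docs);
// "the" appears in every doc, so its IDF is 0; "jwt" appears once, so it
// carries the most weight.
console.assert(idf.get("the") === 0);
console.assert((idf.get("jwt") ?? 0) > (idf.get("authentication") ?? 0));
```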

When the agent creates a plan, it asks the brain: “what patterns are relevant to this task?” The brain returns recommendations ranked by strength. A pattern with strength 0.9 (proven across many successful sessions) gets recommended more confidently than one with strength 0.3 (untested).

These recommendations appear as decisions in the plan. The agent tells you what it knows before you start working.

After a plan completes, the brain examines the session and proposes new knowledge. Here’s the process:

  1. Session scan: the brain reviews the session record. Which tools were called, what files were modified, how long each step took, and whether the outcome matched the plan.
  2. Pattern detection: it looks for repeatable signals. The same tool sequence appearing across multiple sessions, steps that consistently take longer than estimated (planning insight), and solutions that resolved drift in past plans.
  3. Proposal generation: detected patterns are proposed as new vault entries with auto-inferred type, severity, and tags. They enter the governance pipeline. In moderate mode, suggestions are auto-approved while warnings and critical entries go to proposal review.
  4. Strength initialization: newly extracted patterns start with a moderate strength score. If the same pattern is independently extracted from multiple sessions, its confidence increases.
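Step 2 can be sketched as follows for one kind of signal, repeated tool sequences. The session shape, tool names, and the two-session threshold are all invented for illustration:

```typescript
// Toy pattern detection: a tool sequence seen in multiple sessions
// becomes a proposal candidate. Shapes and thresholds are invented.
interface Session { id: string; toolSequence: string[] }
interface Proposal { pattern: string; sessions: string[] }

function detectPatterns(sessions: Session[], minSessions = 2): Proposal[] {
  const seen = new Map<string, string[]>();
  for (const s of sessions) {
    const key = s.toolSequence.join(" → ");
    seen.set(key, [...(seen.get(key) ?? []), s.id]);
  }
  return [...seen]
    .filter(([, ids]) => ids.length >= minSessions)
    .map(([pattern, ids]) => ({ pattern, sessions: ids }));
}

const proposals = detectPatterns([
  { id: "s1", toolSequence: ["vault.search", "plan.create", "plan.approve"] },
  { id: "s2", toolSequence: ["vault.search", "plan.create", "plan.approve"] },
  { id: "s3", toolSequence: ["admin.health"] },
]);
// Only the sequence shared by s1 and s2 qualifies.
console.assert(proposals.length === 1);
```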

This is the automatic part of the compound loop. You don’t have to manually capture everything. The brain finds patterns in your work and proposes them. You can review what was extracted after any plan completes:

You: “What patterns were extracted from the last session?”

Agent: 2 patterns proposed: “GraphQL resolvers should validate input before calling service layer” (auto-approved, suggestion), “Always add deprecation headers before removing REST endpoints” (pending review, warning).

Long-running sessions accumulate context: tool calls, token usage, elapsed time. At some point the session needs to rotate so your AI editor can start fresh without losing track of what was happening. Compaction policies control when that rotation happens.

A compaction policy has three thresholds. If any threshold is exceeded, the session rotates:

  • maxRuns (default: 200): number of tool calls / interactions in the session
  • maxInputTokens (default: 2,000,000): cumulative input tokens consumed
  • maxAge (default: 72h): wall-clock time since the session started

The evaluator checks thresholds in that order and triggers on the first one that fires. When it does, the agent generates a handoff note (what was in progress, key decisions, files modified) so the next session can pick up where the previous one left off.
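The check order can be sketched like this. Whether the real evaluator uses `>=` or `>` at the boundary is an assumption here:

```typescript
// Sketch of the evaluator: thresholds checked in order, first exceeded
// threshold wins. Boundary behavior (>= vs >) is assumed.
interface CompactionPolicy { maxRuns: number; maxInputTokens: number; maxAgeMs: number }
interface SessionStats { runs: number; inputTokens: number; ageMs: number }

function shouldRotate(p: CompactionPolicy, s: SessionStats): string | null {
  if (s.runs >= p.maxRuns) return "maxRuns";
  if (s.inputTokens >= p.maxInputTokens) return "maxInputTokens";
  if (s.ageMs >= p.maxAgeMs) return "maxAge";
  return null;  // session keeps going
}

const defaults: CompactionPolicy = {
  maxRuns: 200,
  maxInputTokens: 2_000_000,
  maxAgeMs: 72 * 60 * 60 * 1000,  // 72h
};

console.assert(shouldRotate(defaults, { runs: 200, inputTokens: 0, ageMs: 0 }) === "maxRuns");
console.assert(shouldRotate(defaults, { runs: 10, inputTokens: 500, ageMs: 1000 }) === null);
```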

You configure compaction in your agent.yaml under the engine block. All fields are optional, so you only override what you need:

engine:
  compactionPolicy:
    maxRuns: 100            # rotate after 100 tool calls
    maxInputTokens: 1000000 # rotate after 1M input tokens
    maxAge: '24h'           # rotate after 24 hours

Duration strings support ms, s, m, h, and d suffixes. So 30m, 7d, and 500ms all work.
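A plausible parser for these strings looks like the sketch below; the engine's actual implementation may differ:

```typescript
// Parse duration strings like "30m", "7d", "500ms" into milliseconds.
const UNIT_MS: Record<string, number> = {
  ms: 1, s: 1000, m: 60_000, h: 3_600_000, d: 86_400_000,
};

function parseDuration(input: string): number {
  // "ms" must come before "s" and "m" in the alternation so it matches first.
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(input.trim());
  if (!match) throw new Error(`invalid duration: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

console.assert(parseDuration("30m") === 1_800_000);
console.assert(parseDuration("24h") === 86_400_000);
console.assert(parseDuration("500ms") === 500);
```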

The final policy is resolved by merging three levels, with higher levels winning on a per-field basis:

  1. Your agent.yaml config (highest priority)
  2. Adapter defaults (the runtime adapter can supply its own defaults)
  3. Engine defaults (the hardcoded fallback shown in the table above)

This means you can override just maxAge in your agent.yaml and the other two thresholds will still use their defaults. There’s no all-or-nothing replacement.
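The per-field merge amounts to a layered spread where higher levels override lower ones field by field, never wholesale:

```typescript
// Per-field policy resolution: user config > adapter defaults > engine defaults.
interface Policy { maxRuns?: number; maxInputTokens?: number; maxAge?: string }

function resolvePolicy(
  engineDefaults: Required<Policy>,
  adapterDefaults: Policy,
  userConfig: Policy,
): Required<Policy> {
  // Drop undefined fields first so they don't clobber lower-level values.
  const defined = (p: Policy) =>
    Object.fromEntries(Object.entries(p).filter(([, v]) => v !== undefined));
  return {
    ...engineDefaults,
    ...defined(adapterDefaults),
    ...defined(userConfig),
  } as Required<Policy>;
}

const resolved = resolvePolicy(
  { maxRuns: 200, maxInputTokens: 2_000_000, maxAge: "72h" },
  {},                 // adapter supplies nothing here
  { maxAge: "24h" },  // user overrides only maxAge
);
// The other two thresholds still use their engine defaults.
console.assert(resolved.maxAge === "24h" && resolved.maxRuns === 200);
```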

Everything your agent knows persists in files:

  • Vault: SQLite database with knowledge entries
  • Brain state: JSON file with strength scores, session history, TF-IDF vocabulary
  • Plans: JSON file with plan history and reconciliation reports
  • Config: your agent’s identity, domains, and settings

When you close your AI editor and open it again, nothing is lost. The agent loads its state from these files and picks up exactly where it left off.

If you work on multiple projects, you can link them:

"Link this project to my-other-project as related"

Linked projects can search each other’s vaults. A security pattern you captured in one project becomes discoverable in another.

You can also promote high-value patterns to a global pool, available to every agent you run across all projects.

Not everything you capture goes straight into the vault. The governance layer evaluates each capture:

  • Quotas: prevent the vault from growing without bounds
  • Proposal gates: certain entry types require approval before becoming active
  • Duplicate detection: if a similar entry already exists, the capture may be rejected or merged

This keeps the vault clean and useful over time. Without governance, knowledge bases tend to accumulate noise: duplicates, contradictions, entries nobody needs.
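A toy version of that evaluation, checking quota, then duplicates, then the proposal gate. The check order, the exact-match duplicate test, and the verdict names are invented; only the moderate-mode behavior (suggestions auto-approved, warnings and critical entries to review) comes from the extraction pipeline described earlier:

```typescript
// Toy governance gate; real rules and ordering are internal to the engine.
interface Capture { severity: "critical" | "warning" | "suggestion"; text: string }
type Verdict = "accepted" | "pending-review" | "rejected-duplicate" | "rejected-quota";

function govern(capture: Capture, existing: string[], quota: number): Verdict {
  if (existing.length >= quota) return "rejected-quota";
  if (existing.some((t) => t.toLowerCase() === capture.text.toLowerCase()))
    return "rejected-duplicate";
  // Moderate mode: suggestions auto-approved; warnings and critical
  // entries go to proposal review.
  return capture.severity === "suggestion" ? "accepted" : "pending-review";
}

console.assert(govern({ severity: "suggestion", text: "A" }, [], 100) === "accepted");
console.assert(govern({ severity: "critical", text: "B" }, [], 100) === "pending-review");
console.assert(govern({ severity: "warning", text: "b" }, ["B"], 100) === "rejected-duplicate");
```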

The curator is an automated maintenance system that keeps vault quality high:

  • Deduplication: finds entries that say the same thing and proposes merging them
  • Decay scanning: identifies entries that haven’t been used in a long time
  • Health audits: reports on overall vault quality, including duplicate rate, staleness, and gaps
  • Tag normalization: cleans up inconsistent tags across entries

You can run curator operations manually, or let them happen as part of the brain’s lifecycle.

The engine registers 22 modules as MCP tools. Each module is a single tool with op-based dispatch:

  • Vault: knowledge store — CRUD, search, FTS5 indexing
  • Brain: intelligence — pattern strength, TF-IDF, recommendations
  • Plan: planning lifecycle — create, approve, split, reconcile
  • Memory: session history — capture, search, cross-project
  • Admin: health checks, tool listing, diagnostics
  • Curator: vault maintenance — dedup, decay, health audits
  • Loop: iterative validation cycles
  • Orchestrate: high-level workflow — plan + execute + complete
  • Control: intent routing, behavior morphing
  • Context: entity extraction, knowledge retrieval
  • Agency: proactive file watching, pattern surfacing
  • Chat: chat transport — sessions, bridge, notifications
  • Operator: operator profiling — expertise, corrections, adaptation
  • Archive: vault snapshots, backup, restore, optimization
  • Sync: Git push/pull, Obsidian sync
  • Review: governance review workflow — submit, approve, reject
  • Intake: external ingestion — URLs, text, PDFs, batch
  • Links: Zettelkasten connections — link, traverse, orphan detection
  • Branching: vault branching — isolate, experiment, merge
  • Tier: multi-tier vault connections — external sources
  • Embedding: embedding management — vector storage and retrieval
  • Dream: memory consolidation — dedup, archive stale entries, resolve contradictions
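The "single tool with op-based dispatch" shape can be sketched like this. The `Module` class and the op names are illustrative, not the engine's actual API:

```typescript
// Sketch of op-based dispatch: one tool per module, with an "op"
// argument selecting the operation. Names are illustrative.
type OpHandler = (args: Record<string, unknown>) => unknown;

class Module {
  private ops = new Map<string, OpHandler>();
  constructor(public name: string) {}
  register(op: string, handler: OpHandler): this {
    this.ops.set(op, handler);
    return this;
  }
  dispatch(op: string, args: Record<string, unknown> = {}): unknown {
    const handler = this.ops.get(op);
    if (!handler) throw new Error(`${this.name}: unknown op "${op}"`);
    return handler(args);
  }
}

const vault = new Module("vault")
  .register("list_ops", () => ["create", "search", "list_ops"])
  .register("search", (args) => `searching for: ${args.query}`);

console.assert(vault.dispatch("search", { query: "jwt" }) === "searching for: jwt");
```

One tool per module keeps the MCP tool list short (22 entries rather than 350+) while still exposing every operation through the op argument.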

Domain packs add additional modules on top of these (e.g., @soleri/domain-design adds design-specific facades).

Soleri ships with a test suite to verify all of this works correctly:

  • Unit tests (npm test): test individual modules within each package
  • E2E tests (npm run test:e2e): 800+ integration tests across 30 files that exercise every engine feature. Vault persistence, brain intelligence, curator health audits, governance policies, plan state machine, MCP transport, HTTP/WebSocket servers, CLI commands, scaffold pipeline, operator profiling, knowledge traceability, and concurrent operations.

The E2E suite uses real SQLite databases, real MCP stdio transport, and real scaffolded agents, not mocks. When tests pass, the engine works.

For details on running and writing tests, see Testing.


Next: Security & Privacy — understand where your data lives and who can access it.