v0.1 · MCP server · local-first

Stop wasting tokens
on file navigation.

Sema indexes your codebase once. Your AI assistant searches it forever — 4–9× fewer tokens per question, no API keys, runs locally.

$ sema index . && sema init --claude

GitHub

Works with

Claude Code · OpenAI Codex

Measured · hoppscotch · 1,172 files

One question. "How does magic-link auth work?"

8×

fewer tokens. same answer.

Without Sema

6 tool calls · find, cat, read…

6,475 tok

With Sema

3 calls · search_code → get_code

837 tok

Token counts measured with tiktoken (cl100k_base) on the real hoppscotch repo. Run sema against fastapi-users and the ratio comes out at 9×.
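The headline figure follows directly from the two token counts above. A small sketch of the arithmetic, using only the numbers reported for the hoppscotch question:

```python
# Token counts reported above for the hoppscotch question.
without_sema = 6475
with_sema = 837

ratio = without_sema / with_sema          # how many times fewer tokens
saved = 1 - with_sema / without_sema      # fraction of tokens avoided

print(f"{ratio:.1f}x fewer tokens")       # 7.7x, shown rounded as 8x
print(f"{saved:.0%} of tokens saved")     # 87%
```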

How it works

Index once.
Search forever.

Sema parses every function, class, and method, embeds them locally with SBERT, and stores them in a ChromaDB index on disk. Your AI assistant connects over MCP and queries the index instead of reading files.

tree-sitter

Parse

Every function, class, method, and section becomes a Chunk — with its full source, signature, line range, and call list.
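To make the Chunk idea concrete, here is a minimal sketch of symbol-level extraction. It is not Sema's parser: it swaps in Python's standard-library `ast` module for tree-sitter so it runs without dependencies, and the `Chunk` fields and helper names (`extract_chunks`, `called_names`) are hypothetical, chosen to mirror the description above.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical record for one indexed symbol."""
    name: str
    kind: str                      # "function", "class", or "method"
    signature: str                 # first line of the definition only
    start_line: int
    end_line: int
    source: str
    calls: list[str] = field(default_factory=list)


def called_names(node: ast.AST) -> list[str]:
    """Collect the names of everything called inside a node."""
    names = set()
    for sub in ast.walk(node):
        if isinstance(sub, ast.Call):
            if isinstance(sub.func, ast.Name):
                names.add(sub.func.id)
            elif isinstance(sub.func, ast.Attribute):
                names.add(sub.func.attr)
    return sorted(names)


def extract_chunks(source: str) -> list[Chunk]:
    lines = source.splitlines()
    chunks: list[Chunk] = []

    def visit(node: ast.AST, inside_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if inside_class else "function"
            elif isinstance(child, ast.ClassDef):
                kind = "class"
            else:
                continue
            chunks.append(Chunk(
                name=child.name,
                kind=kind,
                signature=lines[child.lineno - 1].strip(),
                start_line=child.lineno,
                end_line=child.end_lineno,
                source="\n".join(lines[child.lineno - 1:child.end_lineno]),
                calls=called_names(child),
            ))
            if kind == "class":
                # Only classes are descended into; nested functions are skipped.
                visit(child, inside_class=True)

    visit(ast.parse(source), inside_class=False)
    return chunks


sample = '''\
def send_link(email):
    token = make_token(email)
    mailer.send(email, token)

class Auth:
    def verify(self, token):
        return check(token)
'''

chunks = extract_chunks(sample)
for c in chunks:
    print(c.kind, c.name, f"L{c.start_line}-{c.end_line}", c.calls)
```

Each chunk carries enough to answer a search hit (name, signature, line range) without returning the body, which is what keeps the first response small.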

all-MiniLM-L6-v2

Embed

SBERT runs locally — no API key, no network. ~80MB model downloaded once and cached globally.

.sema/index/

Store

ChromaDB persists vectors + source bodies on disk. SHA-256 hashes skip unchanged files on re-index.
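The skip-unchanged-files step can be sketched with the standard library alone. This is an illustration of the hashing idea, not Sema's code: the JSON store, its file name, and the function names are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, store_path: Path):
    """Return the paths whose hash differs from the stored one, then update the store."""
    old = json.loads(store_path.read_text()) if store_path.exists() else {}
    new, changed = {}, []
    for path in paths:
        digest = file_digest(path)
        new[str(path)] = digest
        if old.get(str(path)) != digest:
            changed.append(path)          # new or modified: needs re-indexing
    store_path.write_text(json.dumps(new, indent=2))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a, b = root / "a.py", root / "b.py"
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    store = root / "hashes.json"          # hypothetical store location

    first = changed_files([a, b], store)  # first run: everything is new
    b.write_text("y = 3\n")
    second = changed_files([a, b], store) # second run: only b.py changed

    print([p.name for p in first], [p.name for p in second])
```

Only the files returned by the second call would need to be re-parsed and re-embedded.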

MCP · stdio

Serve

Claude Code and Codex call search_code and get_code instead of running grep.
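Under MCP, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, sent over stdio as a single line of JSON. The sketch below builds two such requests. The tool names come from this page; the argument keys (`query`, `name`) and the symbol `verifyMagicLink` are assumptions for illustration, not Sema's documented schema.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names and symbol, for illustration only.
search = tool_call(1, "search_code", {"query": "magic-link auth"})
fetch = tool_call(2, "get_code", {"name": "verifyMagicLink"})

print(search)
print(fetch)
```

The assistant normally issues these requests itself; nothing here needs to be written by hand.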

6 MCP tools · stdio

Six new tools your AI
gets the moment you install.

search_code()

Find a function, class, or method by natural-language description. Returns signatures + file locations, no bodies.

~180 tok · semantic + BM25 RRF
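"Semantic + BM25 RRF" refers to reciprocal rank fusion: each result list contributes 1 / (k + rank) for every item it contains, and the summed scores decide the final order. A minimal sketch, assuming the commonly used constant k = 60 (the value Sema uses is not stated here) and made-up symbol names:

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from the two retrievers for the same query.
semantic = ["verify_token", "send_magic_link", "login"]
bm25 = ["send_magic_link", "magic_link_template", "verify_token"]

fused = rrf([semantic, bm25])
print(fused)
# ['send_magic_link', 'verify_token', 'magic_link_template', 'login']
```

Items that both retrievers rank well rise to the top, and no score normalization between the embedding and keyword sides is needed.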

get_code()

Fetch the full source body of a symbol by exact name. Returns every implementation if the name appears in multiple files.

~300 tok

repo_map()

Compressed architecture overview — files with their exported symbols. The fastest way to orient a new session.

~600 tok

find_usages()

Locate every call site and reference to a symbol. Returns signatures only — load bodies on demand.

~220 tok

explain_file()

Summary of what a file exports — classes, functions, imports — without dumping the source.

~150 tok

impact_analysis()

Bidirectional call graph: what the symbol calls, and what calls it. Run it before refactoring to see the blast radius.

~250 tok · BFS depth 1–3
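A depth-limited, bidirectional walk of a call graph can be sketched as two breadth-first searches: one over the "calls" edges and one over the reversed edges. The graph, symbol names, and function names below are hypothetical. They illustrate the technique, not Sema's implementation.

```python
from collections import deque


def reachable(graph, start, max_depth):
    """BFS from start; return {symbol: depth} for everything within max_depth."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue                      # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]
    return seen


def impact(calls, symbol, max_depth=2):
    """Return what the symbol calls and what calls it, up to max_depth hops."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return {
        "calls": reachable(calls, symbol, max_depth),
        "called_by": reachable(callers, symbol, max_depth),
    }


# Hypothetical call graph: caller -> list of callees.
calls = {
    "login_route": ["verify_token"],
    "refresh_route": ["verify_token"],
    "verify_token": ["decode_jwt", "load_user"],
    "load_user": ["db_query"],
}

result = impact(calls, "verify_token", max_depth=1)
print(result)
```

Raising `max_depth` to 2 would also pull in `db_query` on the "calls" side, which is how the depth setting widens the blast radius.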

Supported languages

AST-aware where it counts.
Text-aware everywhere else.

AST-aware

Symbol-level extraction via tree-sitter. Real function and class granularity.

TypeScript (.ts, .tsx) · JavaScript (.js, .jsx) · Python (.py) · Go (.go)

Text-aware

Heading- and section-based chunking. Fully searchable; no named symbols.

Markdown · JSON / YAML / TOML · CSS / SCSS · Shell · SQL / GraphQL · Dockerfile · Makefile
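Heading-based chunking for a Markdown file might look like the sketch below. It is a simplification under stated assumptions: only ATX headings (`#` to `######`) start a new chunk, `#` lines inside fenced code are not special-cased, and the `(preamble)` label and dictionary keys are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text):
    """Split Markdown into one chunk per heading, keeping line ranges."""
    chunks = []
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match or current is None:
            # Start a new chunk at each heading (or at the first line).
            title = match.group(2).strip() if match else "(preamble)"
            current = {"title": title, "start_line": number,
                       "end_line": number, "lines": []}
            chunks.append(current)
        current["lines"].append(line)
        current["end_line"] = number
    return chunks


doc = "intro line\n# Setup\nfirst step\n\n## Usage\nrun it\n"
sections = chunk_markdown(doc)
for s in sections:
    print(s["title"], s["start_line"], s["end_line"])
```

Each section stays searchable by its text, but there is no named symbol to fetch, which matches the text-aware tier described above.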

The shift

AI assistants navigate by reading.
Sema teaches them to search.

	find · cat · read	sema
Tool calls per question	4–8 reads	3 (search → fetch → fetch)
Tokens for "how does X work?"	5,000–15,000	500–1,500
Scales with repo size	✗ cost grows linearly	✓ constant per query
Symbol-level fetch	✗ whole file or nothing	✓ one function at a time
Call graph / blast radius	✗ manual grep across files	✓ impact_analysis()
Stays consistent across sessions	✗ every cold start re-explores	✓ index persists on disk
Data stays on your machine	depends on assistant	✓ 100% local

Roadmap

Shipping in the open.

v0.2 Done

Tool improvements

impact_analysis with call graph
explain_file with import graph
Better stale-index errors

v0.3 Done

Incremental indexing

sema watch — re-index on save
Workspace support for monorepos
SHA-256 hash store, 20× faster re-index

v0.4 Now

More AST parsers

Rust · tree-sitter-rust
Java / Kotlin
Ruby · C# · C/C++

v1.0 Next

Public release

Publish to PyPI
Homebrew formula
Cursor, Copilot, Windsurf auto-config

Stop wasting tokens on file navigation.

Index once. Search forever.