cpk scan
cpk scan indexes your codebase so agents can find symbols and trace imports in ~200 tokens per query instead of burning thousands on grep and read cycles.
Why scan
Without an index, an agent exploring an unfamiliar codebase burns 6,000-40,000 tokens on discovery. Each grep returns dozens of matches, each ls burns another batch, and reading 3-5 files to find a single function definition takes thousands more. The index collapses all of that into cheap point lookups:
| Task | Without scan | With scan |
|---|---|---|
| Find where AuthService is defined | 3-5 file reads (~2,000 tokens) | cpk code symbols --name AuthService (~200 tokens) |
| See what guard.ts imports | Read the whole file (~1,500 tokens) | cpk code imports --file ... (~200 tokens) |
| Find every file that uses types.ts | grep + read results (~4,000 tokens) | cpk code dependents --file ... (~200 tokens) |
| Orient in an unfamiliar repo | glob + sample reads (~10,000 tokens) | cpk code summary (~60 tokens) |
How it works
cpk scan:
- Walks your project tree, respecting .gitignore and .codepaktignore
- Parses each supported file with tree-sitter WASM (no native compilation, works with npx)
- Extracts symbols (functions, classes, interfaces, types, methods, exported variables) and imports
- Replaces the project’s data in the local .codepakt/data.db SQLite file
No server required. Scan opens the SQLite file directly. The daemon is only needed for task coordination.
Supported languages (v0.2)
| Language | Extensions | Status |
|---|---|---|
| TypeScript | .ts, .tsx | Full — functions, classes, interfaces, types, methods, exports, imports |
| JavaScript | .js, .jsx, .mjs, .cjs | Full (same extractor as TypeScript) |
| Python | .py | Grammar loaded, extractor stub (ships in a future release) |
| Go | .go | Grammar loaded, extractor stub (ships in a future release) |
Unknown extensions are silently skipped. Partial extraction never fails the scan — broken files are skipped and everything else continues.
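To make the table concrete, here is a minimal sketch of how a file path could be classified by extension. The mapping is copied from the table above; the `classify` helper, the `STUB_LANGUAGES` set, and the "python"/"go" language names are hypothetical illustrations, not cpk's internal API.

```python
from pathlib import PurePosixPath

# Extension -> language, copied from the table above.
# This dict is an illustrative assumption, not cpk's real data structure.
LANGUAGE_BY_EXTENSION = {
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".jsx": "javascript",
    ".mjs": "javascript",
    ".cjs": "javascript",
    ".py": "python",
    ".go": "go",
}

# Languages whose grammar loads but whose extractor is still a stub in v0.2.
STUB_LANGUAGES = {"python", "go"}


def classify(path: str) -> str:
    """Return 'full', 'stub', or 'skipped' for a file path."""
    language = LANGUAGE_BY_EXTENSION.get(PurePosixPath(path).suffix)
    if language is None:
        return "skipped"  # unknown extension: silently ignored
    return "stub" if language in STUB_LANGUAGES else "full"
```

Under these assumptions, `classify("src/auth/guard.ts")` returns `"full"`, while a Markdown file or an extensionless `Makefile` returns `"skipped"`.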
Usage
Full scan
cpk scan
{"files_scanned":230,"symbols":958,"imports":1223,"duration_ms":388,"languages":["typescript","javascript"],"incremental":false}
A full scan replaces all existing symbols and imports for the project in a single transaction. It runs at roughly 500-1,500 files per second, depending on hardware.
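Because the result is a single JSON object, a script can read it directly. The sketch below parses the sample output shown above and derives throughput from it; the `files_per_second` helper is hypothetical, and only the field names come from that sample.

```python
import json

# Sample output copied from the full-scan example above.
SAMPLE = (
    '{"files_scanned":230,"symbols":958,"imports":1223,"duration_ms":388,'
    '"languages":["typescript","javascript"],"incremental":false}'
)


def files_per_second(scan_output: str) -> float:
    """Derive throughput from one line of scan output."""
    result = json.loads(scan_output)
    if result["duration_ms"] == 0:
        return 0.0  # a no-op scan reports 0 ms; avoid dividing by zero
    return result["files_scanned"] / (result["duration_ms"] / 1000)


print(round(files_per_second(SAMPLE)))  # 593
```

For the sample run, 230 files in 388 ms works out to about 593 files per second, inside the quoted range.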
Incremental scan
cpk scan --incremental
{"files_scanned":3,"symbols":27,"imports":41,"duration_ms":12,"languages":["typescript"],"incremental":true}
Incremental mode processes only the files staged in git (git diff --cached --name-only). Use it after editing and staging a few files to refresh just those entries. It is much faster than a full scan: the run above took 12 ms for three files, compared with 388 ms for the full scan.
If nothing is staged, incremental is a no-op:
{"files_scanned":0,"symbols":0,"imports":0,"duration_ms":0,"languages":[],"incremental":true}
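The staged-file selection can be approximated outside cpk. The sketch below runs the same git command and keeps only paths with fully supported extensions. The function names are hypothetical, and filtering by extension at this stage is an assumption about how incremental mode behaves.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List

# Extensions with a full extractor in v0.2, per the supported-languages table.
SCANNED_EXTENSIONS = {".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs"}


def staged_paths(repo_dir: str = ".") -> List[str]:
    """List staged paths using the git command quoted above."""
    completed = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.splitlines()


def select_scannable(paths: List[str]) -> List[str]:
    """Keep only paths whose extension has a full extractor."""
    return [p for p in paths if PurePosixPath(p).suffix in SCANNED_EXTENSIONS]
```

An empty result from `select_scannable(staged_paths())` corresponds to the no-op case: nothing is staged, so there is nothing to re-index.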
Install the pre-commit git hook
cpk scan --install-hook
Installed pre-commit hook at .git/hooks/pre-commit
This writes a pre-commit hook that runs cpk scan --incremental automatically on every git commit. Your index stays fresh without any manual intervention.
Hook behavior:
- Runs only when cpk is on PATH
- Failures never block commits (|| true fallback)
- Preserves existing pre-commit hooks: the codepakt block is added between markers and can be removed cleanly
- No network, no daemon: pure local operation
Hook content:
#!/bin/sh
# >>> codepakt hook begin >>>
# codepakt: incremental code index update
if command -v cpk >/dev/null 2>&1; then
cpk scan --incremental 2>/dev/null || true
fi
# <<< codepakt hook end <<<
Remove the hook
cpk scan --remove-hook
If the hook file contained only codepakt’s block, it’s removed entirely. If there are other hook commands present, only the codepakt block is stripped.
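The marker-based install and removal described above can be sketched as plain text manipulation. This is an illustration of the idea, not cpk's actual implementation: the `install_block` and `remove_block` helpers are hypothetical, and only the marker lines and hook body come from the hook content shown earlier.

```python
from typing import Optional

BEGIN = "# >>> codepakt hook begin >>>"
END = "# <<< codepakt hook end <<<"
BLOCK = "\n".join([
    BEGIN,
    "# codepakt: incremental code index update",
    "if command -v cpk >/dev/null 2>&1; then",
    "  cpk scan --incremental 2>/dev/null || true",
    "fi",
    END,
]) + "\n"


def install_block(hook_text: Optional[str]) -> str:
    """Append the marked block, leaving existing hook commands in place."""
    if hook_text is None:          # no pre-commit hook exists yet
        return "#!/bin/sh\n" + BLOCK
    if BEGIN in hook_text:         # already installed, nothing to do
        return hook_text
    if not hook_text.endswith("\n"):
        hook_text += "\n"
    return hook_text + BLOCK


def remove_block(hook_text: str) -> Optional[str]:
    """Strip the marked block; return None if only a shebang would remain."""
    kept, inside = [], False
    for line in hook_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    has_other_commands = any(
        line.strip() and not line.startswith("#!") for line in kept
    )
    return "\n".join(kept) + "\n" if has_other_commands else None
```

Returning `None` stands in for "delete the hook file"; any other return value is the hook text with only the codepakt block removed.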
Flags
| Flag | Description |
|---|---|
| --incremental | Only scan files in the git staged index |
| --changed-only | Alias for --incremental |
| --install-hook | Install pre-commit git hook |
| --remove-hook | Remove pre-commit git hook |
| --human | Human-readable output |
What gets ignored
Always skipped regardless of .gitignore:
node_modules, .git, dist, build, out, .next, .nuxt, .svelte-kit, target, vendor, .codepakt, .venv, venv, __pycache__, .pytest_cache, .mypy_cache, coverage, .cache
Plus anything matched by the project’s .gitignore or .codepaktignore file.
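A minimal sketch of the always-skipped rule, assuming the names in the list above are matched as directory names at any depth. The helpers are hypothetical, and .gitignore/.codepaktignore pattern matching is deliberately left out.

```python
import os
from pathlib import PurePosixPath

# Copied from the always-skipped list above.
ALWAYS_SKIPPED = {
    "node_modules", ".git", "dist", "build", "out", ".next", ".nuxt",
    ".svelte-kit", "target", "vendor", ".codepakt", ".venv", "venv",
    "__pycache__", ".pytest_cache", ".mypy_cache", "coverage", ".cache",
}


def is_always_skipped(relative_path: str) -> bool:
    """True if any directory component of the path is in the skip list."""
    directories = PurePosixPath(relative_path).parts[:-1]
    return any(part in ALWAYS_SKIPPED for part in directories)


def walk_project(root: str):
    """Yield file paths relative to root, never descending into skipped dirs."""
    for current, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those dirs.
        dirnames[:] = [d for d in dirnames if d not in ALWAYS_SKIPPED]
        for name in filenames:
            yield os.path.relpath(os.path.join(current, name), root)
```

Pruning during the walk, rather than filtering afterwards, means large trees such as node_modules are never traversed at all.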
Recommended workflow
# One-time setup per project
cpk init --name my-project
cpk scan --install-hook
# First scan
cpk scan
# After that, every git commit automatically refreshes the index.
# No further action needed — agents always see fresh data.
Performance
Measured on a 230-file NestJS monorepo (~1.4MB source):
| Operation | Time |
|---|---|
| Full scan | 388ms |
| Incremental scan (1 file) | 4ms |
| Incremental scan (0 files, no-op) | 0ms |
A scan runs in a single SQLite transaction, so partial progress is never visible. Either the full update commits or nothing changes.
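The all-or-nothing behavior can be illustrated with Python's built-in sqlite3 module. The table layout and the `replace_symbols` helper below are hypothetical; the real schema of .codepakt/data.db is not described here.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS symbols (
    project TEXT NOT NULL,
    name    TEXT NOT NULL,
    file    TEXT NOT NULL
)
"""


def replace_symbols(conn: sqlite3.Connection, project: str, symbols) -> None:
    """Swap a project's symbols atomically: delete old rows, insert new ones."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a failed insert also undoes the preceding delete.
    with conn:
        conn.execute("DELETE FROM symbols WHERE project = ?", (project,))
        conn.executemany(
            "INSERT INTO symbols (project, name, file) VALUES (?, ?, ?)",
            [(project, name, file) for name, file in symbols],
        )
```

If any insert fails, the delete is rolled back with it, so readers see either the previous index or the complete new one.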
Related
- cpk code — query the index after scanning
- Architecture — how the scanner and SQLite schema fit together