cpk scan

cpk scan indexes your codebase so agents can find symbols and trace imports in ~200 tokens per query instead of burning thousands on grep and read cycles.

Why scan

Without an index, an agent exploring an unfamiliar codebase spends 6,000-40,000 tokens on discovery. Each grep returns dozens of matches, each ls adds another batch of output, and reading 3-5 files to find a single function definition costs thousands more. The index collapses all of that into cheap point lookups:

Task	Without scan	With scan
Find where `AuthService` is defined	3-5 file reads (~2,000 tokens)	`cpk code symbols --name AuthService` (~200 tokens)
See what `guard.ts` imports	Read the whole file (~1,500 tokens)	`cpk code imports --file ...` (~200 tokens)
Find every file that uses `types.ts`	grep + read results (~4,000 tokens)	`cpk code dependents --file ...` (~200 tokens)
Orient in an unfamiliar repo	glob + sample reads (~10,000 tokens)	`cpk code summary` (~60 tokens)

How it works

cpk scan:

Walks your project tree, respecting .gitignore and .codepaktignore
Parses each supported file with tree-sitter WASM (no native compilation, works with npx)
Extracts symbols (functions, classes, interfaces, types, methods, exported variables) and imports
Replaces the project’s data in the local .codepakt/data.db SQLite file

No server required. Scan opens the SQLite file directly. The daemon is only needed for task coordination.
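
Because the index is an ordinary SQLite file, it can also be inspected without the daemon. The following is a minimal sketch using Python's standard sqlite3 module. It only lists table names and assumes nothing about cpk's actual schema; the function name `list_tables` is hypothetical, and the default path is the one named above.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str = ".codepakt/data.db") -> list[str]:
    """Return the table names found in a SQLite file, opened read-only."""
    # mode=ro opens the file read-only, so inspection cannot modify the index.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling `list_tables()` from the project root after a scan should show whatever tables cpk created; the Architecture page remains the reference for what they contain.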

Supported languages (v0.2)

Language	Extensions	Status
TypeScript	`.ts`, `.tsx`	Full — functions, classes, interfaces, types, methods, exports, imports
JavaScript	`.js`, `.jsx`, `.mjs`, `.cjs`	Full (same extractor as TypeScript)
Python	`.py`	Grammar loaded; extractor is a stub (the full extractor ships in a future release)
Go	`.go`	Grammar loaded; extractor is a stub (the full extractor ships in a future release)

Files with unknown extensions are silently skipped. A file that fails to parse never fails the scan: it is skipped and everything else continues.
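
The skip rules can be sketched as follows. This is an illustration, not cpk's implementation: the extension table is copied from the list above, the language names `python` and `go` are assumed by analogy with the `typescript` and `javascript` values in the scan output, and `detect_language`, `scan_files`, and the `extract` callback are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Extension table copied from the supported-languages list.
LANGUAGE_BY_EXTENSION = {
    ".ts": "typescript", ".tsx": "typescript",
    ".js": "javascript", ".jsx": "javascript",
    ".mjs": "javascript", ".cjs": "javascript",
    ".py": "python",  # assumed language name
    ".go": "go",      # assumed language name
}


def detect_language(path: str) -> Optional[str]:
    """Map a file path to a language name, or None if unsupported."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix)


def scan_files(paths, extract):
    """Run extract(path, language) per file, skipping unknown and broken files."""
    results, skipped = {}, []
    for path in paths:
        language = detect_language(path)
        if language is None:
            continue  # unknown extension: silently skipped
        try:
            results[path] = extract(path, language)
        except Exception:
            skipped.append(path)  # broken file: skip it and keep scanning
    return results, skipped
```

Catching the error per file, rather than around the whole loop, is what keeps one unparseable file from aborting the rest of the scan.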

Usage

Full scan

cpk scan

{"files_scanned":230,"symbols":958,"imports":1223,"duration_ms":388,"languages":["typescript","javascript"],"incremental":false}

A full scan replaces all existing symbols and imports for the project in a single transaction. It runs at roughly 500-1,500 files/second, depending on hardware.
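
As a quick check of that throughput figure, the sample output above can be parsed and converted to files per second. The snippet below is a small worked example; the helper name `files_per_second` is made up for illustration, and the JSON is the sample from this page.

```python
import json

# Sample full-scan output copied from above.
SAMPLE = (
    '{"files_scanned":230,"symbols":958,"imports":1223,"duration_ms":388,'
    '"languages":["typescript","javascript"],"incremental":false}'
)


def files_per_second(summary_json: str) -> float:
    """Compute scan throughput from a cpk scan JSON summary."""
    summary = json.loads(summary_json)
    if summary["duration_ms"] == 0:
        return 0.0  # avoid dividing by zero on a no-op scan
    return summary["files_scanned"] / (summary["duration_ms"] / 1000)


rate = files_per_second(SAMPLE)
print(f"{rate:.0f} files/second")  # prints "593 files/second"
```

230 files in 388 ms is about 593 files/second, which sits at the lower end of the stated range.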

Incremental scan

cpk scan --incremental

{"files_scanned":3,"symbols":27,"imports":41,"duration_ms":12,"languages":["typescript"],"incremental":true}

Incremental mode processes only the files staged in git (git diff --cached --name-only). Use it after editing and staging a few files to refresh just those entries. It is much faster than a full scan: 12 ms versus 388 ms in the sample outputs on this page.

If nothing is staged, incremental is a no-op:

{"files_scanned":0,"symbols":0,"imports":0,"duration_ms":0,"languages":[],"incremental":true}
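
To preview which files an incremental scan would consider, the same git command can be run directly. The sketch below wraps it with Python's subprocess module; `parse_name_only` and `staged_files` are hypothetical helper names, and how cpk treats staged deletions or renames is not modelled here.

```python
import subprocess


def parse_name_only(output: str) -> list[str]:
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def staged_files(repo_dir: str = ".") -> list[str]:
    """Return the paths currently staged in the git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return parse_name_only(result.stdout)
```

An empty list from `staged_files()` corresponds to the no-op case shown above.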

Install the pre-commit git hook

cpk scan --install-hook

Installed pre-commit hook at .git/hooks/pre-commit

This writes a pre-commit hook that runs cpk scan --incremental automatically on every git commit. Your index stays fresh without any manual intervention.

Hook behavior:

Runs only when cpk is on PATH
Failures never block commits (|| true fallback)
Preserves existing pre-commit hooks — the codepakt block is added between markers and can be removed cleanly
No network, no daemon — pure local operation

Hook content:

#!/bin/sh
# >>> codepakt hook begin >>>
# codepakt: incremental code index update
if command -v cpk >/dev/null 2>&1; then
  cpk scan --incremental 2>/dev/null || true
fi
# <<< codepakt hook end <<<

Remove the hook

cpk scan --remove-hook

If the hook file contained only codepakt’s block, it’s removed entirely. If there are other hook commands present, only the codepakt block is stripped.
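
The install and remove behavior described above can be sketched as plain text operations on the hook file. This is an assumption-laden illustration rather than cpk's code: the marker lines and block body are copied from the hook content shown earlier, while `install`, `remove`, and the choice to append the block at the end of an existing hook are hypothetical.

```python
from typing import Optional

BEGIN = "# >>> codepakt hook begin >>>"
END = "# <<< codepakt hook end <<<"
BLOCK = "\n".join([
    BEGIN,
    "# codepakt: incremental code index update",
    "if command -v cpk >/dev/null 2>&1; then",
    "  cpk scan --incremental 2>/dev/null || true",
    "fi",
    END,
])


def install(existing: str) -> str:
    """Return hook text with the codepakt block added (assumed: appended)."""
    if BEGIN in existing:
        return existing  # already installed, nothing to do
    if not existing.strip():
        return "#!/bin/sh\n" + BLOCK + "\n"
    return existing.rstrip("\n") + "\n" + BLOCK + "\n"


def remove(existing: str) -> Optional[str]:
    """Strip the codepakt block; None means the hook file can be deleted."""
    kept, inside = [], False
    for line in existing.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    # Only a shebang or blank lines left: nothing worth keeping.
    remaining = [l for l in kept if l.strip() and not l.startswith("#!")]
    if not remaining:
        return None
    return "\n".join(kept) + "\n"
```

Keeping the block between fixed marker lines is what lets removal leave other hook commands untouched.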

Flags

Flag	Description
`--incremental`	Only scan files staged in the git index
`--changed-only`	Alias for `--incremental`
`--install-hook`	Install pre-commit git hook
`--remove-hook`	Remove pre-commit git hook
`--human`	Print human-readable output instead of JSON

What gets ignored

Always skipped regardless of .gitignore:

node_modules, .git, dist, build, out, .next, .nuxt, .svelte-kit, target, vendor, .codepakt, .venv, venv, __pycache__, .pytest_cache, .mypy_cache, coverage, .cache

Plus anything matched by the project’s .gitignore or .codepaktignore file.
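
A tree walk that honors the always-skipped list might look like the sketch below. It treats the names as directory names, which is an assumption, and it does not implement .gitignore or .codepaktignore matching; `walk_project` is a hypothetical name.

```python
import os

# Names copied from the always-skipped list above.
ALWAYS_SKIPPED = {
    "node_modules", ".git", "dist", "build", "out", ".next", ".nuxt",
    ".svelte-kit", "target", "vendor", ".codepakt", ".venv", "venv",
    "__pycache__", ".pytest_cache", ".mypy_cache", "coverage", ".cache",
}


def walk_project(root: str):
    """Yield file paths relative to root, pruning always-skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in ALWAYS_SKIPPED)
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Pruning at the directory level means a large node_modules tree is never traversed at all, rather than being walked and filtered afterwards.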

Recommended workflow

# One-time setup per project
cpk init --name my-project
cpk scan --install-hook

# First scan
cpk scan

# After that, every git commit automatically refreshes the index.
# No further action needed — agents always see fresh data.

Performance

Measured on a 230-file NestJS monorepo (~1.4 MB of source):

Operation	Time
Full scan	388ms
Incremental scan (1 file)	4ms
Incremental scan (0 files, no-op)	0ms

Each scan runs in a single SQLite transaction, so partial progress is never visible. Either the full update commits or nothing changes.
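
The all-or-nothing guarantee can be demonstrated with Python's standard sqlite3 module. In this minimal sketch, the `symbols` table, its columns, and `replace_symbols` are hypothetical stand-ins, not cpk's real schema; the point is only that a failure mid-replacement leaves the previous rows intact.

```python
import sqlite3


def replace_symbols(conn, project, symbols):
    """Swap a project's rows for a new set inside one transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("DELETE FROM symbols WHERE project = ?", (project,))
        conn.executemany(
            "INSERT INTO symbols (project, file, name) VALUES (?, ?, ?)",
            [(project, file, name) for file, name in symbols],
        )


conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL lets us force a failure partway through.
conn.execute("CREATE TABLE symbols (project TEXT, file TEXT, name TEXT NOT NULL)")
replace_symbols(conn, "demo", [("auth.ts", "AuthService")])

try:
    # The second row violates NOT NULL, so the whole replacement rolls back,
    # including the DELETE that had already run.
    replace_symbols(conn, "demo", [("guard.ts", "Guard"), ("bad.ts", None)])
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT file, name FROM symbols").fetchall()
print(rows)  # prints [('auth.ts', 'AuthService')]
```

A reader querying during or after the failed replacement sees only the rows from the last successful scan.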

cpk code — query the index after scanning
Architecture — how the scanner and SQLite schema fit together