cpk scan

cpk scan indexes your codebase so agents can find symbols and trace imports in ~200 tokens per query instead of burning thousands on grep and read cycles.

Why scan

Without an index, an agent exploring an unfamiliar codebase spends 6,000-40,000 tokens on discovery. Each grep returns dozens of matches, each ls adds another batch of output, and reading 3-5 files to find a single function definition costs thousands more. The index collapses all of that into cheap point lookups:

| Task | Without scan | With scan |
| --- | --- | --- |
| Find where AuthService is defined | 3-5 file reads (~2,000 tokens) | cpk code symbols --name AuthService (~200 tokens) |
| See what guard.ts imports | Read the whole file (~1,500 tokens) | cpk code imports --file ... (~200 tokens) |
| Find every file that uses types.ts | grep + read results (~4,000 tokens) | cpk code dependents --file ... (~200 tokens) |
| Orient in an unfamiliar repo | glob + sample reads (~10,000 tokens) | cpk code summary (~60 tokens) |
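
To put the estimates above side by side, the short Python calculation below adds up the four rows. The token counts are the approximate figures quoted in the table; the task labels are shortened for illustration and are not cpk terminology.

```python
# Approximate token costs per task, copied from the table above:
# (without scan, with scan). Labels are shortened for illustration.
ESTIMATES = {
    "find definition": (2_000, 200),
    "list imports": (1_500, 200),
    "find dependents": (4_000, 200),
    "orient in repo": (10_000, 60),
}

total_without = sum(without for without, _ in ESTIMATES.values())
total_with = sum(with_scan for _, with_scan in ESTIMATES.values())

print(f"without scan: ~{total_without:,} tokens")
print(f"with scan:    ~{total_with:,} tokens")
print(f"ratio:        ~{total_without / total_with:.1f}x")
```

Under these estimates, the four tasks together cost about 17,500 tokens without the index and about 660 with it.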

How it works

cpk scan:

  1. Walks your project tree, respecting .gitignore and .codepaktignore
  2. Parses each supported file with tree-sitter WASM (no native compilation, works with npx)
  3. Extracts symbols (functions, classes, interfaces, types, methods, exported variables) and imports
  4. Replaces the project’s data in the local .codepakt/data.db SQLite file
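
To make steps 2 and 3 more concrete, here is a rough Python sketch of the kind of records an extractor produces. It is not how cpk works internally: cpk parses with tree-sitter, while this stand-in uses two regular expressions that only handle simple declarations at the start of a line. The function name extract and the record shapes are hypothetical.

```python
import re
from typing import Dict, List

# Regex stand-in for a real parser. It only recognizes simple declarations
# that start a line; cpk itself uses tree-sitter grammars instead.
SYMBOL_RE = re.compile(
    r"^[ \t]*(?:export[ \t]+)?(?:default[ \t]+)?(?:abstract[ \t]+)?(?:async[ \t]+)?"
    r"(function|class|interface|type)[ \t]+([A-Za-z_$][\w$]*)",
    re.MULTILINE,
)
IMPORT_RE = re.compile(
    r"^[ \t]*import[ \t]+(?:.+?[ \t]+from[ \t]+)?['\"]([^'\"]+)['\"]",
    re.MULTILINE,
)


def extract(source: str) -> Dict[str, List]:
    """Return hypothetical symbol and import records for one source file."""
    symbols = [{"kind": kind, "name": name} for kind, name in SYMBOL_RE.findall(source)]
    imports = IMPORT_RE.findall(source)
    return {"symbols": symbols, "imports": imports}


SAMPLE = """\
import { Injectable } from '@nestjs/common';
import './polyfills';

export class AuthService {}
export interface Session { id: string }
async function refresh() {}
"""

result = extract(SAMPLE)
for symbol in result["symbols"]:
    print(symbol["kind"], symbol["name"])
print(result["imports"])
```

A real parser also sees methods, exported variables, and multi-line imports, which this sketch deliberately ignores.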

No server required. Scan opens the SQLite file directly. The daemon is only needed for task coordination.

Supported languages (v0.2)

| Language | Extensions | Status |
| --- | --- | --- |
| TypeScript | .ts, .tsx | Full: functions, classes, interfaces, types, methods, exports, imports |
| JavaScript | .js, .jsx, .mjs, .cjs | Full (same extractor as TypeScript) |
| Python | .py | Grammar loaded, extractor stub (ships in a future release) |
| Go | .go | Grammar loaded, extractor stub (ships in a future release) |

Files with unknown extensions are silently skipped. A file that fails to parse never fails the scan: it is skipped and the rest of the scan continues.
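
The skip rules can be sketched in a few lines of Python. The extension map mirrors the table above; the function names, the "python" and "go" language labels, and the extract callback are assumptions made for illustration, not cpk's actual internals.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Extension-to-language map taken from the supported languages table.
# The "python" and "go" labels are assumed; their extractors are still stubs.
LANGUAGE_BY_EXTENSION = {
    ".ts": "typescript", ".tsx": "typescript",
    ".js": "javascript", ".jsx": "javascript",
    ".mjs": "javascript", ".cjs": "javascript",
    ".py": "python",
    ".go": "go",
}


def language_for(path: str) -> Optional[str]:
    """Return the language for a path, or None for an unknown extension."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix)


def scan_files(
    paths: Iterable[str], extract: Callable[[str], dict]
) -> Tuple[Dict[str, dict], List[str]]:
    """Run a (hypothetical) extractor over each path, tolerating failures."""
    results: Dict[str, dict] = {}
    skipped: List[str] = []
    for path in paths:
        if language_for(path) is None:
            continue  # unknown extension: silently skipped
        try:
            results[path] = extract(path)
        except Exception:
            skipped.append(path)  # broken file: skip it, keep scanning
    return results, skipped
```

Catching the error per file, rather than around the whole loop, is what keeps one unparseable file from aborting the rest of the scan.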

Usage

Full scan

cpk scan
{"files_scanned":230,"symbols":958,"imports":1223,"duration_ms":388,"languages":["typescript","javascript"],"incremental":false}

A full scan replaces all existing symbols and imports for the project in a single transaction. It runs at roughly 500-1,500 files/second, depending on hardware.
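
Because the summary is a single JSON object, it is easy to consume from a script. The Python sketch below parses the sample output shown above and derives the throughput; in practice the string would come from the command's standard output. The helper name files_per_second is hypothetical.

```python
import json

# Sample summary copied from the full scan output above.
SAMPLE = (
    '{"files_scanned":230,"symbols":958,"imports":1223,'
    '"duration_ms":388,"languages":["typescript","javascript"],"incremental":false}'
)


def files_per_second(summary: dict) -> float:
    """Derive throughput from a scan summary; 0.0 when no time was recorded."""
    duration_ms = summary["duration_ms"]
    if duration_ms == 0:
        return 0.0
    return summary["files_scanned"] / (duration_ms / 1000)


summary = json.loads(SAMPLE)
rate = files_per_second(summary)
print(f"{summary['files_scanned']} files at ~{rate:.0f} files/second")
```

For the sample above this works out to roughly 593 files/second, inside the quoted range.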

Incremental scan

cpk scan --incremental
{"files_scanned":3,"symbols":27,"imports":41,"duration_ms":12,"languages":["typescript"],"incremental":true}

Incremental mode processes only the files staged in git (git diff --cached --name-only). Use it after staging a few edited files to refresh just those entries. It is much faster than a full scan: a few milliseconds for a handful of files, versus hundreds of milliseconds or more for the whole project.

If nothing is staged, incremental is a no-op:

{"files_scanned":0,"symbols":0,"imports":0,"duration_ms":0,"languages":[],"incremental":true}
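
To preview which files an incremental scan would consider, you can run the same git command yourself. The Python sketch below is one way to do that; the function names are hypothetical, and it assumes git is on PATH and that file names contain no characters git would quote in its output.

```python
import subprocess
from typing import List


def parse_name_only(output: str) -> List[str]:
    """Split the output of `git diff --cached --name-only` into paths."""
    return [line for line in output.splitlines() if line.strip()]


def staged_files(repo_dir: str = ".") -> List[str]:
    """List the files currently staged in the given repository."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return parse_name_only(result.stdout)
```

An empty list from staged_files() corresponds to the no-op case shown above.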

Install the pre-commit git hook

cpk scan --install-hook
Installed pre-commit hook at .git/hooks/pre-commit

This writes a pre-commit hook that runs cpk scan --incremental automatically on every git commit. Your index stays fresh without any manual intervention.

Hook behavior:

  • Runs only when cpk is on PATH
  • Failures never block commits (|| true fallback)
  • Preserves existing pre-commit hooks — the codepakt block is added between markers and can be removed cleanly
  • No network, no daemon — pure local operation

Hook content:

#!/bin/sh
# >>> codepakt hook begin >>>
# codepakt: incremental code index update
if command -v cpk >/dev/null 2>&1; then
  cpk scan --incremental 2>/dev/null || true
fi
# <<< codepakt hook end <<<

Remove the hook

cpk scan --remove-hook

If the hook file contained only codepakt’s block, it’s removed entirely. If there are other hook commands present, only the codepakt block is stripped.
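
The marker-based install and removal described above can be sketched as two pure functions over the hook file's text. This is an illustrative Python sketch, not cpk's implementation: the function names are hypothetical, and returning None to mean "delete the file" is an assumption about how the removal rule could be modeled.

```python
from typing import Optional

BEGIN = "# >>> codepakt hook begin >>>"
END = "# <<< codepakt hook end <<<"
BLOCK = "\n".join([
    BEGIN,
    "# codepakt: incremental code index update",
    "if command -v cpk >/dev/null 2>&1; then",
    "  cpk scan --incremental 2>/dev/null || true",
    "fi",
    END,
])


def install_block(existing: Optional[str]) -> str:
    """Return hook text with the codepakt block present exactly once."""
    if not existing or not existing.strip():
        return "#!/bin/sh\n" + BLOCK + "\n"
    if BEGIN in existing:
        return existing  # already installed, leave untouched
    return existing.rstrip("\n") + "\n\n" + BLOCK + "\n"


def remove_block(existing: str) -> Optional[str]:
    """Strip the codepakt block; None means the file can be deleted."""
    kept, inside = [], False
    for line in existing.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    # Only a shebang and blank lines left: nothing else lived in this hook.
    if not any(l.strip() and not l.startswith("#!") for l in kept):
        return None
    return "\n".join(kept).rstrip("\n") + "\n"
```

Keeping the block between fixed markers is what lets removal be exact: everything outside the markers is returned unchanged.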

Flags

| Flag | Description |
| --- | --- |
| --incremental | Only scan files in the git staged index |
| --changed-only | Alias for --incremental |
| --install-hook | Install pre-commit git hook |
| --remove-hook | Remove pre-commit git hook |
| --human | Human-readable output |

What gets ignored

Always skipped regardless of .gitignore:

node_modules, .git, dist, build, out, .next, .nuxt, .svelte-kit, target, vendor, .codepakt, .venv, venv, __pycache__, .pytest_cache, .mypy_cache, coverage, .cache

Plus anything matched by the project’s .gitignore or .codepaktignore file.
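
As an illustration of the always-skipped rule, the Python sketch below walks a tree and prunes those directories before descending into them. It is a simplified stand-in: the function name is hypothetical, and it does not evaluate .gitignore or .codepaktignore patterns, which a real scan also honors.

```python
import os
from pathlib import Path
from typing import List

# Directory names from the always-skipped list above.
ALWAYS_SKIPPED = {
    "node_modules", ".git", "dist", "build", "out", ".next", ".nuxt",
    ".svelte-kit", "target", "vendor", ".codepakt", ".venv", "venv",
    "__pycache__", ".pytest_cache", ".mypy_cache", "coverage", ".cache",
}


def walk_project(root: str) -> List[str]:
    """Return project-relative file paths, never entering skipped directories."""
    root_path = Path(root)
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in ALWAYS_SKIPPED)
        for name in sorted(filenames):
            found.append((Path(dirpath) / name).relative_to(root_path).as_posix())
    return found
```

Pruning at the directory level means a large node_modules tree is never traversed at all, rather than being walked and filtered afterwards.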

Recommended workflow

# One-time setup per project
cpk init --name my-project
cpk scan --install-hook

# First scan
cpk scan

# After that, every git commit automatically refreshes the index.
# No further action needed — agents always see fresh data.

Performance

Measured on a 230-file NestJS monorepo (~1.4MB source):

| Operation | Time |
| --- | --- |
| Full scan | 388ms |
| Incremental scan (1 file) | 4ms |
| Incremental scan (0 files, no-op) | 0ms |

Scan runs in a single SQLite transaction, so partial progress is never visible: either the full update commits or nothing changes.
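
The all-or-nothing behavior is a general property of SQLite transactions and can be demonstrated with Python's built-in sqlite3 module. The table layout and function names below are invented for the demonstration; they are not the actual .codepakt/data.db schema.

```python
import sqlite3
from typing import Iterable, Tuple


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open a demo database with a hypothetical symbols table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols ("
        "project TEXT NOT NULL, file TEXT NOT NULL, "
        "name TEXT NOT NULL, kind TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def replace_symbols(
    conn: sqlite3.Connection, project: str, rows: Iterable[Tuple[str, str, str]]
) -> None:
    """Swap a project's rows atomically: delete and insert in one transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute("DELETE FROM symbols WHERE project = ?", (project,))
        conn.executemany(
            "INSERT INTO symbols (project, file, name, kind) VALUES (?, ?, ?, ?)",
            [(project, *row) for row in rows],
        )


conn = open_index()
replace_symbols(conn, "demo", [("auth.ts", "AuthService", "class")])
try:
    # The second row violates NOT NULL, so the whole replacement is rolled back.
    replace_symbols(conn, "demo", [("guard.ts", "AuthGuard", "class"), ("bad.ts", None, "function")])
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT name FROM symbols").fetchall())
```

After the failed second call, the table still holds only the AuthService row from the first call: the delete was undone together with the failed insert.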

See also

  • cpk code: query the index after scanning
  • Architecture: how the scanner and SQLite schema fit together