Merge branch 'thedotmack/add-lang-parsers' into integration/validation-batch

Adds 24-language support for smart-explore. Newly bundled: Kotlin, Swift,
PHP, Elixir, Lua, Scala, Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL,
Markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alex Newman
2026-04-07 13:50:46 -07:00
7 changed files with 635 additions and 33 deletions
+16 -1
@@ -14,7 +14,22 @@
"tree-sitter-python": "^0.25.0",
"tree-sitter-ruby": "^0.23.1",
"tree-sitter-rust": "^0.24.0",
"tree-sitter-typescript": "^0.23.2"
"tree-sitter-typescript": "^0.23.2",
"tree-sitter-kotlin": "^0.3.8",
"tree-sitter-swift": "^0.7.1",
"tree-sitter-php": "^0.24.2",
"tree-sitter-elixir": "^0.3.5",
"@tree-sitter-grammars/tree-sitter-lua": "^0.4.1",
"tree-sitter-scala": "^0.24.0",
"tree-sitter-bash": "^0.25.1",
"tree-sitter-haskell": "^0.23.1",
"@tree-sitter-grammars/tree-sitter-zig": "^1.1.2",
"tree-sitter-css": "^0.25.0",
"tree-sitter-scss": "^1.0.0",
"@tree-sitter-grammars/tree-sitter-toml": "^0.7.0",
"@tree-sitter-grammars/tree-sitter-yaml": "^0.7.1",
"@derekstride/tree-sitter-sql": "^0.3.11",
"@tree-sitter-grammars/tree-sitter-markdown": "^0.3.2"
},
"engines": {
"node": ">=18.0.0",
+48
@@ -125,3 +125,51 @@ get_observations(ids=[11131, 10942, 10855], orderBy="date_desc")
- **Full observation:** ~500-1000 tokens each
- **Batch fetch:** 1 HTTP request vs N individual requests
- **10x token savings** by filtering before fetching
## Smart-Explore Language Support
Smart-explore tools (`smart_search`, `smart_outline`, `smart_unfold`) use tree-sitter AST parsing. The following languages are supported out of the box.
### 24 Bundled Languages
JS, TS, Python, Go, Rust, Ruby, Java, C, C++, Kotlin, Swift, PHP, Elixir, Lua, Scala, Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL, Markdown
### Markdown Special Support
Markdown files get structure-aware parsing beyond generic tree-sitter symbol extraction:
- **Heading hierarchy** -- `#`/`##`/`###` headings are extracted as nested symbols (sections contain subsections)
- **Code block detection** -- fenced code blocks are surfaced as `code` symbols with language annotation
- **Section-aware unfold** -- `smart_unfold` on a heading returns the full section content (heading through all subsections until the next heading of equal or higher level)
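
The section boundary rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: `unfoldSection` is a hypothetical helper that works on raw lines with a regex rather than on the tree-sitter AST, and it assumes ATX-style (`#`) headings only.

```typescript
// Hypothetical helper (not part of smart-explore): returns the lines of the
// section that starts at `headingLine` (0-based), including all subsections.
function unfoldSection(lines: string[], headingLine: number): string[] {
  // Number of leading `#` for an ATX heading (1-6), or 0 if not a heading.
  const headingLevel = (line: string): number => {
    const hashes = line.length - line.replace(/^#+/, "").length;
    return hashes >= 1 && hashes <= 6 && /^#+\s/.test(line) ? hashes : 0;
  };

  const level = headingLevel(lines[headingLine] ?? "");
  if (level === 0) {
    throw new Error(`line ${headingLine} is not an ATX heading`);
  }

  let inFence = false;
  let end = lines.length;
  for (let i = headingLine + 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    // Toggle on fence markers so `#` comments inside code are not headings.
    if (/^(```|~~~)/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const next = headingLevel(line);
    // Fewer or equal `#` characters means a heading of equal or higher level.
    if (next !== 0 && next <= level) {
      end = i;
      break;
    }
  }
  return lines.slice(headingLine, end);
}
```

Called on a `##` heading, the sketch keeps any nested `###` subsections and stops at the next `##` or `#` heading, or at the end of the file.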
### User-Installable Grammars via `.claude-mem.json`
Add custom tree-sitter grammars for languages not in the bundled set. Place `.claude-mem.json` in the project root:
```json
{
"grammars": {
"gleam": {
"package": "tree-sitter-gleam",
"extensions": [".gleam"]
},
"protobuf": {
"package": "tree-sitter-proto",
"extensions": [".proto"],
"query": ".claude-mem/queries/proto.scm"
}
}
}
```
**Fields:**
- `package` (string, required) -- npm package name for the tree-sitter grammar
- `extensions` (array of strings, required) -- file extensions to associate with this language
- `query` (string, optional) -- path to a custom `.scm` query file for symbol extraction. If omitted, a generic query is used.
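
As a rough sketch of the field contract above, the entry shape could be typed and validated as follows. The names `GrammarEntry` and `parseGrammarEntry` are hypothetical and chosen for illustration; the actual loader may validate differently.

```typescript
// Hypothetical type mirroring the documented fields of one grammar entry.
interface GrammarEntry {
  package: string;      // npm package name for the tree-sitter grammar
  extensions: string[]; // file extensions associated with the language
  query?: string;       // optional path to a custom .scm query file
}

// Hypothetical validator: checks one raw entry from `.claude-mem.json`.
function parseGrammarEntry(name: string, raw: unknown): GrammarEntry {
  if (typeof raw !== "object" || raw === null) {
    throw new Error(`grammar "${name}": entry must be an object`);
  }
  const entry = raw as Record<string, unknown>;
  const pkg = entry["package"];
  const extensions = entry["extensions"];
  const query = entry["query"];

  if (typeof pkg !== "string") {
    throw new Error(`grammar "${name}": "package" must be a string`);
  }
  if (
    !Array.isArray(extensions) ||
    !extensions.every((ext): ext is string => typeof ext === "string")
  ) {
    throw new Error(`grammar "${name}": "extensions" must be an array of strings`);
  }
  if (query !== undefined && typeof query !== "string") {
    throw new Error(`grammar "${name}": "query" must be a string when present`);
  }
  // Leave `query` absent when it was omitted, so callers can fall back to
  // the generic query.
  return query === undefined
    ? { package: pkg, extensions }
    : { package: pkg, extensions, query };
}
```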
**Rules:**
- User grammars do NOT override bundled languages. If a language is already bundled, the entry is ignored.
- The npm package must be installed in the project (for example, `npm install tree-sitter-gleam`).
- Config is cached per project root. Changes to `.claude-mem.json` take effect on the next worker restart.
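
The precedence and caching rules above might look like the following sketch. All names here (`UserGrammar`, `filterUserGrammars`, `grammarsForRoot`) are hypothetical. It also assumes that "already bundled" is decided by the entry's key name, which the text does not specify.

```typescript
// Hypothetical shape of one user grammar entry.
type UserGrammar = { package: string; extensions: string[]; query?: string };

// Drop entries whose language name is already bundled (bundled grammars win).
// Assumption: the match is on the config key, e.g. "kotlin".
function filterUserGrammars(
  grammars: Record<string, UserGrammar>,
  bundled: ReadonlySet<string>,
): Map<string, UserGrammar> {
  const accepted = new Map<string, UserGrammar>();
  for (const [name, entry] of Object.entries(grammars)) {
    if (!bundled.has(name)) accepted.set(name, entry);
  }
  return accepted;
}

// Cache keyed by project root. It lives for the lifetime of the process,
// which is why config edits only show up after a restart.
const grammarCache = new Map<string, Map<string, UserGrammar>>();

function grammarsForRoot(
  projectRoot: string,
  load: (root: string) => Record<string, UserGrammar>, // e.g. reads .claude-mem.json
  bundled: ReadonlySet<string>,
): Map<string, UserGrammar> {
  let cached = grammarCache.get(projectRoot);
  if (cached === undefined) {
    cached = filterUserGrammars(load(projectRoot), bundled);
    grammarCache.set(projectRoot, cached);
  }
  return cached;
}
```

Passing the loader in as a function keeps the sketch free of file-system calls. A second call for the same root returns the cached map without invoking `load` again.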