CAMEL-23872: Add JFR Old Object Sample panel for memory leak diagnosis#24367
Conversation
|
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
|
|
🧪 CI tested the following changed modules:
🔬 Scalpel shadow comparison — compile: +587, test: +561Maveniverse Scalpel detected 587 affected modules via effective POM comparison (vs 0 from grep-based detection). Skip-tests mode would test 561 modules (8 direct + 553 downstream), skip tests for 26 (generated code, meta-modules) Modules Scalpel would test (561)
Modules with tests skipped (26)
💡 Manual integration tests recommended:
Build reactor — dependencies compiled but only changed modules were tested (592 modules)
|
edba90e to
44b7de9
Compare
gnodet
left a comment
There was a problem hiding this comment.
Nice feature — the dual recording with trend comparison is a really thoughtful addition for memory leak diagnosis, and the JFR API handling (multi-JDK field fallbacks, reference chain traversal, readable array names) is solid work.
A few items worth considering:
Thread.sleep for recording waits (CLI, MCP, TUI): The long Thread.sleep((dur + 3) * 1000L) calls (up to ~123 seconds in dual mode) could be replaced with status polling. The dev console already has auto-stop via scheduler.schedule() and a doStatus() endpoint that reports recording vs completed. Polling status every 2–3 seconds would eliminate the fragile +3 padding, detect early failures, and avoid blocking threads for extended periods — especially relevant for the MCP tool where callers may have their own timeouts. The existing ActionBaseCommand.getJsonObject() poll pattern (100ms loop with StopWatch timeout) could serve as a model. The 500ms sleeps after System.gc() in the dev console are fine — standard JFR diagnostic practice.
Tests: The readableClassName() method is package-visible and has non-trivial edge cases (multi-dimensional arrays, primitive descriptors, L...; class descriptors) — a unit test would be easy to add and would protect against regressions. The applyFilters() and aggregation logic are also good candidates.
See inline comments for specifics.
This review does not replace specialized static analysis tools such as SonarCloud, CodeRabbit, or Sourcery.
This review was generated by an AI agent (Claude Code on behalf of forgebot) and may contain inaccuracies. Please verify all suggestions before applying.
| return 1; | ||
| } | ||
|
|
||
| Thread.sleep((dur1 + 3) * 1000L); |
There was a problem hiding this comment.
This Thread.sleep((dur1 + 3) * 1000L) blocks the calling thread for the full recording duration (up to 63+ seconds with defaults). The dev console already auto-stops via its scheduler.schedule() and reports completion through the status command.
Consider polling status instead — similar to how ActionBaseCommand.getJsonObject() polls with a StopWatch loop:
// Poll status until recording completes (or times out)
StopWatch watch = new StopWatch();
long timeout = (dur1 + 30) * 1000L;
while (watch.taken() < timeout) {
Thread.sleep(2000);
JsonObject sts = sendAction(pid, "status", 0);
if (sts != null && !"recording".equals(sts.getString("status"))) {
break;
}
}This would detect early completion/failure immediately and eliminate the +3 second padding that could be too short on a loaded system. Same applies to the second recording sleep below.
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| throw new ToolCallException("Interrupted during recording 1", null); | ||
| } |
There was a problem hiding this comment.
The MCP tool blocks for (dur1 + 3) * 1000L here and again below for recording 2 — potentially 96+ seconds total with 30s default. MCP clients may have their own timeouts, and a blocked tool call gives no feedback to the caller.
Same suggestion as in the CLI command: poll the status action every 2–3 seconds instead. This also applies to the second sleep at line ~437.
| } | ||
| startDaemonThread("jfr-poll-" + pid, () -> { | ||
| try { | ||
| Thread.sleep((dur + 3) * 1000L); |
There was a problem hiding this comment.
Polling status at shorter intervals (e.g., every 2–3 seconds) instead of sleeping for the full (dur + 3) seconds would also let the TUI refresh the progress bar with actual server-reported elapsed/remaining time, rather than relying on locally-calculated estimates.
| } | ||
|
|
||
| private JsonObject doStart(Map<String, Object> options) { | ||
| if (activeRecording != null) { |
There was a problem hiding this comment.
Minor: doStart() is not synchronized, but doStopRecordingAndParse() is. The check-then-act on activeRecording here has a theoretical race — two concurrent callers could both pass the null check and create two Recording instances, leaking the first. Making doStart synchronized (or using compareAndSet on an AtomicReference) would close the gap. Low severity since concurrent calls to this console are unlikely in practice.
oscerd
left a comment
There was a problem hiding this comment.
Really nice, complete feature — dev-console + CLI + MCP + TUI, catalog/metadata fully regenerated, and the generated CLI doc + index are in place. A few things worth addressing:
- Tests: there are no tests in the PR (0 of 23 files). Most of it (live JFR + terminal rendering) is genuinely hard to unit-test, and I see the same no-test pattern in the precedent CAMEL-23837 — but two pieces are pure and cheap to cover:
readableClassName(JfrOldObjectSampleDevConsole.java:857-887) and the trend classification indoCompare(:474-495).camel-consolealready hasAbstractDevConsoleTest+ ~19*DevConsoleTest, so at least a small unit test for those two would be valuable. - MCP dual-mode default duration mismatch:
doDualJfrRecordingusesint dur = 30and clampsdur <= 0→ 30 (RuntimeTools.java:1805, 1809-1810), but the@ToolArgdescription says "default 60, use 0 for manual stop" (:1777) and the CLI--durationdefaults to 60. So the MCP dual default (30s + 60s) diverges from the CLI (60s + 120s), and "0 for manual stop" doesn't hold in dual mode (0 becomes 30). Worth aligning the doc and behavior. - CI: no checks have run on this branch yet (0 reported) — just make sure the formatter / impsort / generated-sources gates go green before merge.
(For anyone grepping the diff: the Thread.sleep(...) calls and System.gc() here are in the JFR recording paths, not tests, and are intentional — no Awaitility concern.)
Reviewed with Claude Code on behalf of Andrea Cosentino. This review was generated by an AI agent and may contain inaccuracies; please verify all suggestions before applying.
Adds JFR-based old object sampling across dev console, CLI connector, CLI command, TUI tab, and MCP tool. JFR OldObjectSample tracks objects surviving multiple GC cycles and captures reference chains back to GC roots, complementing the existing heap histogram. Key features: - Dev console manages JFR recording lifecycle (start/stop/query) - Aggregates samples by class + stack trace fingerprint (count, totalSize) - Converts JVM array descriptors to human-readable names - CLI supports --stacktrace, --min-size, --top options - TUI tab with recording progress, sortable table, detail panel with reference chains and allocation stack traces, min-size filter - MCP tool for AI agent access - JDK/Jakarta stack trace frames dimmed in TUI detail panel Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
…lters Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Renamed the JSON field and all user-facing labels from TOTAL to SAMPLED to clarify that sizes are sampled during the recording window, not total heap usage. Added explanatory notes to TUI F1 help, CLI footer, and MCP tool description. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Adds a dual recording mode across TUI, CLI, and MCP that runs two sequential JFR recordings (Xs then 2Xs) and compares trends to detect memory leaks. Replaces CLI --compare flag with --mode single|dual for consistency across all interfaces. Also adds compare command to dev console, changes CLI --duration default to 30s, and adds GC trigger before recording start for cleaner baselines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <davsclaus@apache.org> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
…l mode Three-tier trend classification: growing (>=1.2), suspicious (>=1.1), stable (0.8-1.1), shrinking (<0.8). Default 1KB min-size filter in compare to suppress noise. CLI --stacktrace flag now works with dual mode comparison output showing reference chains and allocation traces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <davsclaus@apache.org> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
- Fix race condition in TUI dual recording where state check used async runOnRenderThread result, causing second recording to not start - Make sendStopAndLoadResults return boolean for synchronous success check - Make dual mode the default across TUI, CLI, and MCP - Auto-detect existing dual recordings when TUI loads via hasComparisonData - Add mode-specific idle screen descriptions for dual vs single - Show 0s instead of milliseconds for sub-second durations - Add 1KB default min-size filter in compare to suppress noise Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Compute elapsed and remaining as whole seconds from a single base so they always sum to the duration exactly. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Longer baseline gives JFR sampler more time to capture objects, producing more stable growth ratios in dual mode. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Enable jdk.GarbageCollection event alongside OldObjectSample to count GC cycles during each recording. Displayed in TUI title bars as gc:N (single mode) or gc:N/M (comparison mode, run1/run2). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Bundled example that simulates a memory leak (growing HashMap cache and ArrayList buffers) for testing the JFR Old Object Sample tool in TUI, CLI, and MCP. Added sync entries in pom.xml. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Rename file to MemoryLeak.java so public class name matches filename. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Move JVM class name conversion (e.g. [B -> byte[]) to StringHelper for reuse. Apply to heap histogram dev console. Fix duration format to omit space (1m0s -> 1m) and use dot for ratio decimal separator. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Use BufferLeakProcessor (named class) for the buffer leak and keep lambda for the cache leak, so JFR stack traces show the contrast between lambda vs named class in diagnostics. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Rename to user-friendly "Memory Leak" title. Stabilize sample group key by using top 3 frames without line numbers and normalizing lambda class names. Update F1 help to explain JFR technique and reference jmap/jhat/Eclipse MAT for deep analysis. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Rename camel cmd jfr-old-objects to camel cmd memory-leak and MCP tool to camel_runtime_memory_leak. Update MCP description to explain JFR technique, its limitations, and reference jmap/Eclipse MAT for deep analysis. Internal action ID unchanged for wire compatibility. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
… percentage - Rename dev console from jfr-old-objects to jfr-memory-leak - Rename source files: JfrOldObjectSampleDevConsole -> JfrMemoryLeakDevConsole, JfrOldObjectSampleTab -> MemoryLeakTab, CamelJfrOldObjects -> CamelMemoryLeak - Show growth ratio as percentage (+10%) instead of multiplier (1.1x) - Update all wire protocol action strings to jfr-memory-leak Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
- Match samples across JFR runs using first user-code frame instead of top 3 frames, skipping java/javax/jakarta/jdk/sun/camel internals - Rename recording panel title to Memory Leak Recording - Improve idle screen intro text to explain what the tab does - Show connected instead of unknown for unnamed MCP clients Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
…ntages - Flag entries with low sample counts (<5) or divergent sample ratios - Show warning emoji and ~ prefix on growth in TUI, CLI, and MCP - Add explanation in TUI detail panel and F1 help - Use percentage format in CLI output matching TUI Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
…ix More menu order Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
… doStart - Dev console: make doStart() synchronized to prevent race on activeRecording - CLI, MCP, TUI: replace Thread.sleep((dur+3)*1000) with polling loop that checks recording status every 3 seconds, avoiding blocking thread for the full recording duration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
44b7de9 to
ba1a8cc
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Summary
camel cmd jfr-old-objects) with--stacktrace,--min-size,--topoptionsFeatures
[B,[Ljava.lang.String;) converted to human-readable form--min-size 1MBTest plan
--query,--stacktrace,--min-sizeoptions🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com