fix(embed): don't use a non-leading heading as embedded notebook title#14639
Draft
cderv wants to merge 8 commits into
Draft
fix(embed): don't use a non-leading heading as embedded notebook title#14639cderv wants to merge 8 commits into
cderv wants to merge 8 commits into
Conversation
Collaborator
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
…tle (#14577) When embedding a notebook with no YAML title and no markdown heading cell, the "Source" link and sidebar showed a Python comment (e.g. `# plt.savefig(...)`) instead of the notebook filename. findTitle() in jupyter-embed.ts scans every cell's markdown, including code cells whose markdown is their fenced source. partitionMarkdown is not fence-aware, so a `#` comment line inside the fence matches the ATX-heading regex and was returned as the title. The non-embed code path already guards against this: the Dec-2023 "Improve title snipping in notebooks" work (24672a4, fixes #5363/#6411) made fixupFrontMatter only promote a heading to a notebook title when no content precedes it. findTitle predates that change and was never brought in line, so it still promoted any heading-like line. Apply the same `!contentBeforeHeading` gate here by exposing contentBeforeHeading (already computed by markdownWithExtractedHeading) through PartitionedMarkdown. A fenced code block's opening delimiter always counts as content before the comment, so the comment is no longer mistaken for a title and the filename is used instead.
The teardowns called raw Deno.removeSync, which throws NotFound when a preview artifact is already absent (e.g. a prior teardown partially ran, or a sibling test cleaned a shared path). A throwing teardown aborts the test and leaves the fixture directory half-cleaned, so the next run trips over stale state. safeRemoveSync (deno_ral/fs.ts) tolerates already-removed paths and only rethrows genuine errors, keeping cleanup idempotent.
postRenderCleanup registered entries were removed with a non-recursive Deno.removeSync, so an entry could only be a single file. Embedded-notebook tests produce a `*_files` support directory alongside the preview HTML, which could not be declared for cleanup and was left behind. Use safeRemoveSync with recursive removal so a registered entry can be a directory as well as a file.
The regression test for the embedded-notebook title fix started as a bespoke testRender block in render-embed.test.ts with manual teardown of the preview artifacts. smoke-all is the established home for issue-regression tests and its framework already handles output cleanup, so move it there: a 14577.qmd that embeds notebook.ipynb and asserts the Source link uses the filename, not the Python comment, with the preview artifacts declared for postRenderCleanup.
These asserted pre-existing, unchanged extraction behavior plus a trivial pass-through (partitionMarkdown now forwards contentBeforeHeading, a value markdownWithExtractedHeading already computed). They pinned behavior rather than exercising the fix. The fix lives in findTitle and is covered end-to-end by the smoke-all regression test, so the unit additions added no real coverage.
Record why findTitle can't mirror fixupFrontMatter's markdown-cells-only guard: it runs on rendered JupyterCellOutput, which carries no cell_type, so the !contentBeforeHeading check is the available equivalent for excluding a code cell's fenced source.
Add a smoke-all test for an embedded notebook whose only heading has prose before it: the heading must not become the embed title, which falls back to the filename. This is the same rule fixupFrontMatter applies to a notebook's own front-matter title since 24672a4 (#5363/#6411); the embed title now aligns with it. Broaden the changelog entry to describe this general behavior, not just the code-comment case.
…hared gate The comment and changelog claimed findTitle now derives the title the same way fixupFrontMatter does. Only the content-before-heading gate is shared: findTitle still scans every cell (fixupFrontMatter inspects only the first markdown cell) and cannot filter on cell_type, so the two are not equivalent. State that the gate is borrowed and list the divergences, so the comparison is not read as a guarantee of identical behavior.
Member
Author
|
I am putting that as a draft, because maybe option B is better... I need to think about it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TL;DR
An embedded notebook with no title and no heading cell used a code-cell comment (e.g.
# plt.savefig(...)) as its "Source" title. This PR gates heading-to-title promotion infindTitleon "nothing precedes the heading" (option A), so a#inside a code fence can't be the title and the filename is used instead. A side effect: a real markdown heading with prose before it also falls back to the filename — intentional, because a standalone notebook's own title already behaves that way. A more surgical fence-aware fix (option B) is described below but not taken here.When embedding a notebook that has no YAML title and no markdown heading cell, the "Source" link and sidebar showed a Python code comment (e.g.
# plt.savefig(...)) instead of the notebook filename.Root Cause
findTitle()insrc/core/jupyter/jupyter-embed.tsscans every cell's markdown to find a title. A code cell's markdown is its fenced source, andpartitionMarkdownis not fence-aware, so a#comment line inside the fence matches the ATX-heading regex and gets returned as the title.The non-embed path already guards against exactly this. The Dec-2023 "Improve title snipping in notebooks" change (24672a4, #5363/#6411) made
fixupFrontMatteronly promote a heading to the notebook title when no content precedes it.findTitlepredates that work and was never brought in line, so it still promoted any heading-like line.Fix (option A)
Expose
contentBeforeHeading(already computed bymarkdownWithExtractedHeading) throughPartitionedMarkdown, and gatefindTitleon!partitioned.contentBeforeHeading. A fenced code block's opening delimiter always counts as content before the comment, so the comment is no longer mistaken for a title and the filename is used instead.This borrows the same content-before-heading gate that
fixupFrontMatteruses. The two functions are not otherwise equivalent —findTitlescans every cell and can't filter oncell_type, whereasfixupFrontMatterinspects only the first markdown cell.Behavior note
Because the gate keys off "content before the heading" rather than "is this a code cell", it also affects a real markdown cell whose heading has prose before it: that heading is no longer promoted, and the embed "Source" title falls back to the filename.
This is intentional and is the reason to prefer option A. A standalone notebook with prose before its first heading already gets no title from that heading —
fixupFrontMatteronly promotes a heading when nothing precedes it (since #5363 / #6411). Option A makes the embed "Source" title behave the same way as the notebook's own title, instead of using a heading that the rest of Quarto would not treat as the document title.The documented precedence is YAML
title, then the first markdown heading, then the filename. Option A is marginally stricter than the literal "first markdown heading" wording for the prose-before-heading case, but matches the actual title behavior of a standalone notebook.Alternative (option B, not taken here)
The root cause mcanouil identified is that heading extraction is not fence-aware. Option B would make
markdownWithExtractedHeading(orfindTitle) skip#lines inside fenced code, fixing only the code-comment case and leaving a prose-before-heading markdown cell promotable — matching the literal documented "first markdown heading" wording.Option B is the more surgical fix for the reported bug, but it touches every caller of
markdownWithExtractedHeadingand would leave the embed title behaving differently from the standalone notebook title for the prose-before-heading case. This PR takes option A to keep the two title code paths consistent; option B remains available if we later prefer to track the documented wording exactly.Tests
Two smoke-all tests under
tests/docs/smoke-all/2026/06/30/:14577.qmd— the reported case: an embed of a notebook whose only cell is a#-comment code cell, asserting the Source link uses the filename and never the comment.embed-heading-after-content.qmd— the general alignment: an embed of a notebook whose only heading has content before it, asserting the Source link falls back to the filename rather than the heading.Two incidental test-infra cleanups came out of writing that test:
render-embed.test.tsteardowns now usesafeRemoveSyncinstead of rawDeno.removeSync, so a teardown no longer throws (and aborts, leaving stale state) when a preview artifact is already gone.postRenderCleanupnow removes recursively, so a registered cleanup entry can be a*_filessupport directory and not only a single file.Fixes #14577