Skip to content

Capture global ORDER BY requirement under ScalarSubqueryExec root#23146

Draft
sgrebnov wants to merge 1 commit into
apache:mainfrom
spiceai:sgrebnov/scalar-subquery-output-requirements
Draft

Capture global ORDER BY requirement under ScalarSubqueryExec root#23146
sgrebnov wants to merge 1 commit into
apache:mainfrom
spiceai:sgrebnov/scalar-subquery-output-requirements

Conversation

@sgrebnov

@sgrebnov sgrebnov commented Jun 24, 2026

Copy link
Copy Markdown
Member

Draft / WIP. Opening early for visibility while I'm still working on additional visible .slt-based coverage on top of unit test

Which issue does this PR close?

After upgrading from DataFusion 53 to 54 queries with ScalarSubquery started returning unordered results (ORDER BY was not respected)

Rationale for this change

OutputRequirements walks the plan via require_top_ordering_helper to capture the query's global ORDER BY as an OutputRequirementExec. The helper bails on any node with more than one child (children.len() != 1). ScalarSubqueryExec is such a node — child 0 is the order-transparent main input and the rest are uncorrelated subquery plans — so when it is the plan root the rule stops immediately and stamps an empty OutputRequirementExec(order_by=[], dist_by=Unspecified) at the top, never recording the global ordering requirement.

ScalarSubqueryExec is order-transparent on its main input (maintains_input_order()[0] == true, no required input ordering), so the search should descend through child 0, as it does for other single-child order-preserving operators.

What changes are included in this PR?

require_top_ordering_helper now special-cases ScalarSubqueryExec: it descends through child 0 to find and wrap the top SortExec, reattaching the subquery children unchanged.

Are these changes tested?

A new unit test (add_mode_descends_through_scalar_subquery that asserts the
OutputRequirementExec carrying the real ordering is placed below the ScalarSubqueryExec; without the fix the rule produces the empty order_by=[], dist_by=Unspecified requirement at the root.

Existing subquery.slt / TPC-H plan snapshots are unchanged: for built-in sources EnforceSorting independently rebuilds the order-preserving merge, so final plans are identical. WIP: adding visible/end-to-end coverage on top of the unit test.

Are there any user-facing changes?

No.

require_top_ordering_helper bailed on ScalarSubqueryExec because it has
multiple children (main input + subquery plans), so the rule stamped an
empty OutputRequirementExec (order_by=[], dist_by=Unspecified) at the root
and never recorded the query's global ORDER BY requirement.

Descend through child 0 (the order-transparent main input:
maintains_input_order()[0] == true, no required input ordering) to find
and protect the top SortExec, reattaching the subquery children unchanged.
@github-actions github-actions Bot added optimizer Optimizer rules core Core DataFusion crate labels Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant