Skip to content

Python: Fix background agent telemetry context error#6764

Open
westey-m wants to merge 2 commits into
microsoft:mainfrom
westey-m:python-backgroundagents-telemetry
Open

Python: Fix background agent telemetry context error#6764
westey-m wants to merge 2 commits into
microsoft:mainfrom
westey-m:python-backgroundagents-telemetry

Conversation

@westey-m

Copy link
Copy Markdown
Contributor

Motivation & Context

Telemetry threw whenever an agent was invoked through the background-agents
provider. Running background agents (e.g. three concurrent tasks via
BackgroundAgentsProvider) failed with:

Task failed: <Token var=<ContextVar name='inner_response_telemetry_captured_fields' ...>> was created in a different Context

This made background agents unusable while OpenTelemetry instrumentation was
enabled.

Description & Review Guide

Root cause. AgentTelemetryLayer.run() is a synchronous method that returns
an awaitable. It eagerly called .set() on two telemetry context variables —
INNER_RESPONSE_TELEMETRY_CAPTURED_FIELDS and INNER_ACCUMULATED_USAGE — in the
caller's execution context, while the non-streaming _run() coroutine reset
those tokens in its finally block. BackgroundAgentsProvider schedules the
returned coroutine with asyncio.create_task(agent.run(...)), so the coroutine
runs in a new copied context. A contextvars.Token can only be reset in the
exact context that created it, so the .reset() raised
ValueError: <Token ...> was created in a different Context.

Normal await agent.run() worked because the eager .set() (caller context) and
the finally .reset() happened to run in the same context; the bug only
surfaced when the returned awaitable was scheduled onto a different task — a
legitimate and common pattern.

  • What are the major changes?

    • Non-streaming path: move the two .set() calls into the start of the
      _run() coroutine body, so the set and the existing finally reset always
      execute in the same context.
    • Streaming path: keep the eager .set() at the top of the if stream:
      block — the streaming branch returns a ResponseStream synchronously and is
      consumed in the same task/context, so its cleanup-hook reset runs in the
      context that created the tokens. Background agents are non-streaming, so the
      streaming path is unaffected.
    • Both branches are documented with comments explaining the context constraint.
    • Added a regression test
      (test_agent_run_contextvars_safe_when_awaited_in_different_context) that
      invokes run(stream=False) synchronously, then awaits the returned coroutine
      inside asyncio.create_task (a different context), and asserts it completes
      without the cross-context ValueError.
  • What is the impact of these changes?

    • Background agents run correctly with instrumentation enabled. Each invocation's
      contextvars are now set and reset within its own context copy, which also
      improves isolation across concurrent background agents. No public API changes;
      normal await agent.run() behavior is unchanged.
  • What do you want reviewers to focus on?

    • That moving the .set() into _run() for the non-streaming path (while keeping
      the streaming path eager) is the right split, and that the streaming cleanup
      hooks still reset in the token-creating context.

Related Issue

Fixes #6762

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue (see Related Issue above).
  • This is not a breaking change. If it is a breaking change, add the breaking change label (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.

Copilot AI review requested due to automatic review settings June 26, 2026 13:17
@moonbox3 moonbox3 added the python Usage: [Issues, PRs], Target: Python label Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework
   observability.py10098791%408, 410–411, 414, 417, 420–421, 426–427, 433–434, 440–441, 448, 450–452, 455–457, 462–463, 469–470, 476–477, 484, 661–662, 861, 865–867, 869, 873–874, 878, 916, 918, 929–931, 933–935, 939, 947, 1071–1072, 1307, 1582–1583, 1694, 1832, 1874–1875, 2057, 2340–2343, 2349, 2375–2376, 2404–2415, 2534, 2537, 2549, 2566, 2570–2571, 2574, 2580, 2689, 2906, 2908
TOTAL42571508788% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
8342 37 💤 0 ❌ 0 🔥 2m 12s ⏱️

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Python OpenTelemetry instrumentation bug where non-streaming agent invocations could raise ValueError: <Token ...> was created in a different Context when AgentTelemetryLayer.run() is called synchronously and the returned awaitable is later awaited in a different task/context (e.g., BackgroundAgentsProvider via asyncio.create_task(agent.run(...))).

Changes:

  • Non-streaming agent path: move INNER_RESPONSE_TELEMETRY_CAPTURED_FIELDS / INNER_ACCUMULATED_USAGE contextvar .set() calls into the _run() coroutine so .set() and .reset() always occur in the same execution context.
  • Streaming agent path: keep eager contextvar .set() where the ResponseStream is created, with added commentary about the context/token constraint.
  • Add a regression test that calls run(stream=False) synchronously and awaits the returned coroutine inside asyncio.create_task(...).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

File Description
python/packages/core/agent_framework/observability.py Adjusts where telemetry contextvars are set to avoid cross-context token resets in non-streaming invocations; adds explanatory comments.
python/packages/core/tests/core/test_observability.py Adds a regression test covering awaiting run()’s returned awaitable in a separate task/context.
python/uv.lock Updates locked dependency constraints/versions (not described in PR motivation/scope).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 92%

✓ Correctness

The fix correctly moves contextvar .set() calls into the _run() coroutine body for the non-streaming path, ensuring set and reset always execute in the same context. The streaming path correctly keeps eager setting since it's consumed synchronously in the caller's context. The regression test accurately mimics the BackgroundAgentsProvider pattern. The lock file changes are unrelated dependency version pins. No correctness issues found.

✓ Security Reliability

The fix correctly addresses the cross-context ValueError by moving contextvars.set() calls into the _run() coroutine body for the non-streaming path, ensuring set/reset always happen in the same execution context. The streaming path remains correctly eager since it's consumed synchronously in the same context. The uv.lock changes are dependency version pins unrelated to the core fix. No security or reliability issues found.

✓ Test Coverage

The regression test covers the primary scenario well — verifying that the non-streaming path's contextvar set/reset works correctly when the coroutine is awaited in a different context via asyncio.create_task. The test assertions are meaningful (checks AgentResponse type, span count, and non-error status). However, the error path (where the inner agent raises) in a cross-context scenario is not tested, even though the finally block at line 1947-1949 performs the same token reset that caused the original bug.

✓ Failure Modes

The fix correctly moves contextvar .set() calls into the _run() coroutine body for the non-streaming path, ensuring set/reset pairs always execute in the same context. The streaming path appropriately keeps eager .set() since its cleanup runs synchronously in the same context. The regression test properly exercises the asyncio.create_task pattern used by BackgroundAgentsProvider. The uv.lock changes are unrelated dependency version pins. No failure modes introduced.

✓ Design Approach

The non-streaming fix is on the right track, but the streaming branch still relies on the same cross-context assumption that caused the original bug. Because ResponseStream cleanup hooks run in the context that consumes the stream, not the context that created it, eagerly setting the telemetry ContextVars before returning the stream can still raise the same Token was created in a different Context error when a stream is consumed from another task/context.


Automated review by westey-m's agents

Comment thread python/packages/core/agent_framework/observability.py Outdated
@westey-m westey-m marked this pull request as ready for review June 26, 2026 13:27

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 91%

✓ Correctness

The core fix is correct: moving the contextvars .set() calls into the _run() coroutine body for the non-streaming path ensures set/reset always happen in the same execution context, fixing the cross-context ValueError when asyncio.create_task schedules the returned coroutine. The test properly reproduces the background-agent pattern. The streaming path concern is already covered by an existing unresolved review comment. No new correctness issues found.

✓ Security Reliability

The non-streaming fix is correct: moving the contextvar .set() calls into the _run() coroutine body ensures set/reset always happen in the same execution context, resolving the cross-context ValueError for background agents. The try/finally block properly guarantees token cleanup on all paths. The streaming path concern is already covered by an existing unresolved review comment and is not restated here. No new security or reliability issues found in this change.

✓ Test Coverage

The regression test is well-crafted and directly exercises the exact failure scenario (non-streaming run() called synchronously, then awaited via asyncio.create_task in a different context). Assertions are meaningful — checking successful completion, span count, and no error status. The fix itself correctly co-locates the contextvar .set() and .reset() inside the _run() coroutine body for the non-streaming path. One minor test coverage gap: no error-path variant of the cross-context test (agent raises inside _run() while awaited in a different context). The streaming cross-context concern is already flaged by an existing unresolved review comment, so I won't duplicate it.

✓ Failure Modes

The non-streaming path fix is correct: moving the contextvar .set() calls into the _run() coroutine body ensures set and reset always execute in the same context, fixing the cross-context ValueError for BackgroundAgentsProvider. The finally block at lines 1948-1949 properly resets both tokens. The streaming path concern (cleanup hooks running in the consumer's context rather than the token-creating context) is already captured by the existing unresolved review comment. No additional silent failures, lost errors, or operational failure modes are introduced by this change.

✓ Design Approach

I did not find any additional design-approach issues beyond the already-open streaming-context concern in the review thread. The non-streaming change matches the documented background-agent call path, and the new regression test covers the specific cross-context await pattern the PR is fixing.

Suggestions

  • Consider adding an error-path variant of the cross-context regression test — e.g. an agent whose run() returns a coroutine that raises, awaited via asyncio.create_task. This would verify the finally block (observability.py:1947-1949) properly resets the tokens even when the exception propagates across context boundaries. The mechanism should already work correctly since both .set() and finally .reset() live inside _run(), but an explicit test would guard against future regressions.

Automated review by westey-m's agents

Comment thread python/uv.lock
{ name = "ag-ui-protocol", specifier = ">=0.1.16,<0.2" },
{ name = "agent-framework-core", editable = "packages/core" },
{ name = "fastapi", specifier = ">=0.115.0,<0.138.1" },
{ name = "fastapi", specifier = ">=0.115.0,<0.133.1" },

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're downgrading multiple packages, that can't be right

Comment on lines +1835 to +1837
# The contextvars are set lazily on the first pull, which only happens after this
# synchronous setup phase returns the stream, so the tokens are still unset here and
# there is nothing to reset.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the comments here isn't associated with any actual code. I suspect these are copilot notes. Maybe we can remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Usage: [Issues, PRs], Target: Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: Fix background agent telemetry context issue

5 participants