
Reasoning fixes#37

Open
skrimix wants to merge 2 commits into CaddyGlow:main from skrimix:fix/thinking-fix

Conversation


@skrimix skrimix commented Jan 28, 2026

This PR fixes several issues around handling thinking blocks:

Anthropic API: Accept thinking blocks in requests

Clients that try to pass thinking blocks back in consecutive requests were getting an error:

body -> messages -> 1 -> content -> str: Input should be a valid string;
body -> messages -> 1 -> content -> list[...] -> 0: Input tag 'thinking' found using 'type' does not match any of the expected tags: 'text', 'image', 'tool_use', 'tool_result'

Fix: Added ThinkingBlock and RedactedThinkingBlock to the list of accepted request content blocks.
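The shape of the fix can be sketched as widening the set of accepted content-block type tags. This is a hypothetical, Pydantic-free illustration (the real ccproxy models are Pydantic unions in `ccproxy/llms/models/anthropic.py`), but the idea is the same:

```python
# Before the fix, only these block types passed request validation,
# matching the tags listed in the error message above.
OLD_ALLOWED = {"text", "image", "tool_use", "tool_result"}

# After the fix, thinking blocks round-trip instead of being rejected.
NEW_ALLOWED = OLD_ALLOWED | {"thinking", "redacted_thinking"}


def validate_content_block(block: dict) -> dict:
    """Accept a content block if its 'type' tag is recognized."""
    tag = block.get("type")
    if tag not in NEW_ALLOWED:
        raise ValueError(f"Input tag {tag!r} does not match any expected tag")
    return block


# A thinking block passed back from a previous assistant turn now validates.
thinking = {"type": "thinking", "thinking": "...", "signature": "Eok..."}
assert validate_content_block(thinking)["type"] == "thinking"
```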

OpenAI API: Use more common thinking tag format

Thinking blocks were formatted as <thinking signature="Eok...">, which I'm not sure any client can handle.

Fix: Changed to the common <think> tag format.

OpenAI API: Fix streaming reasoning chunks

When streaming reasoning content, each chunk was being enclosed in its own thinking tag, resulting in broken output:

<thinking>User</thinking><thinking> wants to simpl</thinking><thinking>ify -</thinking>...

Fix: Moved the enclosing tag logic into the start and stop event handlers so the tags wrap the entire thinking content rather than each chunk.
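The fix described above can be sketched with a minimal generator (hypothetical event names loosely modeled on Anthropic's `content_block_start`/`delta`/`stop` stream events, not the actual ccproxy handlers):

```python
def stream_reasoning(events):
    """Wrap an entire reasoning block in one <think>...</think> pair.

    events is a sequence of (kind, text) tuples from the upstream stream.
    """
    for kind, text in events:
        if kind == "content_block_start":
            yield "<think>"   # opening tag once, at block start
        elif kind == "content_block_delta":
            yield text        # raw delta text, no per-chunk tags
        elif kind == "content_block_stop":
            yield "</think>"  # closing tag once, at block stop


events = [
    ("content_block_start", ""),
    ("content_block_delta", "User"),
    ("content_block_delta", " wants to simpl"),
    ("content_block_delta", "ify"),
    ("content_block_stop", ""),
]
assert "".join(stream_reasoning(events)) == "<think>User wants to simplify</think>"
```

Tagging per delta (the old behavior) would instead have produced the broken interleaved output shown above.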

@CaddyGlow
Owner

@skrimix thank you.

Renaming the "thinking" block to "think" is fine. However, we should keep the signature attribute or pass it back another way. Does removing it cause issues with tools that handle think blocks?

From what I remember, I included the signature so we can reconstruct the thinking block when needed for tool use. I don't know how other tools handle this, but Anthropic requires it.

https://platform.claude.com/docs/en/api/messages#thinking_block

https://platform.claude.com/docs/en/build-with-claude/extended-thinking

"Preserving thinking blocks: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete unmodified block back to the API to maintain reasoning continuity."

https://platform.claude.com/docs/en/build-with-claude/extended-thinking#preserving-thinking-blocks

"During tool use, you must pass thinking blocks back to the API, and you must include the complete unmodified block back to the API. This is critical for maintaining the model's reasoning flow and conversation integrity."
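Concretely, the quoted requirement means a follow-up request must echo the assistant's thinking block byte-for-byte, signature included. An illustrative Messages API payload (field values elided with "..."; `toolu_01` and `get_weather` are made-up examples) looks like:

```python
followup_request_messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": [
            # Must be passed back unmodified, signature included.
            {"type": "thinking", "thinking": "...", "signature": "..."},
            {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
             "input": {"city": "Paris"}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01",
             "content": "18°C, cloudy"},
        ],
    },
]

# The signature is what lets the API verify the block was not altered,
# which is why dropping it from the <think> markup loses information.
assert followup_request_messages[1]["content"][0]["signature"] == "..."
```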

@skrimix
Author

skrimix commented Jan 29, 2026

I'll be honest: I didn't look much into the tool-calling side of things, since it isn't part of my use case, so I missed that and likely broke whatever handling is there. I apologize.
As far as I can tell, handling reasoning together with tool calling is tricky, since the Responses API and the Messages API handle things differently, and Chat Completions simply has no standardized way of passing thinking back.

Responses API -> Messages API

In this case I guess we could try to glue it together by treating the "thinking" field as "summary" and "signature" as "encrypted_content". Not sure.

Chat Completions API -> Messages API

This is even messier. Many providers don't support passing thinking in requests, so I'm guessing most clients don't either. Those that do support it (e.g. Z.AI, Moonshot) use a simple "reasoning_content" text field, which isn't enough for handling Anthropic.
Since it's normal* for OpenAI clients to simply strip the thinking parts of responses, it might even be desirable to intentionally avoid the parsing by using custom <thinking signature="..."> tags. I can revert the change around this part if you want.

    • my experience with custom providers is mostly through simpler chatbot-style clients without much tool use, so I might not have the latest info.

LiteLLM's example

LiteLLM seems to use both a "reasoning_content" field and a separate Anthropic-specific "thinking_blocks" field on their Chat Completions endpoint.
"Compatibility Notice
Anthropic extended thinking with tool calling is not fully compatible with OpenAI-compatible API clients. This is due to fundamental architectural differences between how OpenAI and Anthropic handle reasoning in multi-turn conversations.

When using Anthropic models with thinking enabled and tool calling, you must include thinking_blocks from the previous assistant response when sending tool results back. Failure to do so will result in a 400 Bad Request error."
This might be a viable approach. However, when I tried to trace all the handling of thinking blocks in the code I quickly got lost, and I'm not yet comfortable with delegating everything to CC, so I probably won't be of much help here.
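The LiteLLM-style shape quoted above would look roughly like this (illustrative; field names follow the quoted docs, values are elided, and `call_01`/`get_weather` are made-up examples):

```python
# A Chat Completions assistant turn carrying Anthropic thinking through
# LiteLLM-style fields: "reasoning_content" for display, "thinking_blocks"
# for round-tripping the signed block back on the next turn.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "...",
    "thinking_blocks": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
    ],
    "tool_calls": [
        {"id": "call_01", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "Paris"}'}},
    ],
}


def blocks_to_echo(turn):
    """Pick out what a client must send back alongside the tool result."""
    return turn.get("thinking_blocks", [])


assert blocks_to_echo(assistant_turn)[0]["signature"] == "..."
```

Per the quoted compatibility notice, a client that drops `thinking_blocks` when returning the tool result would get a 400 back from Anthropic.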

@CaddyGlow CaddyGlow requested a review from Copilot February 24, 2026 20:17

Copilot AI left a comment


Pull request overview

This PR fixes handling of “thinking”/reasoning blocks across Anthropic and OpenAI compatibility layers, ensuring request validation accepts thinking blocks and streaming output wraps reasoning in a single tag pair.

Changes:

  • Accept thinking and redacted_thinking blocks in Anthropic request content validation.
  • Switch OpenAI-facing reasoning markup from <thinking ...> to <think>.
  • Fix streaming so <think>/</think> wrap the entire thinking block rather than each delta chunk.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
ccproxy/llms/models/anthropic.py Expands request content block union to accept thinking-related blocks.
ccproxy/llms/formatters/anthropic_to_openai/streams.py Moves <think> wrapping to block start/stop events for correct streaming output.
ccproxy/llms/formatters/anthropic_to_openai/responses.py Normalizes non-streaming conversions to use <think> and skips redacted thinking.


Comment on lines +1396 to +1402
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="<think>"
),
finish_reason=None,
)

Copilot AI Feb 24, 2026


For OpenAI-compatible streaming, delta.role is typically only sent once at the start of the assistant message (many clients assume subsequent chunks omit it). For these synthetic <think> wrapper chunks, consider omitting role (i.e., only set content) or only including role if it hasn’t been emitted yet for the message.

Copilot uses AI. Check for mistakes.
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="</think>"

Copilot AI Feb 24, 2026


Same as the opening-tag emission: emitting delta.role="assistant" on this closing-tag chunk may be inconsistent with common streaming expectations. Prefer omitting role (or gate it behind a 'role already emitted' flag) for these wrapper-only chunks.

Suggested change
role="assistant", content="</think>"
content="</think>"

Comment on lines +1390 to +1404
yield openai_models.ChatCompletionChunk(
id="chatcmpl-stream",
object="chat.completion.chunk",
created=0,
model=model_id,
choices=[
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="<think>"
),
finish_reason=None,
)
],
)

Copilot AI Feb 24, 2026


The ChatCompletionChunk construction for emitting wrapper tags is duplicated (opening and closing) with many identical fields. Consider extracting a small helper/factory (e.g., emit_text_chunk(content: str, *, role: str | None = None)) to reduce repetition and the risk of future inconsistencies across these synthetic chunks.

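A sketch of the helper Copilot suggests, with hypothetical names (`emit_text_chunk` is made up; the real code builds `openai_models.ChatCompletionChunk` instances, plain dicts are used here for brevity):

```python
def emit_text_chunk(model_id, content, role=None):
    """Build one synthetic streaming chunk; role is set only when given."""
    delta = {"content": content}
    if role is not None:
        delta["role"] = role  # emit role once, on the first chunk only
    return {
        "id": "chatcmpl-stream",
        "object": "chat.completion.chunk",
        "created": 0,
        "model": model_id,
        "choices": [{"index": 0, "delta": delta, "finish_reason": None}],
    }


# The opening wrapper carries the role; the closing wrapper omits it,
# matching the review comments about role being emitted once per message.
opening = emit_text_chunk("claude-sonnet", "<think>", role="assistant")
closing = emit_text_chunk("claude-sonnet", "</think>")
assert opening["choices"][0]["delta"]["role"] == "assistant"
assert "role" not in closing["choices"][0]["delta"]
```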
@CaddyGlow
Owner

@copilot open a new pull request to apply changes based on the comments in this thread

@nikhilbatra789work

@CaddyGlow
Are you planning to add support for the new Claude and Codex models?
