
Reasoning fixes#37

Open
skrimix wants to merge 2 commits into CaddyGlow:main from skrimix:fix/thinking-fix

Conversation


@skrimix skrimix commented Jan 28, 2026

This PR fixes several issues around handling thinking blocks:

Anthropic API: Accept thinking blocks in requests

Clients that try to pass thinking blocks back in consecutive requests were getting an error:

body -> messages -> 1 -> content -> str: Input should be a valid string;
body -> messages -> 1 -> content -> list[...] -> 0: Input tag 'thinking' found using 'type' does not match any of the expected tags: 'text', 'image', 'tool_use', 'tool_result'

Fix: Added ThinkingBlock and RedactedThinkingBlock to the list of accepted request content blocks.
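The shape of the fix can be sketched as widening the set of accepted content-block type tags. This is a hypothetical, Pydantic-free illustration (the real ccproxy models are Pydantic unions in `ccproxy/llms/models/anthropic.py`), but the idea is the same:

```python
# Before the fix, only these block types passed request validation,
# matching the tags listed in the error message above.
OLD_ALLOWED = {"text", "image", "tool_use", "tool_result"}

# After the fix, thinking blocks round-trip instead of being rejected.
NEW_ALLOWED = OLD_ALLOWED | {"thinking", "redacted_thinking"}


def validate_content_block(block: dict) -> dict:
    """Accept a content block if its 'type' tag is recognized."""
    tag = block.get("type")
    if tag not in NEW_ALLOWED:
        raise ValueError(f"Input tag {tag!r} does not match any expected tag")
    return block


# A thinking block passed back from a previous assistant turn now validates.
thinking = {"type": "thinking", "thinking": "...", "signature": "Eok..."}
assert validate_content_block(thinking)["type"] == "thinking"
```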

OpenAI API: Use more common thinking tag format

Thinking blocks were formatted as <thinking signature="Eok...">, which I'm not sure any client can handle.

Fix: Changed to the common <think> tag format.

OpenAI API: Fix streaming reasoning chunks

When streaming reasoning content, each chunk was being enclosed in its own thinking tag, resulting in broken output:

<thinking>User</thinking><thinking> wants to simpl</thinking><thinking>ify -</thinking>...

Fix: Moved the enclosing tag logic into the start and stop event handlers so the tags wrap the entire thinking content rather than each chunk.
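The fix described above can be sketched with a minimal generator (hypothetical event names loosely modeled on Anthropic's `content_block_start`/`delta`/`stop` stream events, not the actual ccproxy handlers):

```python
def stream_reasoning(events):
    """Wrap an entire reasoning block in one <think>...</think> pair.

    events is a sequence of (kind, text) tuples from the upstream stream.
    """
    for kind, text in events:
        if kind == "content_block_start":
            yield "<think>"   # opening tag once, at block start
        elif kind == "content_block_delta":
            yield text        # raw delta text, no per-chunk tags
        elif kind == "content_block_stop":
            yield "</think>"  # closing tag once, at block stop


events = [
    ("content_block_start", ""),
    ("content_block_delta", "User"),
    ("content_block_delta", " wants to simpl"),
    ("content_block_delta", "ify"),
    ("content_block_stop", ""),
]
assert "".join(stream_reasoning(events)) == "<think>User wants to simplify</think>"
```

Tagging per delta (the old behavior) would instead have produced the broken interleaved output shown above.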

@CaddyGlow
Owner

@skrimix thank you.

Renaming the "thinking" block to "think" is fine. However, we should keep the signature attribute or pass it back another way. Does removing it cause issues with tools that handle think blocks?

From what I remember, I included the signature so we can reconstruct the thinking block when needed for tool use. I don't know how other tools handle this, but Anthropic requires it.

https://platform.claude.com/docs/en/api/messages#thinking_block

https://platform.claude.com/docs/en/build-with-claude/extended-thinking

"Preserving thinking blocks: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete unmodified block back to the API to maintain reasoning continuity."

https://platform.claude.com/docs/en/build-with-claude/extended-thinking#preserving-thinking-blocks

"During tool use, you must pass thinking blocks back to the API, and you must include the complete unmodified block back to the API. This is critical for maintaining the model's reasoning flow and conversation integrity."
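Concretely, the quoted requirement means a follow-up request must echo the assistant's thinking block byte-for-byte, signature included. An illustrative Messages API payload (field values elided with "..."; `toolu_01` and `get_weather` are made-up examples) looks like:

```python
followup_request_messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": [
            # Must be passed back unmodified, signature included.
            {"type": "thinking", "thinking": "...", "signature": "..."},
            {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
             "input": {"city": "Paris"}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01",
             "content": "18°C, cloudy"},
        ],
    },
]

# The signature is what lets the API verify the block was not altered,
# which is why dropping it from the <think> markup loses information.
assert followup_request_messages[1]["content"][0]["signature"] == "..."
```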

@skrimix
Author

skrimix commented Jan 29, 2026

I'll be honest: I didn't look much into the tool-calling side of things, since it isn't part of my use case, so I missed that and likely broke whatever handling is there. I apologize.
As far as I can tell, handling reasoning together with tool calling is tricky, since the Responses API and the Messages API handle things differently, and Chat Completions simply has no standardized way of passing thinking back.

Responses API -> Messages API

In this case I guess we could try to glue it together by treating the "thinking" field as "summary" and "signature" as "encrypted_content". Not sure.

Chat Completions API -> Messages API

This is even messier. Many providers don't support passing thinking in requests, so I'm guessing most clients don't either. Those that do support it (e.g. Z.AI, Moonshot) use a simple "reasoning_content" text field, which isn't enough for handling Anthropic.
Since it's normal* for OpenAI clients to simply strip the thinking parts of responses, it might even be desirable to intentionally avoid the parsing by using custom <thinking signature="..."> tags. I can revert the change around this part if you want.

    • my experience with custom providers is mostly through simpler chatbot-style clients without much tool use, so I might not have the latest info.

LiteLLM's example

LiteLLM seems to use both a "reasoning_content" field and a separate Anthropic-specific "thinking_blocks" field on their Chat Completions endpoint.
"Compatibility Notice
Anthropic extended thinking with tool calling is not fully compatible with OpenAI-compatible API clients. This is due to fundamental architectural differences between how OpenAI and Anthropic handle reasoning in multi-turn conversations.

When using Anthropic models with thinking enabled and tool calling, you must include thinking_blocks from the previous assistant response when sending tool results back. Failure to do so will result in a 400 Bad Request error."
This might be a viable approach. However, when I tried to trace all the handling of thinking blocks in the code I quickly got lost, and I'm not yet comfortable with delegating everything to CC, so I probably won't be of much help here.
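The LiteLLM-style shape quoted above would look roughly like this (illustrative; field names follow the quoted docs, values are elided, and `call_01`/`get_weather` are made-up examples):

```python
# A Chat Completions assistant turn carrying Anthropic thinking through
# LiteLLM-style fields: "reasoning_content" for display, "thinking_blocks"
# for round-tripping the signed block back on the next turn.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "...",
    "thinking_blocks": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
    ],
    "tool_calls": [
        {"id": "call_01", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "Paris"}'}},
    ],
}


def blocks_to_echo(turn):
    """Pick out what a client must send back alongside the tool result."""
    return turn.get("thinking_blocks", [])


assert blocks_to_echo(assistant_turn)[0]["signature"] == "..."
```

Per the quoted compatibility notice, a client that drops `thinking_blocks` when returning the tool result would get a 400 back from Anthropic.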

@CaddyGlow CaddyGlow requested a review from Copilot February 24, 2026 20:17

Copilot AI left a comment


Pull request overview

This PR fixes handling of “thinking”/reasoning blocks across Anthropic and OpenAI compatibility layers, ensuring request validation accepts thinking blocks and streaming output wraps reasoning in a single tag pair.

Changes:

  • Accept thinking and redacted_thinking blocks in Anthropic request content validation.
  • Switch OpenAI-facing reasoning markup from <thinking ...> to <think>.
  • Fix streaming so <think>/</think> wrap the entire thinking block rather than each delta chunk.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
ccproxy/llms/models/anthropic.py Expands request content block union to accept thinking-related blocks.
ccproxy/llms/formatters/anthropic_to_openai/streams.py Moves <think> wrapping to block start/stop events for correct streaming output.
ccproxy/llms/formatters/anthropic_to_openai/responses.py Normalizes non-streaming conversions to use <think> and skips redacted thinking.


Comment on lines +1396 to +1402
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="<think>"
),
finish_reason=None,
)

Copilot AI Feb 24, 2026


For OpenAI-compatible streaming, delta.role is typically only sent once at the start of the assistant message (many clients assume subsequent chunks omit it). For these synthetic <think> wrapper chunks, consider omitting role (i.e., only set content) or only including role if it hasn’t been emitted yet for the message.

Copilot uses AI. Check for mistakes.
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="</think>"

Copilot AI Feb 24, 2026


Same as the opening-tag emission: emitting delta.role="assistant" on this closing-tag chunk may be inconsistent with common streaming expectations. Prefer omitting role (or gate it behind a 'role already emitted' flag) for these wrapper-only chunks.

Suggested change
role="assistant", content="</think>"
content="</think>"

Comment on lines +1390 to +1404
yield openai_models.ChatCompletionChunk(
id="chatcmpl-stream",
object="chat.completion.chunk",
created=0,
model=model_id,
choices=[
openai_models.StreamingChoice(
index=0,
delta=openai_models.DeltaMessage(
role="assistant", content="<think>"
),
finish_reason=None,
)
],
)

Copilot AI Feb 24, 2026


The ChatCompletionChunk construction for emitting wrapper tags is duplicated (opening and closing) with many identical fields. Consider extracting a small helper/factory (e.g., emit_text_chunk(content: str, *, role: str | None = None)) to reduce repetition and the risk of future inconsistencies across these synthetic chunks.

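A sketch of the helper Copilot suggests, with hypothetical names (`emit_text_chunk` is made up; the real code builds `openai_models.ChatCompletionChunk` instances, plain dicts are used here for brevity):

```python
def emit_text_chunk(model_id, content, role=None):
    """Build one synthetic streaming chunk; role is set only when given."""
    delta = {"content": content}
    if role is not None:
        delta["role"] = role  # emit role once, on the first chunk only
    return {
        "id": "chatcmpl-stream",
        "object": "chat.completion.chunk",
        "created": 0,
        "model": model_id,
        "choices": [{"index": 0, "delta": delta, "finish_reason": None}],
    }


# The opening wrapper carries the role; the closing wrapper omits it,
# matching the review comments about role being emitted once per message.
opening = emit_text_chunk("claude-sonnet", "<think>", role="assistant")
closing = emit_text_chunk("claude-sonnet", "</think>")
assert opening["choices"][0]["delta"]["role"] == "assistant"
assert "role" not in closing["choices"][0]["delta"]
```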
@CaddyGlow
Owner

@copilot open a new pull request to apply changes based on the comments in this thread

@nikhilbatra789work

@CaddyGlow
Are you planning to add support for the new Claude and Codex models?
