Commit 5b2f338

Parameterize LLM returning reasoning

1 parent 8b2e4c3
17 files changed: +278 -76 lines changed

docs/ref/checks/custom_prompt_check.md
Lines changed: 5 additions & 0 deletions

@@ -20,6 +20,10 @@ Implements custom content checks using configurable LLM prompts. Uses your custo
 - **`model`** (required): Model to use for the check (e.g., "gpt-5")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
 - **`system_prompt_details`** (required): Custom instructions defining the content detection criteria
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
+  - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
+  - When `true`: Additionally, returns detailed reasoning for its decisions
+  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

 ## Implementation Notes

@@ -42,3 +46,4 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 - **`flagged`**: Whether the custom validation criteria were met
 - **`confidence`**: Confidence score (0.0 to 1.0) for the validation
 - **`threshold`**: The confidence threshold that was configured
+- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*
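For orientation, here is a minimal sketch (not part of the commit) of the two `info` payload shapes this section documents; the values are illustrative.

```python
# Illustrative only: the two shapes of the `info` dictionary documented above.

# Default, include_reasoning=false: essential fields only.
info_minimal = {
    "flagged": True,
    "confidence": 0.86,
    "threshold": 0.7,
}

# include_reasoning=true: the same fields plus the explanation.
info_with_reason = {
    "flagged": True,
    "confidence": 0.86,
    "threshold": 0.7,
    "reason": "Matches the custom criteria given in system_prompt_details.",
}
```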

docs/ref/checks/hallucination_detection.md
Lines changed: 15 additions & 8 deletions

@@ -14,7 +14,8 @@ Flags model text containing factual claims that are clearly contradicted or not
   "config": {
     "model": "gpt-4.1-mini",
     "confidence_threshold": 0.7,
-    "knowledge_source": "vs_abc123"
+    "knowledge_source": "vs_abc123",
+    "include_reasoning": false
   }
 }
 ```

@@ -24,6 +25,10 @@ Flags model text containing factual claims that are clearly contradicted or not
 - **`model`** (required): OpenAI model to use for validation (e.g., "gpt-4.1-mini")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
 - **`knowledge_source`** (required): OpenAI vector store ID starting with "vs_" containing reference documents
+- **`include_reasoning`** (optional): Whether to include detailed reasoning fields in the output (default: `false`)
+  - When `false`: Returns only `flagged` and `confidence` to save tokens
+  - When `true`: Additionally, returns `reasoning`, `hallucination_type`, `hallucinated_statements`, and `verified_statements`
+  - Recommended: Keep disabled for production (default); enable for development/debugging

 ### Tuning guidance

@@ -102,7 +107,9 @@ See [`examples/hallucination_detection/`](https://github.com/openai/openai-guard

 ## What It Returns

-Returns a `GuardrailResult` with the following `info` dictionary:
+Returns a `GuardrailResult` with the following `info` dictionary.
+
+**With `include_reasoning=true`:**

 ```json
 {

@@ -117,15 +124,15 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 }
 ```

+### Fields
+
 - **`flagged`**: Whether the content was flagged as potentially hallucinated
 - **`confidence`**: Confidence score (0.0 to 1.0) for the detection
-- **`reasoning`**: Explanation of why the content was flagged
-- **`hallucination_type`**: Type of issue detected (e.g., "factual_error", "unsupported_claim")
-- **`hallucinated_statements`**: Specific statements that are contradicted or unsupported
-- **`verified_statements`**: Statements that are supported by your documents
 - **`threshold`**: The confidence threshold that was configured
-
-Tip: `hallucination_type` is typically one of `factual_error`, `unsupported_claim`, or `none`.
+- **`reasoning`**: Explanation of why the content was flagged - *only included when `include_reasoning=true`*
+- **`hallucination_type`**: Type of issue detected (e.g., "factual_error", "unsupported_claim", "none") - *only included when `include_reasoning=true`*
+- **`hallucinated_statements`**: Specific statements that are contradicted or unsupported - *only included when `include_reasoning=true`*
+- **`verified_statements`**: Statements that are supported by your documents - *only included when `include_reasoning=true`*

 ## Benchmark Results
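Because the reasoning fields are only present when `include_reasoning=true`, downstream code should read them defensively rather than indexing them unconditionally. A sketch (not from the commit), assuming the returned `GuardrailResult` exposes the documented dictionary as `result.info`:

```python
# Sketch: consuming the hallucination-detection result without assuming the
# optional reasoning fields exist.
def summarize(result) -> str:
    info = result.info
    line = f"flagged={info['flagged']} confidence={info['confidence']:.2f} threshold={info['threshold']}"
    reasoning = info.get("reasoning")  # absent when include_reasoning=false
    if reasoning is not None:
        hallucinated = info.get("hallucinated_statements") or []
        line += f" | {len(hallucinated)} unsupported statement(s): {reasoning}"
    return line
```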

docs/ref/checks/jailbreak.md
Lines changed: 7 additions & 2 deletions

@@ -33,7 +33,8 @@ Detects attempts to bypass safety or policy constraints via manipulation (prompt
   "name": "Jailbreak",
   "config": {
     "model": "gpt-4.1-mini",
-    "confidence_threshold": 0.7
+    "confidence_threshold": 0.7,
+    "include_reasoning": false
   }
 }
 ```

@@ -42,6 +43,10 @@ Detects attempts to bypass safety or policy constraints via manipulation (prompt

 - **`model`** (required): Model to use for detection (e.g., "gpt-4.1-mini")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
+  - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
+  - When `true`: Additionally, returns detailed reasoning for its decisions
+  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

 ### Tuning guidance

@@ -70,7 +75,7 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 - **`flagged`**: Whether a jailbreak attempt was detected
 - **`confidence`**: Confidence score (0.0 to 1.0) for the detection
 - **`threshold`**: The confidence threshold that was configured
-- **`reason`**: Explanation of why the input was flagged (or not flagged)
+- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*
 - **`used_conversation_history`**: Boolean indicating whether conversation history was analyzed
 - **`checked_text`**: JSON payload containing the conversation history and latest input that was analyzed
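An illustrative gate over these documented fields (the library's own tripwire logic lives in the package; this is only a sketch of how the threshold and the optional `reason` field interact):

```python
import logging

logger = logging.getLogger("guardrails.jailbreak")


def should_block(info: dict) -> bool:
    """Illustrative gate over the documented info fields; not the library's tripwire implementation."""
    triggered = info["flagged"] and info["confidence"] >= info["threshold"]
    if triggered:
        # `reason` is only present when include_reasoning=true.
        logger.warning("Jailbreak tripwire hit: %s", info.get("reason", "reasoning disabled"))
    return triggered
```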

docs/ref/checks/llm_base.md
Lines changed: 6 additions & 1 deletion

@@ -9,7 +9,8 @@ Base configuration for LLM-based guardrails. Provides common configuration optio
   "name": "LLM Base",
   "config": {
     "model": "gpt-5",
-    "confidence_threshold": 0.7
+    "confidence_threshold": 0.7,
+    "include_reasoning": false
   }
 }
 ```

@@ -18,6 +19,10 @@ Base configuration for LLM-based guardrails. Provides common configuration optio

 - **`model`** (required): OpenAI model to use for the check (e.g., "gpt-5")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
+  - When `true`: The LLM generates and returns detailed reasoning for its decisions (e.g., `reason`, `reasoning`, `observation`, `evidence` fields)
+  - When `false`: The LLM only returns the essential fields (`flagged` and `confidence`), reducing token generation costs
+  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

 ## What It Does
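These options map onto the shared `LLMConfig` model extended in `llm_base.py` later in this commit. A sketch of constructing it directly in Python, assuming the import path follows the file location `src/guardrails/checks/text/llm_base.py`:

```python
# Sketch: the base config for LLM-based checks, with the new flag.
from guardrails.checks.text.llm_base import LLMConfig

config = LLMConfig(
    model="gpt-4.1-mini",      # example model name
    confidence_threshold=0.7,
    include_reasoning=False,   # default; flip to True while developing/debugging
)

# The model is declared with extra="forbid", so misspelled keys raise a validation error.
```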

docs/ref/checks/nsfw.md
Lines changed: 5 additions & 0 deletions

@@ -29,6 +29,10 @@ Flags workplace‑inappropriate model outputs: explicit sexual content, profanit

 - **`model`** (required): Model to use for detection (e.g., "gpt-4.1-mini")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
+  - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
+  - When `true`: Additionally, returns detailed reasoning for its decisions
+  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

 ### Tuning guidance

@@ -51,6 +55,7 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 - **`flagged`**: Whether NSFW content was detected
 - **`confidence`**: Confidence score (0.0 to 1.0) for the detection
 - **`threshold`**: The confidence threshold that was configured
+- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*

 ### Examples

docs/ref/checks/off_topic_prompts.md
Lines changed: 6 additions & 1 deletion

@@ -20,6 +20,10 @@ Ensures content stays within defined business scope using LLM analysis. Flags co
 - **`model`** (required): Model to use for analysis (e.g., "gpt-5")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
 - **`system_prompt_details`** (required): Description of your business scope and acceptable topics
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
+  - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
+  - When `true`: Additionally, returns detailed reasoning for its decisions
+  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

 ## Implementation Notes

@@ -40,5 +44,6 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 ```

 - **`flagged`**: Whether the content aligns with your business scope
-- **`confidence`**: Confidence score (0.0 to 1.0) for the prompt injection detection assessment
+- **`confidence`**: Confidence score (0.0 to 1.0) for the assessment
 - **`threshold`**: The confidence threshold that was configured
+- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*

docs/ref/checks/prompt_injection_detection.md
Lines changed: 10 additions & 2 deletions

@@ -31,7 +31,8 @@ After tool execution, the prompt injection detection check validates that the re
   "name": "Prompt Injection Detection",
   "config": {
     "model": "gpt-4.1-mini",
-    "confidence_threshold": 0.7
+    "confidence_threshold": 0.7,
+    "include_reasoning": false
   }
 }
 ```

@@ -40,6 +41,10 @@ After tool execution, the prompt injection detection check validates that the re

 - **`model`** (required): Model to use for prompt injection detection analysis (e.g., "gpt-4.1-mini")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
+- **`include_reasoning`** (optional): Whether to include the `observation` and `evidence` fields in the output (default: `false`)
+  - When `true`: Returns detailed `observation` explaining what the action is doing and `evidence` with specific quotes/details
+  - When `false`: Omits reasoning fields to save tokens (typically 100-300 tokens per check)
+  - Recommended: Keep disabled for production (default); enable for development/debugging

 **Flags as MISALIGNED:**

@@ -77,13 +82,16 @@ Returns a `GuardrailResult` with the following `info` dictionary:
 }
 ```

-- **`observation`**: What the AI action is doing
+- **`observation`**: What the AI action is doing - *only included when `include_reasoning=true`*
 - **`flagged`**: Whether the action is misaligned (boolean)
 - **`confidence`**: Confidence score (0.0 to 1.0) that the action is misaligned
+- **`evidence`**: Specific evidence from conversation supporting the decision - *only included when `include_reasoning=true`*
 - **`threshold`**: The confidence threshold that was configured
 - **`user_goal`**: The tracked user intent from conversation
 - **`action`**: The list of function calls or tool outputs analyzed for alignment

+**Note**: When `include_reasoning=false` (the default), the `observation` and `evidence` fields are omitted to reduce token generation costs.
+
 ## Benchmark Results

 ### Dataset Description
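A Python mirror of the JSON config above, showing the recommended split between a production and a debugging variant; how these dicts are loaded depends on your setup, so treat this as an illustration only.

```python
# Illustrative config variants for the Prompt Injection Detection check.
PROD_CHECK = {
    "name": "Prompt Injection Detection",
    "config": {
        "model": "gpt-4.1-mini",
        "confidence_threshold": 0.7,
        "include_reasoning": False,  # omit observation/evidence (saves roughly 100-300 tokens per check)
    },
}

DEV_CHECK = {
    **PROD_CHECK,
    "config": {**PROD_CHECK["config"], "include_reasoning": True},  # adds observation + evidence
}
```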

src/guardrails/checks/text/hallucination_detection.py
Lines changed: 6 additions & 13 deletions

@@ -94,8 +94,8 @@ class HallucinationDetectionOutput(LLMOutput):
     Extends the base LLM output with hallucination-specific details.

     Attributes:
-        flagged (bool): Whether the content was flagged as potentially hallucinated.
-        confidence (float): Confidence score (0.0 to 1.0) that the input is hallucinated.
+        flagged (bool): Whether the content was flagged as potentially hallucinated (inherited).
+        confidence (float): Confidence score (0.0 to 1.0) that the input is hallucinated (inherited).
         reasoning (str): Detailed explanation of the analysis.
         hallucination_type (str | None): Type of hallucination detected.
         hallucinated_statements (list[str] | None): Specific statements flagged as

@@ -104,16 +104,6 @@ class HallucinationDetectionOutput(LLMOutput):
             by the documents.
     """

-    flagged: bool = Field(
-        ...,
-        description="Indicates whether the content was flagged as potentially hallucinated.",
-    )
-    confidence: float = Field(
-        ...,
-        description="Confidence score (0.0 to 1.0) that the input is hallucinated.",
-        ge=0.0,
-        le=1.0,
-    )
     reasoning: str = Field(
         ...,
         description="Detailed explanation of the hallucination analysis.",

@@ -245,12 +235,15 @@ async def hallucination_detection(
     # Create the validation query
     validation_query = f"{VALIDATION_PROMPT}\n\nText to validate:\n{candidate}"

+    # Use HallucinationDetectionOutput (with reasoning fields) if enabled, otherwise base LLMOutput
+    output_format = HallucinationDetectionOutput if config.include_reasoning else LLMOutput
+
     # Use the Responses API with file search and structured output
     response = await _invoke_openai_callable(
         ctx.guardrail_llm.responses.parse,
         input=validation_query,
         model=config.model,
-        text_format=HallucinationDetectionOutput,
+        text_format=output_format,
         tools=[{"type": "file_search", "vector_store_ids": [config.knowledge_source]}],
     )

src/guardrails/checks/text/jailbreak.py
Lines changed: 5 additions & 12 deletions

@@ -40,8 +40,6 @@
 import textwrap
 from typing import Any

-from pydantic import Field
-
 from guardrails.registry import default_spec_registry
 from guardrails.spec import GuardrailSpecMetadata
 from guardrails.types import GuardrailLLMContextProto, GuardrailResult, token_usage_to_dict

@@ -50,6 +48,7 @@
     LLMConfig,
     LLMErrorOutput,
     LLMOutput,
+    LLMReasoningOutput,
     create_error_result,
     run_llm,
 )

@@ -226,15 +225,6 @@
 MAX_CONTEXT_TURNS = 10


-class JailbreakLLMOutput(LLMOutput):
-    """LLM output schema including rationale for jailbreak classification."""
-
-    reason: str = Field(
-        ...,
-        description=("Justification for why the input was flagged or not flagged as a jailbreak."),
-    )
-
-
 def _build_analysis_payload(conversation_history: list[Any] | None, latest_input: str) -> str:
     """Return a JSON payload with recent turns and the latest input."""
     trimmed_input = latest_input.strip()

@@ -251,12 +241,15 @@ async def jailbreak(ctx: GuardrailLLMContextProto, data: str, config: LLMConfig)
     conversation_history = getattr(ctx, "get_conversation_history", lambda: None)() or []
     analysis_payload = _build_analysis_payload(conversation_history, data)

+    # Use LLMReasoningOutput (with reason) if reasoning is enabled, otherwise use base LLMOutput
+    output_model = LLMReasoningOutput if config.include_reasoning else LLMOutput
+
     analysis, token_usage = await run_llm(
         analysis_payload,
         SYSTEM_PROMPT,
         ctx.guardrail_llm,
         config.model,
-        JailbreakLLMOutput,
+        output_model,
     )

     if isinstance(analysis, LLMErrorOutput):
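Since the parsed `analysis` is now either a plain `LLMOutput` or an `LLMReasoningOutput` depending on `config.include_reasoning`, downstream handling can stay agnostic to which model was used. A simplified fragment (not the check's full result handling):

```python
# `analysis` comes from the run_llm() call above; `reason` only exists on
# LLMReasoningOutput, so read it without assuming it is present.
reason = getattr(analysis, "reason", None)
info = {"flagged": analysis.flagged, "confidence": analysis.confidence}
if reason is not None:
    info["reason"] = reason
```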

src/guardrails/checks/text/llm_base.py
Lines changed: 39 additions & 3 deletions

@@ -73,6 +73,7 @@ class MyLLMOutput(LLMOutput):
     "LLMConfig",
     "LLMErrorOutput",
     "LLMOutput",
+    "LLMReasoningOutput",
     "create_error_result",
     "create_llm_check_fn",
 ]

@@ -87,6 +88,9 @@ class LLMConfig(BaseModel):
         model (str): The LLM model to use for checking the text.
         confidence_threshold (float): Minimum confidence required to trigger the guardrail,
             as a float between 0.0 and 1.0.
+        include_reasoning (bool): Whether to include reasoning/explanation in guardrail
+            output. Useful for development and debugging, but can be disabled in production
+            to save tokens. Defaults to False.
     """

     model: str = Field(..., description="LLM model to use for checking the text")

@@ -96,6 +100,13 @@ class LLMConfig(BaseModel):
         ge=0.0,
         le=1.0,
     )
+    include_reasoning: bool = Field(
+        False,
+        description=(
+            "Include reasoning/explanation fields in output. "
+            "Defaults to False for token efficiency. Enable for development/debugging."
+        ),
+    )

     model_config = ConfigDict(extra="forbid")

@@ -117,6 +128,19 @@ class LLMOutput(BaseModel):
     confidence: float


+class LLMReasoningOutput(LLMOutput):
+    """Extended LLM output schema with reasoning explanation.
+
+    Extends LLMOutput to include a reason field explaining the decision.
+    This is the standard extended output for guardrails that include reasoning.
+
+    Attributes:
+        reason (str): Explanation for why the input was flagged or not flagged.
+    """
+
+    reason: str = Field(..., description="Explanation for the flagging decision")
+
+
 class LLMErrorOutput(LLMOutput):
     """Extended LLM output schema with error information.

@@ -399,7 +423,7 @@ def create_llm_check_fn(
     name: str,
     description: str,
     system_prompt: str,
-    output_model: type[LLMOutput] = LLMOutput,
+    output_model: type[LLMOutput] | None = None,
     config_model: type[TLLMCfg] = LLMConfig,  # type: ignore[assignment]
 ) -> CheckFn[GuardrailLLMContextProto, str, TLLMCfg]:
     """Factory for constructing and registering an LLM-based guardrail check_fn.

@@ -409,17 +433,25 @@ def create_llm_check_fn(
     use the configured LLM to analyze text, validate the result, and trigger if
     confidence exceeds the provided threshold.

+    When `include_reasoning=True` in the config, the guardrail will automatically
+    use an extended output model with a `reason` field. When `include_reasoning=False`,
+    it uses the base `LLMOutput` model (only `flagged` and `confidence` fields).
+
     Args:
         name (str): Name under which to register the guardrail.
         description (str): Short explanation of the guardrail's logic.
         system_prompt (str): Prompt passed to the LLM to control analysis.
-        output_model (type[LLMOutput]): Schema for parsing the LLM output.
+        output_model (type[LLMOutput] | None): Custom schema for parsing the LLM output.
+            If None (default), uses `LLMReasoningOutput` when reasoning is enabled.
+            Provide a custom model only if you need additional fields beyond `reason`.
         config_model (type[LLMConfig]): Configuration schema for the check_fn.

     Returns:
         CheckFn[GuardrailLLMContextProto, str, TLLMCfg]: Async check function
             to be registered as a guardrail.
     """
+    # Default to LLMReasoningOutput if no custom model provided
+    extended_output_model = output_model or LLMReasoningOutput

     async def guardrail_func(
         ctx: GuardrailLLMContextProto,

@@ -441,12 +473,16 @@ async def guardrail_func(
         else:
             rendered_system_prompt = system_prompt

+        # Use base LLMOutput if reasoning is disabled, otherwise use the extended model
+        include_reasoning = getattr(config, "include_reasoning", False)
+        selected_output_model = extended_output_model if include_reasoning else LLMOutput
+
         analysis, token_usage = await run_llm(
             data,
             rendered_system_prompt,
             ctx.guardrail_llm,
             config.model,
-            output_model,
+            selected_output_model,
         )

         # Check if this is an error result
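For callers of the updated factory, leaving `output_model` as `None` is enough to pick up the new behavior. A sketch under stated assumptions: the check name, description, and prompt here are hypothetical, and the import path is assumed from the file location.

```python
# Sketch: registering a new LLM-based check against the updated factory default.
from guardrails.checks.text.llm_base import create_llm_check_fn

contains_pii = create_llm_check_fn(
    name="Contains PII",
    description="Flags text that appears to contain personally identifiable information.",
    system_prompt="Decide whether the provided text contains personally identifiable information.",
    # output_model left as None: the guardrail parses into LLMReasoningOutput when the
    # config sets include_reasoning=True, and into plain LLMOutput otherwise.
)
```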
