Skip to content

LCORE-1500: [bug] Fixing cases where both inline and tools RAG are enabled#1351

Open
JslYoon wants to merge 9 commits intolightspeed-core:mainfrom
JslYoon:JslYoon-LCORE-1500
Open

LCORE-1500: [bug] Fixing cases where both inline and tools RAG are enabled#1351
JslYoon wants to merge 9 commits intolightspeed-core:mainfrom
JslYoon:JslYoon-LCORE-1500

Conversation

@JslYoon
Copy link
Contributor

@JslYoon JslYoon commented Mar 18, 2026

Description

There was a bug where when querying LLM, if both inline RAG and tools RAG are enabled, LLMS run an infinite loop.
This was caused when user provides a vector_store_id in their query, which bypasses the preset inline RAG tools in the _fetch_byok_rag() function.

The Jira ticket is [LCORE-1500]

Hen

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added a per-request "no_tools" flag to disable tool-based sources in RAG contexts and a corresponding request field.
  • Behavior

    • When set, tool-based sources are excluded from inline RAG context; per-request vector store IDs still take precedence over configured inline sources.
  • Tests

    • Unit and integration tests updated to cover the new flag and related selection behavior.

Signed-off-by: Lucas <lyoon@redhat.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f2005d9a-6094-450f-88b0-9b6a38b54f6c

📥 Commits

Reviewing files that changed from the base of the PR and between 3972b4e and fba709a.

📒 Files selected for processing (1)
  • src/app/endpoints/streaming_query.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/app/endpoints/streaming_query.py

Walkthrough

Adds a no_tools flag to request models and threads it into endpoint handlers and RAG construction via a new RAGContextParams object; build_rag_context signature changed to accept this params object and BYOK selection logic now respects no_tools.

Changes

Cohort / File(s) Summary
Endpoint handlers
src/app/endpoints/query.py, src/app/endpoints/responses.py, src/app/endpoints/streaming_query.py
Invoke build_rag_context with a RAGContextParams object instead of discrete args. query.py passes no_tools=True unconditionally; other handlers pass no_tools from the request (default False).
Request models & tests
src/models/requests.py, tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py
Added no_tools: bool to request models; integration tests updated to include no_tools=True in relevant QueryRequest setups.
RAG types
src/utils/types.py
Added RAGContextParams Pydantic model: moderation_decision, query, vector_store_ids, solr, no_tools (default False).
RAG context & BYOK logic
src/utils/vector_search.py, tests/unit/utils/test_vector_search.py
Refactored build_rag_context to accept RAGContextParams; added _fetch_byok_rag(...); threaded no_tools into BYOK selection so tool-based sources can be excluded when true; unit tests updated to pass RAGContextParams.
Response tests
tests/unit/app/endpoints/test_responses.py
Adjusted assertions to inspect the query field on the RAGContextParams passed to build_rag_context rather than previous positional-argument checks.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Endpoint
    participant RAGBuilder
    participant VectorStore
    Client->>Endpoint: send request (may include no_tools)
    Endpoint->>RAGBuilder: build_rag_context(RAGContextParams{query, moderation_decision, vector_store_ids, solr, no_tools})
    alt params.no_tools == true
        RAGBuilder->>VectorStore: fetch BYOK using provided vector_store_ids (skip tool-based sources)
    else params.no_tools == false
        RAGBuilder->>VectorStore: fetch BYOK using config/inline sources and vector_store_ids
    end
    VectorStore-->>RAGBuilder: return RAG chunks
    RAGBuilder-->>Endpoint: assembled RAG context
    Endpoint-->>Client: respond (with RAG context applied)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main bug fix: preventing cases where both inline and tools RAG are enabled simultaneously. This matches the core objective evident from the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/responses.py (1)

218-238: ⚠️ Potential issue | 🟠 Major

no_tools is not actually disabling tool-based RAG in /responses.

At Line 222, tools are resolved unconditionally, and at Line 237 no_tools only affects inline RAG source selection. As written, requests with no_tools=True can still send tool definitions to client.responses.create(...).

🛠️ Suggested fix
     (
         responses_request.tools,
         responses_request.tool_choice,
         vector_store_ids,
     ) = await resolve_tool_choice(
         responses_request.tools,
         responses_request.tool_choice,
         auth[1],
         mcp_headers,
         request.headers,
     )
+
+    if responses_request.no_tools:
+        # keep resolved vector_store_ids for inline RAG, but prevent tool execution
+        responses_request.tools = None
+        responses_request.tool_choice = "none"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 218 - 238, The code currently
resolves tools unconditionally and only uses responses_request.no_tools when
building inline RAG, so requests with no_tools=True can still send tools to
client.responses.create; modify the control flow to check
responses_request.no_tools before calling resolve_tool_choice (or, if you must
call it, ensure it returns empty tools and None tool_choice when
responses_request.no_tools is True) and ensure the final payload sent to
client.responses.create does not include tool definitions or tool_choice when
responses_request.no_tools is True; update references in this block
(responses_request.no_tools, resolve_tool_choice, build_rag_context, and the
eventual client.responses.create call) so no tool-based RAG is used if no_tools
is set.
src/utils/vector_search.py (2)

320-420: ⚠️ Potential issue | 🟠 Major

CI blocker: _fetch_byok_rag exceeds pylint local-variable limit.

Line 320 is currently failing with R0914 (16/15), so this PR won’t merge cleanly until locals are reduced (or explicitly waived).

✅ Minimal refactor to reduce locals
-        score_multiplier_mapping = configuration.score_multiplier_mapping
-        rag_id_mapping = configuration.rag_id_mapping
-
         # Query all vector stores in parallel
         results_per_store = await asyncio.gather(
             *[
                 _query_store_for_byok_rag(
                     client,
                     vector_store_id,
                     query,
-                    score_multiplier_mapping.get(vector_store_id, 1.0),
+                    configuration.score_multiplier_mapping.get(vector_store_id, 1.0),
                 )
                 for vector_store_id in vector_store_ids_to_query
             ]
         )
@@
-            result["source"] = rag_id_mapping.get(result["source"], result["source"])
+            result["source"] = configuration.rag_id_mapping.get(
+                result["source"], result["source"]
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 320 - 420, _fetcH_byok_rag currently
has too many local variables (R0914); refactor to reduce locals by extracting
small helpers: move the vector-store selection logic into a helper that returns
vector_store_ids_to_query (use resolve_vector_store_ids and filter out
constants.SOLR_DEFAULT_VECTOR_STORE_ID), and move the
result-flattening/sorting/top-N + RAGChunk conversion into another helper that
takes results_per_store, configuration.score_multiplier_mapping,
configuration.rag_id_mapping and returns (rag_chunks, referenced_documents)
using _query_store_for_byok_rag and _process_byok_rag_chunks_for_documents; call
these helpers from _fetch_byok_rag so the function keeps few locals and stays
under the pylint limit.

499-506: ⚠️ Potential issue | 🟠 Major

CI blocker: build_rag_context now violates pylint argument limits.

Line 499 fails R0913/R0917 after adding no_tools as a sixth parameter. This needs a parameter-shape change (or a targeted pylint waiver) to unblock CI.

🔧 Pragmatic unblock (targeted waiver)
-async def build_rag_context(
+async def build_rag_context(  # pylint: disable=too-many-arguments,too-many-positional-arguments
     client: AsyncLlamaStackClient,
     moderation_decision: str,
     query: str,
     vector_store_ids: Optional[list[str]],
     solr: Optional[dict[str, Any]] = None,
     no_tools: bool = False,
 ) -> RAGContext:

Also applies to: 525-525

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 499 - 506, build_rag_context
currently has too many positional parameters (causing R0913/R0917); fold the
less-essential/optional parameters into a single options object to reduce
parameter count. Create a small dataclass/TypedDict (e.g., BuildRAGOptions) that
contains vector_store_ids: Optional[list[str]], solr: Optional[dict[str, Any]],
and no_tools: bool, change the signature of build_rag_context to
build_rag_context(client: AsyncLlamaStackClient, moderation_decision: str,
query: str, options: BuildRAGOptions) -> RAGContext, update the function body to
read options.vector_store_ids / options.solr / options.no_tools, and update all
callers to construct and pass a BuildRAGOptions instance (or equivalent dict)
instead of separate parameters so the pylint argument limit is no longer
violated.
🧹 Nitpick comments (1)
src/models/requests.py (1)

671-672: Expose no_tools in ResponsesRequest schema examples.

Line 698 adds the field, but the request examples don’t show it, which makes the new behavior harder to discover in generated API docs.

✏️ Suggested doc example update
                 {
                     "input": "Hello World!",
                     "model": "openai/gpt-4o-mini",
                     "instructions": "You are a helpful assistant",
                     "store": True,
                     "stream": False,
+                    "no_tools": False,
                     "generate_topic_summary": True,
                 }

Also applies to: 698-698

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 671 - 672, The ResponsesRequest schema
gained a new boolean field no_tools but the existing examples for
ResponsesRequest were not updated; update the API schema examples for the
ResponsesRequest model/class (the examples used by the pydantic/schema generator
in the ResponsesRequest definition) to include the no_tools key (true/false) in
each relevant example so the generated docs show the new behavior; locate the
ResponsesRequest examples object/Examples dict in src/models/requests.py and add
no_tools to every example entry alongside the other fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 218-238: The code currently resolves tools unconditionally and
only uses responses_request.no_tools when building inline RAG, so requests with
no_tools=True can still send tools to client.responses.create; modify the
control flow to check responses_request.no_tools before calling
resolve_tool_choice (or, if you must call it, ensure it returns empty tools and
None tool_choice when responses_request.no_tools is True) and ensure the final
payload sent to client.responses.create does not include tool definitions or
tool_choice when responses_request.no_tools is True; update references in this
block (responses_request.no_tools, resolve_tool_choice, build_rag_context, and
the eventual client.responses.create call) so no tool-based RAG is used if
no_tools is set.

In `@src/utils/vector_search.py`:
- Around line 320-420: _fetcH_byok_rag currently has too many local variables
(R0914); refactor to reduce locals by extracting small helpers: move the
vector-store selection logic into a helper that returns
vector_store_ids_to_query (use resolve_vector_store_ids and filter out
constants.SOLR_DEFAULT_VECTOR_STORE_ID), and move the
result-flattening/sorting/top-N + RAGChunk conversion into another helper that
takes results_per_store, configuration.score_multiplier_mapping,
configuration.rag_id_mapping and returns (rag_chunks, referenced_documents)
using _query_store_for_byok_rag and _process_byok_rag_chunks_for_documents; call
these helpers from _fetch_byok_rag so the function keeps few locals and stays
under the pylint limit.
- Around line 499-506: build_rag_context currently has too many positional
parameters (causing R0913/R0917); fold the less-essential/optional parameters
into a single options object to reduce parameter count. Create a small
dataclass/TypedDict (e.g., BuildRAGOptions) that contains vector_store_ids:
Optional[list[str]], solr: Optional[dict[str, Any]], and no_tools: bool, change
the signature of build_rag_context to build_rag_context(client:
AsyncLlamaStackClient, moderation_decision: str, query: str, options:
BuildRAGOptions) -> RAGContext, update the function body to read
options.vector_store_ids / options.solr / options.no_tools, and update all
callers to construct and pass a BuildRAGOptions instance (or equivalent dict)
instead of separate parameters so the pylint argument limit is no longer
violated.

---

Nitpick comments:
In `@src/models/requests.py`:
- Around line 671-672: The ResponsesRequest schema gained a new boolean field
no_tools but the existing examples for ResponsesRequest were not updated; update
the API schema examples for the ResponsesRequest model/class (the examples used
by the pydantic/schema generator in the ResponsesRequest definition) to include
the no_tools key (true/false) in each relevant example so the generated docs
show the new behavior; locate the ResponsesRequest examples object/Examples dict
in src/models/requests.py and add no_tools to every example entry alongside the
other fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1cb84421-ade6-41af-898e-2b6a71104fbc

📥 Commits

Reviewing files that changed from the base of the PR and between cdec708 and 543dc7d.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/streaming_query.py
  • src/models/requests.py
  • src/utils/vector_search.py

@JslYoon JslYoon force-pushed the JslYoon-LCORE-1500 branch from eb33354 to 956da63 Compare March 18, 2026 20:29
@JslYoon JslYoon force-pushed the JslYoon-LCORE-1500 branch from 956da63 to b19d3b9 Compare March 18, 2026 20:32
Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it rebased? does not seem so

@tisnik tisnik changed the title [LCORE-1500] [bug] Fixing cases where both inline and tools RAG are enabled LCORE-1500: [bug] Fixing cases where both inline and tools RAG are enabled Mar 18, 2026
Signed-off-by: Lucas <lyoon@redhat.com>
@JslYoon JslYoon force-pushed the JslYoon-LCORE-1500 branch from b19d3b9 to 08cc480 Compare March 18, 2026 20:48
@JslYoon
Copy link
Contributor Author

JslYoon commented Mar 18, 2026

Just rebased and fixed @tisnik!

@JslYoon JslYoon requested a review from tisnik March 18, 2026 20:50
JslYoon added 3 commits March 18, 2026 17:12
Signed-off-by: Lucas <lyoon@redhat.com>
Signed-off-by: Lucas <lyoon@redhat.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)

63-69: Consolidate duplicate imports from utils.types.

RAGContextParams should be imported alongside the other types from utils.types on lines 63-67 rather than as a separate import on line 69.

♻️ Suggested consolidation
 from utils.types import (
     ResponsesApiParams,
     ShieldModerationResult,
     TurnSummary,
+    RAGContextParams,
 )
 from utils.vector_search import build_rag_context
-from utils.types import RAGContextParams
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/query.py` around lines 63 - 69, The imports from
utils.types are duplicated: move RAGContextParams into the existing grouped
import with ResponsesApiParams, ShieldModerationResult, and TurnSummary so all
four types are imported in a single statement; update the import block that
currently lists ResponsesApiParams, ShieldModerationResult, TurnSummary to also
include RAGContextParams and remove the separate from utils.types import
RAGContextParams line to eliminate the duplicate import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/app/endpoints/query.py`:
- Around line 63-69: The imports from utils.types are duplicated: move
RAGContextParams into the existing grouped import with ResponsesApiParams,
ShieldModerationResult, and TurnSummary so all four types are imported in a
single statement; update the import block that currently lists
ResponsesApiParams, ShieldModerationResult, TurnSummary to also include
RAGContextParams and remove the separate from utils.types import
RAGContextParams line to eliminate the duplicate import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c55e906-b91f-45df-ae02-2f0333ee9895

📥 Commits

Reviewing files that changed from the base of the PR and between b19d3b9 and a53fcec.

📒 Files selected for processing (9)
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/streaming_query.py
  • src/utils/types.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/app/endpoints/streaming_query.py
  • tests/unit/utils/test_vector_search.py

@tisnik tisnik requested a review from are-ces March 19, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants