LCORE-1500: [bug] Fixing cases where both inline and tools RAG are enabled#1351
LCORE-1500: [bug] Fixing cases where both inline and tools RAG are enabled#1351JslYoon wants to merge 9 commits intolightspeed-core:mainfrom
Conversation
Signed-off-by: Lucas <lyoon@redhat.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds a Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Endpoint
participant RAGBuilder
participant VectorStore
Client->>Endpoint: send request (may include no_tools)
Endpoint->>RAGBuilder: build_rag_context(RAGContextParams{query, moderation_decision, vector_store_ids, solr, no_tools})
alt params.no_tools == true
RAGBuilder->>VectorStore: fetch BYOK using provided vector_store_ids (skip tool-based sources)
else params.no_tools == false
RAGBuilder->>VectorStore: fetch BYOK using config/inline sources and vector_store_ids
end
VectorStore-->>RAGBuilder: return RAG chunks
RAGBuilder-->>Endpoint: assembled RAG context
Endpoint-->>Client: respond (with RAG context applied)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/app/endpoints/responses.py (1)
218-238:⚠️ Potential issue | 🟠 Major
no_toolsis not actually disabling tool-based RAG in/responses.At Line 222, tools are resolved unconditionally, and at Line 237
no_toolsonly affects inline RAG source selection. As written, requests withno_tools=Truecan still send tool definitions toclient.responses.create(...).🛠️ Suggested fix
( responses_request.tools, responses_request.tool_choice, vector_store_ids, ) = await resolve_tool_choice( responses_request.tools, responses_request.tool_choice, auth[1], mcp_headers, request.headers, ) + + if responses_request.no_tools: + # keep resolved vector_store_ids for inline RAG, but prevent tool execution + responses_request.tools = None + responses_request.tool_choice = "none"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 218 - 238, The code currently resolves tools unconditionally and only uses responses_request.no_tools when building inline RAG, so requests with no_tools=True can still send tools to client.responses.create; modify the control flow to check responses_request.no_tools before calling resolve_tool_choice (or, if you must call it, ensure it returns empty tools and None tool_choice when responses_request.no_tools is True) and ensure the final payload sent to client.responses.create does not include tool definitions or tool_choice when responses_request.no_tools is True; update references in this block (responses_request.no_tools, resolve_tool_choice, build_rag_context, and the eventual client.responses.create call) so no tool-based RAG is used if no_tools is set.src/utils/vector_search.py (2)
320-420:⚠️ Potential issue | 🟠 MajorCI blocker:
_fetch_byok_ragexceeds pylint local-variable limit.Line 320 is currently failing with
R0914 (16/15), so this PR won’t merge cleanly until locals are reduced (or explicitly waived).✅ Minimal refactor to reduce locals
- score_multiplier_mapping = configuration.score_multiplier_mapping - rag_id_mapping = configuration.rag_id_mapping - # Query all vector stores in parallel results_per_store = await asyncio.gather( *[ _query_store_for_byok_rag( client, vector_store_id, query, - score_multiplier_mapping.get(vector_store_id, 1.0), + configuration.score_multiplier_mapping.get(vector_store_id, 1.0), ) for vector_store_id in vector_store_ids_to_query ] ) @@ - result["source"] = rag_id_mapping.get(result["source"], result["source"]) + result["source"] = configuration.rag_id_mapping.get( + result["source"], result["source"] + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 320 - 420, _fetcH_byok_rag currently has too many local variables (R0914); refactor to reduce locals by extracting small helpers: move the vector-store selection logic into a helper that returns vector_store_ids_to_query (use resolve_vector_store_ids and filter out constants.SOLR_DEFAULT_VECTOR_STORE_ID), and move the result-flattening/sorting/top-N + RAGChunk conversion into another helper that takes results_per_store, configuration.score_multiplier_mapping, configuration.rag_id_mapping and returns (rag_chunks, referenced_documents) using _query_store_for_byok_rag and _process_byok_rag_chunks_for_documents; call these helpers from _fetch_byok_rag so the function keeps few locals and stays under the pylint limit.
499-506:⚠️ Potential issue | 🟠 MajorCI blocker:
build_rag_contextnow violates pylint argument limits.Line 499 fails
R0913/R0917after addingno_toolsas a sixth parameter. This needs a parameter-shape change (or a targeted pylint waiver) to unblock CI.🔧 Pragmatic unblock (targeted waiver)
-async def build_rag_context( +async def build_rag_context( # pylint: disable=too-many-arguments,too-many-positional-arguments client: AsyncLlamaStackClient, moderation_decision: str, query: str, vector_store_ids: Optional[list[str]], solr: Optional[dict[str, Any]] = None, no_tools: bool = False, ) -> RAGContext:Also applies to: 525-525
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 499 - 506, build_rag_context currently has too many positional parameters (causing R0913/R0917); fold the less-essential/optional parameters into a single options object to reduce parameter count. Create a small dataclass/TypedDict (e.g., BuildRAGOptions) that contains vector_store_ids: Optional[list[str]], solr: Optional[dict[str, Any]], and no_tools: bool, change the signature of build_rag_context to build_rag_context(client: AsyncLlamaStackClient, moderation_decision: str, query: str, options: BuildRAGOptions) -> RAGContext, update the function body to read options.vector_store_ids / options.solr / options.no_tools, and update all callers to construct and pass a BuildRAGOptions instance (or equivalent dict) instead of separate parameters so the pylint argument limit is no longer violated.
🧹 Nitpick comments (1)
src/models/requests.py (1)
671-672: Exposeno_toolsinResponsesRequestschema examples.Line 698 adds the field, but the request examples don’t show it, which makes the new behavior harder to discover in generated API docs.
✏️ Suggested doc example update
{ "input": "Hello World!", "model": "openai/gpt-4o-mini", "instructions": "You are a helpful assistant", "store": True, "stream": False, + "no_tools": False, "generate_topic_summary": True, }Also applies to: 698-698
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 671 - 672, The ResponsesRequest schema gained a new boolean field no_tools but the existing examples for ResponsesRequest were not updated; update the API schema examples for the ResponsesRequest model/class (the examples used by the pydantic/schema generator in the ResponsesRequest definition) to include the no_tools key (true/false) in each relevant example so the generated docs show the new behavior; locate the ResponsesRequest examples object/Examples dict in src/models/requests.py and add no_tools to every example entry alongside the other fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 218-238: The code currently resolves tools unconditionally and
only uses responses_request.no_tools when building inline RAG, so requests with
no_tools=True can still send tools to client.responses.create; modify the
control flow to check responses_request.no_tools before calling
resolve_tool_choice (or, if you must call it, ensure it returns empty tools and
None tool_choice when responses_request.no_tools is True) and ensure the final
payload sent to client.responses.create does not include tool definitions or
tool_choice when responses_request.no_tools is True; update references in this
block (responses_request.no_tools, resolve_tool_choice, build_rag_context, and
the eventual client.responses.create call) so no tool-based RAG is used if
no_tools is set.
In `@src/utils/vector_search.py`:
- Around line 320-420: _fetcH_byok_rag currently has too many local variables
(R0914); refactor to reduce locals by extracting small helpers: move the
vector-store selection logic into a helper that returns
vector_store_ids_to_query (use resolve_vector_store_ids and filter out
constants.SOLR_DEFAULT_VECTOR_STORE_ID), and move the
result-flattening/sorting/top-N + RAGChunk conversion into another helper that
takes results_per_store, configuration.score_multiplier_mapping,
configuration.rag_id_mapping and returns (rag_chunks, referenced_documents)
using _query_store_for_byok_rag and _process_byok_rag_chunks_for_documents; call
these helpers from _fetch_byok_rag so the function keeps few locals and stays
under the pylint limit.
- Around line 499-506: build_rag_context currently has too many positional
parameters (causing R0913/R0917); fold the less-essential/optional parameters
into a single options object to reduce parameter count. Create a small
dataclass/TypedDict (e.g., BuildRAGOptions) that contains vector_store_ids:
Optional[list[str]], solr: Optional[dict[str, Any]], and no_tools: bool, change
the signature of build_rag_context to build_rag_context(client:
AsyncLlamaStackClient, moderation_decision: str, query: str, options:
BuildRAGOptions) -> RAGContext, update the function body to read
options.vector_store_ids / options.solr / options.no_tools, and update all
callers to construct and pass a BuildRAGOptions instance (or equivalent dict)
instead of separate parameters so the pylint argument limit is no longer
violated.
---
Nitpick comments:
In `@src/models/requests.py`:
- Around line 671-672: The ResponsesRequest schema gained a new boolean field
no_tools but the existing examples for ResponsesRequest were not updated; update
the API schema examples for the ResponsesRequest model/class (the examples used
by the pydantic/schema generator in the ResponsesRequest definition) to include
the no_tools key (true/false) in each relevant example so the generated docs
show the new behavior; locate the ResponsesRequest examples object/Examples dict
in src/models/requests.py and add no_tools to every example entry alongside the
other fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1cb84421-ade6-41af-898e-2b6a71104fbc
📒 Files selected for processing (5)
src/app/endpoints/query.pysrc/app/endpoints/responses.pysrc/app/endpoints/streaming_query.pysrc/models/requests.pysrc/utils/vector_search.py
eb33354 to
956da63
Compare
956da63 to
b19d3b9
Compare
tisnik
left a comment
There was a problem hiding this comment.
Is it rebased? does not seem so
Signed-off-by: Lucas <lyoon@redhat.com>
b19d3b9 to
08cc480
Compare
|
Just rebased and fixed @tisnik! |
Signed-off-by: Lucas <lyoon@redhat.com>
Signed-off-by: Lucas <lyoon@redhat.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)
63-69: Consolidate duplicate imports fromutils.types.
RAGContextParamsshould be imported alongside the other types fromutils.typeson lines 63-67 rather than as a separate import on line 69.♻️ Suggested consolidation
from utils.types import ( ResponsesApiParams, ShieldModerationResult, TurnSummary, + RAGContextParams, ) from utils.vector_search import build_rag_context -from utils.types import RAGContextParams🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 63 - 69, The imports from utils.types are duplicated: move RAGContextParams into the existing grouped import with ResponsesApiParams, ShieldModerationResult, and TurnSummary so all four types are imported in a single statement; update the import block that currently lists ResponsesApiParams, ShieldModerationResult, TurnSummary to also include RAGContextParams and remove the separate from utils.types import RAGContextParams line to eliminate the duplicate import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/app/endpoints/query.py`:
- Around line 63-69: The imports from utils.types are duplicated: move
RAGContextParams into the existing grouped import with ResponsesApiParams,
ShieldModerationResult, and TurnSummary so all four types are imported in a
single statement; update the import block that currently lists
ResponsesApiParams, ShieldModerationResult, TurnSummary to also include
RAGContextParams and remove the separate from utils.types import
RAGContextParams line to eliminate the duplicate import.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8c55e906-b91f-45df-ae02-2f0333ee9895
📒 Files selected for processing (9)
src/app/endpoints/query.pysrc/app/endpoints/responses.pysrc/app/endpoints/streaming_query.pysrc/utils/types.pysrc/utils/vector_search.pytests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/app/endpoints/test_responses.pytests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/app/endpoints/streaming_query.py
- tests/unit/utils/test_vector_search.py
Signed-off-by: Lucas <lyoon@redhat.com>
Description
There was a bug where when querying LLM, if both inline RAG and tools RAG are enabled, LLMS run an infinite loop.
This was caused when user provides a vector_store_id in their query, which bypasses the preset inline RAG tools in the _fetch_byok_rag() function.
The Jira ticket is [LCORE-1500]
Hen
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Behavior
Tests