LCORE-1504: Change how the OKP server is located at runtime: RH_SERVER_OKP#1347
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaced Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)
424-446:⚠️ Potential issue | 🟠 MajorAlign
RH_SERVER_OKPcontract across Solr and document URLs.At Line 424,
RH_SERVER_OKPis treated as a Solr URL (.../solr), but insrc/utils/vector_search.py(Line 29-31 and Line 573+) the same variable is treated as a document base URL. One value cannot reliably satisfy both shapes.Suggested fix (single contract: RH_SERVER_OKP = OKP server base)
- rh_server_okp_env = "${env.RH_SERVER_OKP:=http://localhost:8081/solr}" + rh_server_okp_env = "${env.RH_SERVER_OKP:=http://localhost:8081}" @@ - "solr_url": rh_server_okp_env, + "solr_url": f"{rh_server_okp_env}/solr",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 424 - 446, The RH_SERVER_OKP value is used inconsistently: in this file rh_server_okp_env is defined with a trailing "/solr" but elsewhere (src/utils/vector_search.py) RH_SERVER_OKP is expected to be the OKP server base (document URLs); fix by standardizing on a single contract (prefer: RH_SERVER_OKP = OKP server base) and update this config to not append "/solr" directly—change rh_server_okp_env to use the base env var only and build the Solr endpoint when needed (e.g., set "solr_url" to f"{rh_server_okp_env.rstrip('/')}/solr" at the point where ls_config["providers"]["vector_io"] is populated), or alternatively update code in vector_search.py to append "/solr" when calling Solr; adjust only the rh_server_okp_env definition and the place that sets "solr_url" (and mention RH_SERVER_OKP usage in vector_search) so both consumers agree on a base-URL contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/vector_search.py`:
- Around line 573-585: The doc_url assembly can produce malformed paths due to
raw concatenation and urljoin quirks; change both offline and online branches to
a single deterministic join function that ensures base_url ends with a single
'/' and reference_doc/reference_url is stripped of any leading '/', then join
with base_url + reference_doc (or return reference_doc unchanged if it already
starts with "http"); update the code that sets base_url, reference_doc, and
doc_url (and the similar logic at the later block) to use this helper logic so
offline/relative-path flows and the variables base_url, reference_doc,
reference_url, and doc_url produce consistent, well-formed URLs.
- Around line 29-31: The helper _get_okp_base_url currently returns the raw
RH_SERVER_OKP env var which can be an empty string; change it to treat empty or
whitespace-only values as unset by reading the env var (e.g., val =
os.environ.get("RH_SERVER_OKP")) and returning val if val and val.strip() else
constants.RH_SERVER_OKP_DEFAULT_URL so downstream URL generation uses the
default when the env var is empty; update the function _get_okp_base_url
accordingly.
---
Outside diff comments:
In `@src/llama_stack_configuration.py`:
- Around line 424-446: The RH_SERVER_OKP value is used inconsistently: in this
file rh_server_okp_env is defined with a trailing "/solr" but elsewhere
(src/utils/vector_search.py) RH_SERVER_OKP is expected to be the OKP server base
(document URLs); fix by standardizing on a single contract (prefer:
RH_SERVER_OKP = OKP server base) and update this config to not append "/solr"
directly—change rh_server_okp_env to use the base env var only and build the
Solr endpoint when needed (e.g., set "solr_url" to
f"{rh_server_okp_env.rstrip('/')}/solr" at the point where
ls_config["providers"]["vector_io"] is populated), or alternatively update code
in vector_search.py to append "/solr" when calling Solr; adjust only the
rh_server_okp_env definition and the place that sets "solr_url" (and mention
RH_SERVER_OKP usage in vector_search) so both consumers agree on a base-URL
contract.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2017d98b-dcc3-4905-8eb4-4f803bf7b6f0
📒 Files selected for processing (4)
src/constants.pysrc/llama_stack_configuration.pysrc/utils/vector_search.pytests/unit/utils/test_vector_search.py
tisnik
left a comment
There was a problem hiding this comment.
Why we use env. vars in the first place? Is there any strong reason about it?
It's an OKP proposal - https://docs.google.com/document/d/15wxuPf9r1oEsVdpsgLM-aialDaz7UzaR5cl85vk2WkE/edit?tab=t.0 |
c486c32 to
9d4518d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/utils/vector_search.py (1)
574-585:⚠️ Potential issue | 🟠 MajorDocument URL assembly is still inconsistent and can produce malformed links.
At Line 578/Line 585, raw concatenation can miss separators; at Line 618/Line 631, a different join strategy is used. This keeps URL output format nondeterministic across code paths for the same inputs.
🔧 Consolidated fix pattern
+def _join_okp_url(base_url: str, reference: str) -> str: + """Join OKP base URL and document reference deterministically.""" + return f"{base_url.rstrip('/')}/{reference.lstrip('/')}" + def _build_document_url( @@ - base_url = _get_okp_base_url() + base_url = _get_okp_base_url() @@ - doc_url = base_url + reference_doc if reference_doc else "" + doc_url = _join_okp_url(base_url, reference_doc) if reference_doc else "" @@ - else (base_url + reference_doc if reference_doc else "") + else (_join_okp_url(base_url, reference_doc) if reference_doc else "") @@ - attributes["doc_url"] = urljoin(_get_okp_base_url(), parent_id) + attributes["doc_url"] = _join_okp_url( + _get_okp_base_url(), parent_id + ) @@ - attributes["doc_url"] = urljoin(_get_okp_base_url(), doc_id) + attributes["doc_url"] = _join_okp_url( + _get_okp_base_url(), doc_id + )Use this to verify the current mismatch in output shape:
#!/bin/bash python - <<'PY' from urllib.parse import urljoin cases = [ ("https://mimir.corp.redhat.com", "doc_123"), ("https://mimir.corp.redhat.com/", "doc_123"), ("http://localhost:8081/solr", "doc_123"), ] for base, ref in cases: print(f"\nbase={base!r}, ref={ref!r}") print("concat :", base + ref) print("urljoin:", urljoin(base, ref)) PYAlso applies to: 618-631
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 574 - 585, The URL assembly is inconsistent due to raw string concatenation; replace both concatenation branches so doc_url is built with urllib.parse.urljoin using base_url and the chosen reference (use reference_url or doc_id as before) to guarantee correct separators; update the blocks that set reference_doc and doc_url (variables: base_url, reference_doc, reference_url, doc_id, doc_url) and the helper call to _get_okp_base_url(), and apply the same urljoin logic to the other affected section around the 618–631 region to make output deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 424-425: The RH_SERVER_OKP default currently includes a Solr path
which mixes server-base and Solr-specific URL shapes; change rh_server_okp_env
to default to the server base (e.g., "http://localhost:8081") instead of
".../solr" and remove any hardcoded "/solr" from collection_env initialization,
then update the Solr-specific call sites to append "/solr" at the Solr boundary
(for example in src/utils/vector_search.py where RH_SERVER_OKP is used to build
document/Solr endpoints around lines 29-32); also apply the same change to the
other occurrence referenced near line 445 so all Solr paths are constructed only
where Solr is actually invoked.
---
Duplicate comments:
In `@src/utils/vector_search.py`:
- Around line 574-585: The URL assembly is inconsistent due to raw string
concatenation; replace both concatenation branches so doc_url is built with
urllib.parse.urljoin using base_url and the chosen reference (use reference_url
or doc_id as before) to guarantee correct separators; update the blocks that set
reference_doc and doc_url (variables: base_url, reference_doc, reference_url,
doc_id, doc_url) and the helper call to _get_okp_base_url(), and apply the same
urljoin logic to the other affected section around the 618–631 region to make
output deterministic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e65d44af-f4cd-4f6b-b7e7-99dfb761ed5c
📒 Files selected for processing (4)
src/constants.pysrc/llama_stack_configuration.pysrc/utils/vector_search.pytests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/utils/test_vector_search.py
But WHO creates the env.var? Is not it that OKP is just another service we have to connect to? |
9d4518d to
f0316d8
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/utils/test_vector_search.py (1)
239-244: Add dedicated tests for_get_okp_base_urlfallback behavior.These changes correctly isolate
_build_document_url, but they also bypass verification of env/default resolution. Please add focused tests for_get_okp_base_urlwithRH_SERVER_OKPset/unset (and default-port expectation) so the runtime contract is directly covered.Suggested test additions
from utils.vector_search import ( + _get_okp_base_url, _build_document_url, _build_query_params, @@ class TestBuildDocumentUrl: @@ assert doc_url == "https://mimir.test/relative/path" assert reference_doc == "relative/path" +class TestGetOkpBaseUrl: + """Tests for _get_okp_base_url helper.""" + + def test_uses_env_value_when_set(self, monkeypatch) -> None: + monkeypatch.setenv("RH_SERVER_OKP", "https://okp.example:8081") + assert _get_okp_base_url().startswith("https://okp.example:8081") + + def test_uses_default_when_env_unset(self, monkeypatch) -> None: + monkeypatch.delenv("RH_SERVER_OKP", raising=False) + value = _get_okp_base_url() + assert value.startswith("https://") + # tighten this assertion if the contract requires explicit :8081 +``` </details> Also applies to: 263-268 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@tests/unit/utils/test_vector_search.pyaround lines 239 - 244, Add dedicated
unit tests for _get_okp_base_url to verify its fallback behavior when
RH_SERVER_OKP is set and when it is unset; specifically, write one test that
sets RH_SERVER_OKP to a custom host and asserts the returned base URL uses that
host, and another test that clears RH_SERVER_OKP and asserts the returned base
URL uses the default host and includes the expected default port (e.g., :443 or
configured default). Update tests/unit/utils/test_vector_search.py to remove the
mocker.patch of _get_okp_base_url in the existing _build_document_url tests and
instead add the new focused tests around _get_okp_base_url (also add equivalents
for the region around lines 263-268) so env/default resolution is directly
covered.</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In@tests/unit/utils/test_vector_search.py:
- Around line 239-244: Add dedicated unit tests for _get_okp_base_url to verify
its fallback behavior when RH_SERVER_OKP is set and when it is unset;
specifically, write one test that sets RH_SERVER_OKP to a custom host and
asserts the returned base URL uses that host, and another test that clears
RH_SERVER_OKP and asserts the returned base URL uses the default host and
includes the expected default port (e.g., :443 or configured default). Update
tests/unit/utils/test_vector_search.py to remove the mocker.patch of
_get_okp_base_url in the existing _build_document_url tests and instead add the
new focused tests around _get_okp_base_url (also add equivalents for the region
around lines 263-268) so env/default resolution is directly covered.</details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro **Run ID**: `6b9bd9c3-0099-48ca-8142-6fa08922532b` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 9d4518d979cabaebeeabad9936e067c473a2cef9 and f0316d8e845215bb7a5dc9b1f358818ce3989d5f. </details> <details> <summary>📒 Files selected for processing (4)</summary> * `src/constants.py` * `src/llama_stack_configuration.py` * `src/utils/vector_search.py` * `tests/unit/utils/test_vector_search.py` </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (2)</summary> * src/llama_stack_configuration.py * src/utils/vector_search.py </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Whoever runs the LSC? |
f0316d8 to
f461c9a
Compare
src/constants.py
Outdated
| # SOLR OKP RAG | ||
| MIMIR_DOC_URL = "https://mimir.corp.redhat.com" | ||
| # SOLR OKP RAG - default document base URL when RH_SERVER_OKP env is unset | ||
| RH_SERVER_OKP_DEFAULT_URL = "https://mimir.corp.redhat.com" |
There was a problem hiding this comment.
@syedriko I think we should set this default to "http://localhost:8081"
Since mimir.corp.redhat.com is internal and behind VPN it should not be used.
f461c9a to
e5a8d6d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/utils/vector_search.py (1)
630-632: Consider caching_get_okp_base_url()at the start of the function.
_get_okp_base_url()is called multiple times within_convert_solr_chunks_to_rag_format(at lines 631 and 646). Whileos.environ.getis fast, caching the base URL at the start of the function would be slightly cleaner and more efficient for large chunk lists.♻️ Proposed refactor
def _convert_solr_chunks_to_rag_format( retrieved_chunks: list[Any], retrieved_scores: list[float], offline: bool, ) -> list[RAGChunk]: ... rag_chunks = [] + base_url = _get_okp_base_url() for i, chunk in enumerate(retrieved_chunks): ... if chunk.metadata: if offline: parent_id = chunk.metadata.get("parent_id") if parent_id: - attributes["doc_url"] = _join_okp_doc_url( - _get_okp_base_url(), parent_id - ) + attributes["doc_url"] = _join_okp_doc_url(base_url, parent_id) ... if "doc_url" not in attributes and offline and doc_id: - attributes["doc_url"] = _join_okp_doc_url( - _get_okp_base_url(), doc_id - ) + attributes["doc_url"] = _join_okp_doc_url(base_url, doc_id)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 630 - 632, The function _convert_solr_chunks_to_rag_format calls _get_okp_base_url() multiple times; cache its result once at the start of that function (e.g., base_url = _get_okp_base_url()) and then use base_url where _get_okp_base_url() is currently passed (such as into _join_okp_doc_url for attributes["doc_url"] and other uses) to avoid repeated environment lookups and make the code cleaner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/utils/test_vector_search.py`:
- Around line 265-269: The test test_returns_default_when_rh_server_okp_unset
incorrectly expects HTTPS; update the assertion to match the default constant
RH_SERVER_OKP_DEFAULT_URL (which is "http://localhost:8081") by asserting
_get_okp_base_url().startswith("http://") or simply comparing equality to
constants.RH_SERVER_OKP_DEFAULT_URL; modify the assertion in that test to use
"http://" or the constant accordingly.
---
Nitpick comments:
In `@src/utils/vector_search.py`:
- Around line 630-632: The function _convert_solr_chunks_to_rag_format calls
_get_okp_base_url() multiple times; cache its result once at the start of that
function (e.g., base_url = _get_okp_base_url()) and then use base_url where
_get_okp_base_url() is currently passed (such as into _join_okp_doc_url for
attributes["doc_url"] and other uses) to avoid repeated environment lookups and
make the code cleaner.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f39f6bca-2a82-4e92-8c88-52bab255d60f
📒 Files selected for processing (4)
src/constants.pysrc/llama_stack_configuration.pysrc/utils/vector_search.pytests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/llama_stack_configuration.py
e5a8d6d to
3adfa9a
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/utils/vector_search.py (1)
619-647: Consider hoisting_get_okp_base_url()outside the loop.The base URL is fetched via
_get_okp_base_url()inside the loop (lines 631, 646), resulting in repeatedos.environ.get()calls for each chunk. Since the value doesn't change during iteration, it could be retrieved once before the loop.♻️ Suggested optimization
def _convert_solr_chunks_to_rag_format( retrieved_chunks: list[Any], retrieved_scores: list[float], offline: bool, ) -> list[RAGChunk]: ... rag_chunks = [] + base_url = _get_okp_base_url() for i, chunk in enumerate(retrieved_chunks): ... if chunk.metadata: if offline: parent_id = chunk.metadata.get("parent_id") if parent_id: - attributes["doc_url"] = _join_okp_doc_url( - _get_okp_base_url(), parent_id - ) + attributes["doc_url"] = _join_okp_doc_url(base_url, parent_id) ... if "doc_url" not in attributes and offline and doc_id: - attributes["doc_url"] = _join_okp_doc_url( - _get_okp_base_url(), doc_id - ) + attributes["doc_url"] = _join_okp_doc_url(base_url, doc_id)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 619 - 647, The loop repeatedly calls _get_okp_base_url() for each chunk which is unnecessary; before iterating over retrieved_chunks, call _get_okp_base_url() once and store it in a local variable (e.g., okp_base_url), then pass that variable into _join_okp_doc_url(...) in the branches that build attributes (the offline branch where parent_id is used and the Solr chunk branch where document_id is used), leaving the rest of the logic in the for-loop unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/utils/vector_search.py`:
- Around line 619-647: The loop repeatedly calls _get_okp_base_url() for each
chunk which is unnecessary; before iterating over retrieved_chunks, call
_get_okp_base_url() once and store it in a local variable (e.g., okp_base_url),
then pass that variable into _join_okp_doc_url(...) in the branches that build
attributes (the offline branch where parent_id is used and the Solr chunk branch
where document_id is used), leaving the rest of the logic in the for-loop
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a4151224-0765-4c02-9573-a2c1f869bf0c
📒 Files selected for processing (4)
src/constants.pysrc/llama_stack_configuration.pysrc/utils/vector_search.pytests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/llama_stack_configuration.py
- tests/unit/utils/test_vector_search.py
3adfa9a to
6311518
Compare
|
Please rebase, the changes in PR are from previous commits (OpenAPI.json for example) |
6311518 to
2ef828f
Compare
tisnik
left a comment
There was a problem hiding this comment.
the main part: LGTM config handling is fine, thank you
some nits about URLs (url is not any string)
- info about rendering (might be remove from this PR, can be addressed later).
There was a problem hiding this comment.
it is not rendered correctly, as you can see. Being you I'd skip the "Java part" from this PR for now.
There was a problem hiding this comment.
Sure, leaving PlantUML in place. The current plantuml-1.2026.2.jar still needs a couple of tweaks in the Makefile to work though.
src/models/config.py
Outdated
| Only relevant when ``"okp"`` is listed in ``rag.inline`` or ``rag.tool``. | ||
| """ | ||
|
|
||
| rhokp_url: Optional[str] = Field( |
There was a problem hiding this comment.
AnyHttpUrl perhaps instead of str perhaps?
src/utils/vector_search.py
Outdated
|
|
||
| def _get_okp_base_url() -> str: | ||
| """Return OKP document base URL from configuration (rhokp_url), or default if unset.""" | ||
| value = (configuration.okp.rhokp_url or "").strip() |
There was a problem hiding this comment.
if the URL will be real url, strip() is not needed
| return reference | ||
| base = base_url.rstrip("/") + "/" | ||
| ref = reference.lstrip("/") | ||
| return base + ref |
There was a problem hiding this comment.
URL classes provides "join" operations, which means that handling it manually won't be needed. But it's up to you in this case.
There was a problem hiding this comment.
Do you mean from urllib.parse import urljoin?
It has some odd behavior (https://stackoverflow.com/questions/10893374/python-confusions-with-urljoin) so the end result is not cleaner than what we started with:
base = str(base_url)
if not base.endswith("/"):
base = f"{base}/"
ref = reference.lstrip("/")
return urljoin(base, ref)
| else constants.SOLR_CHUNK_FILTER_QUERY | ||
| ) | ||
|
|
||
| solr_url = ( |
There was a problem hiding this comment.
dtto about AnyHttpUrl - no stripping etc.
| uv run pyreverse src/models/config.py --output puml --output-directory=docs/ | ||
| mv docs/classes.puml docs/config.puml | ||
|
|
||
| docs/config.png: docs/config.puml ## Generate an image with configuration graph |
There was a problem hiding this comment.
dunno why, but you can see that it does not render correctly (and AFAIK is outside scope of this pr)
There was a problem hiding this comment.
There's a default 4096 pixel width limit for PNGs in PlantUML apparently, bumped it in the Makefile.
There was a problem hiding this comment.
And "--theme rose" blows up like so:
$ make docs/config.svg
pyreverse src/models/config.py --output puml --output-directory=docs/
mv docs/classes.puml docs/config.puml
pushd docs && \
java -jar ~/bin/plantuml.jar --theme rose config.puml -tsvg && \
xmllint --format classes.svg > config.svg && \
rm -f classes.svg && \
popd
~/lightspeed-core/lightspeed-stack/docs ~/lightspeed-core/lightspeed-stack
Error line 1 in file: config.puml
Some diagram description contains errors
make: *** [Makefile:80: docs/config.svg] Error 200
$
added the rhokp_url configuration variable to the OkpConfiguration section.
2ef828f to
c2f6afe
Compare
Description
According to https://docs.google.com/document/d/15wxuPf9r1oEsVdpsgLM-aialDaz7UzaR5cl85vk2WkE/edit?tab=t.0, make the following changes to how the OKP server is located at runtime:
The default value for rhokp_url is "http://localhost:8081"
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Chores
Tests