Skip to content

LCORE-1493: Improve K8s authentication error handling#1341

Open
anik120 wants to merge 1 commit intolightspeed-core:mainfrom
anik120:better-k8sauth-error-handling
Open

LCORE-1493: Improve K8s authentication error handling#1341
anik120 wants to merge 1 commit intolightspeed-core:mainfrom
anik120:better-k8sauth-error-handling

Conversation

@anik120
Copy link
Contributor

@anik120 anik120 commented Mar 17, 2026

Description

Refactored Kubernetes authentication error handling to replace the generic ClusterIDUnavailableError with a granular exception hierarchy that differentiates between transient API failures (503) and persistent configuration issues (500). Error messages are now preserved and returned to clients with appropriate HTTP status codes.

Motivation
The current implementation had several issues:

  • Single ClusterIDUnavailableError for all failure scenarios
  • Error context lost when converting to generic 500 responses
  • Always returned HTTP 500, even for transient K8s API failures (should be 503)
  • Broad except Exception catches masked specific errors
  • Hard to debug production issues due to generic error messages

Note: This PR does not introduce any breaking changes. The changes are internal to the K8s authentication module and the public API call method signature remains unchanged.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Bug Fixes

    • More granular Kubernetes authentication error handling: clearer HTTP mappings (Kubernetes API connectivity → 503 Service Unavailable; cluster/version and configuration issues → 500 Internal Server Error) and more descriptive error causes surfaced to clients.
  • Documentation

    • Expanded OpenAPI error examples to include Kubernetes API connectivity failures, missing/invalid cluster-version cases, and permission-denied scenarios.
  • Tests

    • Updated and expanded unit tests to cover new Kubernetes error scenarios, HTTP mappings, and additional example labels.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaced a single ClusterIDUnavailableError with a granular Kubernetes authentication exception hierarchy, changed _get_cluster_id to map ApiException statuses to new errors, updated kube-admin HTTP mappings, added OpenAPI error examples, and expanded unit tests to cover new failure modes.

Changes

Cohort / File(s) Summary
Exception hierarchy & auth logic
src/authentication/k8s.py
Removed ClusterIDUnavailableError. Added K8sAuthenticationError, K8sAPIConnectionError, K8sConfigurationError, ClusterVersionNotFoundError, ClusterVersionPermissionError, InvalidClusterVersionError. _get_cluster_id now maps ApiException statuses (404→not found, 403→permission, 429/5xx→API connection, other 4xx→configuration); missing/invalid clusterID→InvalidClusterVersionError. Kube-admin flow maps new errors to HTTP 503/500 and logs causes; added HTTPStatus usage.
API response models
src/models/responses.py
Added OpenAPI examples to InternalServerErrorResponse ("cluster version not found", "cluster version permission denied", "invalid cluster version") and a ServiceUnavailableResponse example ("kubernetes api" with 503 cause).
Unit tests
tests/unit/authentication/test_k8s.py, tests/unit/models/responses/test_error_responses.py
Reworked tests to replace ClusterIDUnavailableError with the new exception classes; added coverage for get_cluster_id success and failure paths (missing/invalid fields, 404, 403, 429, 5xx, other 4xx) and kube-admin HTTP mappings (503 for API connection, 500 for config/version errors). Updated OpenAPI example expectations and HTTPStatus usage.
Test support & imports
tests/unit/..., pyproject.toml, requirements.txt
Adjusted test imports/exports to include new exceptions and get_user_info, updated mocks/fixtures (group name mapping), and added HTTPStatus imports in tests.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant K8SAuthDependency
    participant K8sClient
    participant KubernetesAPI

    Client->>K8SAuthDependency: request (kube:admin)
    K8SAuthDependency->>K8sClient: get_cluster_id()
    K8sClient->>KubernetesAPI: GET ClusterVersion 'version'
    alt 200 OK
        KubernetesAPI-->>K8sClient: 200 ClusterVersion (JSON)
        K8sClient-->>K8SAuthDependency: returns clusterID
        K8SAuthDependency-->>Client: proceed (200)
    else 404 Not Found
        KubernetesAPI-->>K8sClient: ApiException 404
        K8sClient-->>K8SAuthDependency: raises ClusterVersionNotFoundError
        K8SAuthDependency-->>Client: HTTP 500 (InternalServerErrorResponse)
    else 403 Forbidden
        KubernetesAPI-->>K8sClient: ApiException 403
        K8sClient-->>K8SAuthDependency: raises ClusterVersionPermissionError
        K8SAuthDependency-->>Client: HTTP 500 (InternalServerErrorResponse)
    else 429 / 5xx
        KubernetesAPI-->>K8sClient: ApiException 429/5xx
        K8sClient-->>K8SAuthDependency: raises K8sAPIConnectionError
        K8SAuthDependency-->>Client: HTTP 503 (ServiceUnavailableResponse)
    else other 4xx
        KubernetesAPI-->>K8sClient: ApiException other 4xx
        K8sClient-->>K8SAuthDependency: raises K8sConfigurationError / InvalidClusterVersionError
        K8SAuthDependency-->>Client: HTTP 500 (InternalServerErrorResponse)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: improving K8s authentication error handling through a granular exception hierarchy. It is clear, specific, and directly reflects the primary objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@anik120
Copy link
Contributor Author

anik120 commented Mar 17, 2026

cc: @asimurka @tisnik

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/unit/authentication/test_k8s.py (1)

528-538: Consider resetting singleton state in tests to prevent test interdependencies.

The K8sClientSingleton._cluster_id class variable persists between tests. While the current mocking strategy isolates most tests, if _get_cluster_id is called and succeeds, it will cache the cluster ID. Running tests in different orders could cause flaky behavior.

🧹 Consider using a fixture to reset singleton state
`@pytest.fixture`(autouse=True)
def reset_k8s_singleton_state():
    """Reset K8sClientSingleton state before each test."""
    yield
    # Clean up after test
    K8sClientSingleton._cluster_id = None

Or alternatively, reset state at the start of tests that depend on a clean state:

 def test_get_cluster_id_success(mocker: MockerFixture) -> None:
     """Test get_cluster_id function with successful response."""
+    K8sClientSingleton._cluster_id = None  # Ensure clean state
     mock_get_custom_objects_api = mocker.patch(
         "authentication.k8s.K8sClientSingleton.get_custom_objects_api"
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/authentication/test_k8s.py` around lines 528 - 538, The tests
leave K8sClientSingleton._cluster_id cached which can cause order-dependent
failures; update the test(s) to reset the singleton state before (or after)
running assertions by setting K8sClientSingleton._cluster_id = None (or add an
autouse pytest fixture that clears _cluster_id) so that
test_get_cluster_id_success and any other tests calling
K8sClientSingleton._get_cluster_id run with a clean state.
src/authentication/k8s.py (1)

226-234: Consider differentiating other client errors from connection errors.

The comment says "all other ApiException" are treated as connection/server errors (503), but this also captures client errors like 400 Bad Request or 401 Unauthorized. While these are unlikely in practice for this specific API call, mapping them to K8sAPIConnectionError (which implies transient failures) may be semantically incorrect.

Consider explicitly handling 5xx errors as connection errors and treating unexpected 4xx errors as configuration errors, or at minimum, documenting this design decision.

💡 Optional: More precise error categorization
             raise ClusterVersionPermissionError(
                 "Insufficient permissions to read ClusterVersion resource"
             ) from e
-            # Treat all other ApiException as connection/server errors (503)
+            # Treat server errors (5xx) and network issues as connection errors (503)
+            # Unexpected client errors (4xx) indicate misconfiguration
+            if e.status is not None and 400 <= e.status < 500:
+                logger.error(
+                    "Unexpected client error while fetching ClusterVersion (status %s): %s",
+                    e.status,
+                    e.reason,
+                )
+                raise K8sConfigurationError(
+                    f"Unexpected error accessing ClusterVersion: {e.reason} (status {e.status})"
+                ) from e
             logger.error(
                 "Kubernetes API error while fetching ClusterVersion (status %s): %s",
                 e.status,
                 e.reason,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/k8s.py` around lines 226 - 234, The current handling of
ApiException in the ClusterVersion fetch treats all non-OK errors as
K8sAPIConnectionError; update the logic in the except block that references
ApiException and logger.error so that you inspect e.status: if status is 500–599
log and raise K8sAPIConnectionError (transient/server error), if status is
400–499 log and raise a new or existing K8sAPIConfigurationError
(client/configuration error) with the original message, and for any
other/unknown statuses fall back to the connection error; ensure the
logger.error messages include e.status and e.reason and chain the original
exception when raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/authentication/k8s.py`:
- Around line 226-234: The current handling of ApiException in the
ClusterVersion fetch treats all non-OK errors as K8sAPIConnectionError; update
the logic in the except block that references ApiException and logger.error so
that you inspect e.status: if status is 500–599 log and raise
K8sAPIConnectionError (transient/server error), if status is 400–499 log and
raise a new or existing K8sAPIConfigurationError (client/configuration error)
with the original message, and for any other/unknown statuses fall back to the
connection error; ensure the logger.error messages include e.status and e.reason
and chain the original exception when raising.

In `@tests/unit/authentication/test_k8s.py`:
- Around line 528-538: The tests leave K8sClientSingleton._cluster_id cached
which can cause order-dependent failures; update the test(s) to reset the
singleton state before (or after) running assertions by setting
K8sClientSingleton._cluster_id = None (or add an autouse pytest fixture that
clears _cluster_id) so that test_get_cluster_id_success and any other tests
calling K8sClientSingleton._get_cluster_id run with a clean state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cdbb6293-6312-4264-8de2-176d28db0cea

📥 Commits

Reviewing files that changed from the base of the PR and between 80d5b9e and 124228b.

📒 Files selected for processing (3)
  • src/authentication/k8s.py
  • src/models/responses.py
  • tests/unit/authentication/test_k8s.py

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from 124228b to f1bd8fc Compare March 17, 2026 19:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/k8s.py`:
- Around line 242-246: The TypeError except block in the code that calls
cls.get_custom_objects_api() and custom_objects_api.get_cluster_custom_object()
can reference an unbound version_data; initialize version_data = None before the
try where ClusterVersion is built and then, in the except TypeError handler, log
the type/name or a safe representation by checking version_data is not None
(e.g., include type(version_data).__name__ only if version_data is set) so the
original TypeError from get_custom_objects_api or get_cluster_custom_object
isn't masked; update the handling around the ClusterVersion construction and the
TypeError except block to guard access to version_data.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9b8bac02-b631-4fbf-9081-793935ce0720

📥 Commits

Reviewing files that changed from the base of the PR and between 124228b and f1bd8fc.

📒 Files selected for processing (4)
  • src/authentication/k8s.py
  • src/models/responses.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/models/responses/test_error_responses.py

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most parts seems ok, have some nits. W/o integration nor e2e tests it is a bit tricky to make sure the API is the same, but can be tested later.

return cluster_id
except ApiException as e:
# Handle specific HTTP status codes from Kubernetes API
if e.status == 404:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise ClusterVersionNotFoundError(
"ClusterVersion 'version' resource not found in OpenShift cluster"
) from e
if e.status == 403:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

logger.error(
"Failed to get cluster_id, version object is: %s", version_data
"ClusterVersion has invalid structure, version_data type: %s",
type(version_data).__name__,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corerabbit is right there

allowed=True,
username="kube:admin",
uid="some-uuid",
groups=["ols-group"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some leftover ("ols"). repeated in other tests

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from f1bd8fc to 3baec73 Compare March 18, 2026 15:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/authentication/k8s.py (1)

202-210: ⚠️ Potential issue | 🔴 Critical

Fix version_data typing and add missing status_code parameter to error response.

Line 208 indexes version_data while declared as Optional[object] (line 202), which causes a mypy type error since object is not indexable.

Additionally, line 397 instantiates InternalServerErrorResponse without the required status_code parameter. The AbstractErrorResponse.__init__ signature requires: def __init__(self, *, response: str, cause: str, status_code: int). This will raise a TypeError at runtime.

Proposed fixes

For the typing issue at lines 202-210:

-from typing import Optional, Self
+from typing import Any, Optional, Self
 
-        version_data: Optional[object] = None
+        version_data: Optional[dict[str, Any]] = None
         try:
             custom_objects_api = cls.get_custom_objects_api()
             version_data = custom_objects_api.get_cluster_custom_object(

For the error response constructor at lines 397-400:

+from fastapi import status
+
 response = InternalServerErrorResponse(
     response="Internal server error",
     cause=str(e),
+    status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/k8s.py` around lines 202 - 210, Change version_data's type
from Optional[object] to an indexable mapping (e.g., Optional[Mapping[str, Any]]
or Optional[dict]) so indexing version_data["spec"]["clusterID"] is type-safe;
locate where version_data is declared and where get_cluster_custom_object is
called (methods: get_custom_objects_api and get_cluster_custom_object, and class
attribute _cluster_id) and update the annotation and any necessary imports. Also
update the instantiation of InternalServerErrorResponse to pass the required
status_code int (per AbstractErrorResponse.__init__(*, response: str, cause:
str, status_code: int)) wherever InternalServerErrorResponse(...) is constructed
so the constructor call includes a valid status_code.
🧹 Nitpick comments (1)
tests/unit/authentication/test_k8s.py (1)

529-655: Stabilize cluster-ID tests by resetting singleton state per test.

These tests mutate K8sClientSingleton class-level cache (_cluster_id), which can leak across test order. Consider an autouse fixture to reset singleton internals between tests.

💡 Proposed fixture
+@pytest.fixture(autouse=True)
+def reset_k8s_singleton_state() -> None:
+    """Reset singleton state between tests to avoid cross-test leakage."""
+    K8sClientSingleton._instance = None
+    K8sClientSingleton._api_client = None
+    K8sClientSingleton._cluster_id = None
+    K8sClientSingleton._custom_objects_api = None
+    K8sClientSingleton._authn_api = None
+    K8sClientSingleton._authz_api = None
+    yield
+    K8sClientSingleton._instance = None
+    K8sClientSingleton._api_client = None
+    K8sClientSingleton._cluster_id = None
+    K8sClientSingleton._custom_objects_api = None
+    K8sClientSingleton._authn_api = None
+    K8sClientSingleton._authz_api = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/authentication/test_k8s.py` around lines 529 - 655, Add an autouse
pytest fixture in the test module that resets the K8sClientSingleton singleton
state before each test: set K8sClientSingleton._cluster_id = None (and clear any
other cached attrs like K8sClientSingleton._custom_objects_api if present) so
cached values don't leak between tests; implement the fixture with
scope="function" and autouse=True so every test (including
test_get_cluster_id_*) runs against a fresh singleton state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/k8s.py`:
- Around line 228-236: The code currently maps all non-403/404 ApiException to
K8sAPIConnectionError (treated as 503); change the branching to inspect e.status
and only classify transient/connection errors as K8sAPIConnectionError (e.g.,
status in 500-599 and 429) while raising K8sConfigurationError for
client/configuration errors (e.g., 400-499 except 403/404). Update the
exception-raising in the ClusterVersion fetch block (where ApiException is
handled) to branch on e.status and raise K8sAPIConnectionError for
server/transient codes and K8sConfigurationError for client/configuration codes;
also ensure K8SAuthDependency.__call__ is updated to catch K8sConfigurationError
and return a 500 in the kube-admin path as described.
- Around line 397-400: The constructor call creating InternalServerErrorResponse
is missing the required keyword-only status_code argument
(AbstractErrorResponse.__init__ requires status_code: int), causing a TypeError
at runtime; update the instantiation of InternalServerErrorResponse (where it's
created) to pass status_code=500 (or the appropriate HTTP status constant) along
with response and cause so the object initializes correctly.

---

Outside diff comments:
In `@src/authentication/k8s.py`:
- Around line 202-210: Change version_data's type from Optional[object] to an
indexable mapping (e.g., Optional[Mapping[str, Any]] or Optional[dict]) so
indexing version_data["spec"]["clusterID"] is type-safe; locate where
version_data is declared and where get_cluster_custom_object is called (methods:
get_custom_objects_api and get_cluster_custom_object, and class attribute
_cluster_id) and update the annotation and any necessary imports. Also update
the instantiation of InternalServerErrorResponse to pass the required
status_code int (per AbstractErrorResponse.__init__(*, response: str, cause:
str, status_code: int)) wherever InternalServerErrorResponse(...) is constructed
so the constructor call includes a valid status_code.

---

Nitpick comments:
In `@tests/unit/authentication/test_k8s.py`:
- Around line 529-655: Add an autouse pytest fixture in the test module that
resets the K8sClientSingleton singleton state before each test: set
K8sClientSingleton._cluster_id = None (and clear any other cached attrs like
K8sClientSingleton._custom_objects_api if present) so cached values don't leak
between tests; implement the fixture with scope="function" and autouse=True so
every test (including test_get_cluster_id_*) runs against a fresh singleton
state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4c422842-85a2-446b-9d73-6faddcc46e1f

📥 Commits

Reviewing files that changed from the base of the PR and between f1bd8fc and 3baec73.

📒 Files selected for processing (4)
  • src/authentication/k8s.py
  • src/models/responses.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/models/responses/test_error_responses.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/responses.py
  • tests/unit/models/responses/test_error_responses.py

Comment on lines 397 to 400
response = InternalServerErrorResponse(
response="Internal server error",
cause="Unable to retrieve cluster ID",
cause=str(e),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
nl -ba src/models/responses.py | sed -n '1689,1715p'
nl -ba src/models/responses.py | sed -n '2367,2415p'
nl -ba src/authentication/k8s.py | sed -n '395,401p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 118


🏁 Script executed:

# View the line ranges from responses.py and k8s.py using sed
sed -n '1689,1715p' src/models/responses.py
echo "---"
sed -n '2367,2415p' src/models/responses.py
echo "---"
sed -n '395,401p' src/authentication/k8s.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3289


🏁 Script executed:

# Check HTTPStatus import in k8s.py
head -50 src/authentication/k8s.py | grep -i "httpstatus\|from http"

# Find other InternalServerErrorResponse usages in the codebase
rg "InternalServerErrorResponse" -A 2 -B 1

# Check if HTTPStatus is imported
rg "HTTPStatus" src/authentication/k8s.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 50388


Add missing status_code parameter to InternalServerErrorResponse constructor.

Line 397-400 constructs InternalServerErrorResponse without the required status_code parameter. Since AbstractErrorResponse.__init__ requires status_code: int as a keyword-only argument, this will raise TypeError at runtime.

💡 Proposed fix
                 response = InternalServerErrorResponse(
                     response="Internal server error",
                     cause=str(e),
+                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/k8s.py` around lines 397 - 400, The constructor call
creating InternalServerErrorResponse is missing the required keyword-only
status_code argument (AbstractErrorResponse.__init__ requires status_code: int),
causing a TypeError at runtime; update the instantiation of
InternalServerErrorResponse (where it's created) to pass status_code=500 (or the
appropriate HTTP status constant) along with response and cause so the object
initializes correctly.

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from 3baec73 to 0a569eb Compare March 18, 2026 17:41
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/authentication/k8s.py (1)

202-210: ⚠️ Potential issue | 🔴 Critical

Fix the type annotation to resolve the mypy error.

The pipeline reports: Value of type "object | Any" is not indexable [index]. The version_data variable is typed as Optional[object], but object doesn't support indexing. Since the Kubernetes API returns a dict-like structure, use the appropriate type.

Proposed fix
-        version_data: Optional[object] = None
+        version_data: dict[str, Any] | None = None

This requires adding Any to the imports from typing at line 6:

-from typing import Optional, Self
+from typing import Any, Optional, Self
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/k8s.py` around lines 202 - 210, The variable version_data
is annotated as Optional[object], which is not indexable; update its type to
Optional[Any] (or Optional[dict[str, Any]] if you prefer stricter typing) and
add Any to the typing imports so mypy knows indexing is allowed; adjust the
declaration of version_data and ensure the file imports Any from typing, leaving
the get_cluster_custom_object call, cluster_id extraction
(version_data["spec"]["clusterID"]), cls._cluster_id assignment, and
get_custom_objects_api usage unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/authentication/k8s.py`:
- Around line 202-210: The variable version_data is annotated as
Optional[object], which is not indexable; update its type to Optional[Any] (or
Optional[dict[str, Any]] if you prefer stricter typing) and add Any to the
typing imports so mypy knows indexing is allowed; adjust the declaration of
version_data and ensure the file imports Any from typing, leaving the
get_cluster_custom_object call, cluster_id extraction
(version_data["spec"]["clusterID"]), cls._cluster_id assignment, and
get_custom_objects_api usage unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5d44e0e0-36b0-4c6b-8b85-95a435b22e93

📥 Commits

Reviewing files that changed from the base of the PR and between 3baec73 and 0a569eb.

📒 Files selected for processing (4)
  • src/authentication/k8s.py
  • src/models/responses.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/models/responses/test_error_responses.py

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from 0a569eb to 86cad71 Compare March 18, 2026 19:12
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/authentication/test_k8s.py (1)

530-540: ⚠️ Potential issue | 🟠 Major

Reset K8sClientSingleton state between cluster-id tests.

_get_cluster_id() mutates the class cache, so after Line 540 test_get_cluster_id_in_cluster() can pass without ever exercising the in-cluster fetch path. Resetting _cluster_id (and ideally _instance) around these tests keeps them order-independent.

♻️ Suggested test fixture
+@pytest.fixture(autouse=True)
+def reset_k8s_singleton_state():
+    K8sClientSingleton._instance = None
+    K8sClientSingleton._cluster_id = None
+    yield
+    K8sClientSingleton._instance = None
+    K8sClientSingleton._cluster_id = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/authentication/test_k8s.py` around lines 530 - 540, The tests
mutate K8sClientSingleton's cache causing order-dependent failures; ensure each
test resets the class state by clearing K8sClientSingleton._cluster_id (and
K8sClientSingleton._instance) before/after tests that call
K8sClientSingleton._get_cluster_id (e.g., in test_get_cluster_id_success and
test_get_cluster_id_in_cluster) so the in-cluster fetch path is actually
exercised and tests are independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/k8s.py`:
- Around line 399-413: get_user_info() currently swallows TokenReview exceptions
for non-`kube:admin` lookups, causing Kubernetes API faults to be returned as
401; update get_user_info() so that when a TokenReview call raises
K8sAPIConnectionError or K8sConfigurationError you construct and raise the
appropriate HTTPException (use ServiceUnavailableResponse for
K8sAPIConnectionError and InternalServerErrorResponse for K8sConfigurationError)
instead of returning None, matching the handling already added for the
`kube:admin` cluster-id path; keep the logger.error calls (with the exception
message) and raise HTTPException(**response.model_dump()) from the caught
exception.

---

Outside diff comments:
In `@tests/unit/authentication/test_k8s.py`:
- Around line 530-540: The tests mutate K8sClientSingleton's cache causing
order-dependent failures; ensure each test resets the class state by clearing
K8sClientSingleton._cluster_id (and K8sClientSingleton._instance) before/after
tests that call K8sClientSingleton._get_cluster_id (e.g., in
test_get_cluster_id_success and test_get_cluster_id_in_cluster) so the
in-cluster fetch path is actually exercised and tests are independent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a70be98-e665-4c82-9148-bcfff12e7ecc

📥 Commits

Reviewing files that changed from the base of the PR and between 0a569eb and 86cad71.

📒 Files selected for processing (4)
  • src/authentication/k8s.py
  • src/models/responses.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/models/responses/test_error_responses.py

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch 2 times, most recently from 762fc5a to 9e3d1eb Compare March 18, 2026 19:37
Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few nits

) from e
# Classify errors by status code range
# 5xx errors and 429 (rate limit) are transient - map to 503
if e.status >= 500 or e.status == HTTPStatus.TOO_MANY_REQUESTS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500 == http.INTERNAL_SERVER_ERROR

Raises:
HTTPException: If unable to connect to Kubernetes API or unexpected error occurs.
HTTPException: 503 if Kubernetes API is unavailable (5xx errors, 429 rate limit).
500 if Kubernetes API configuration issue (4xx errors).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indenting

raise HTTPException(**response.model_dump()) from e

# 5xx errors and 429 (rate limit) are transient - map to 503
if e.status >= 500 or e.status == HTTPStatus.TOO_MANY_REQUESTS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

@tisnik tisnik requested a review from asimurka March 18, 2026 19:46
@tisnik
Copy link
Contributor

tisnik commented Mar 18, 2026

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

✅ Actions performed

Full review triggered.

@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from 9e3d1eb to d5faea4 Compare March 18, 2026 23:16
Refactored Kubernetes authentication error handling to replace the
generic `ClusterIDUnavailableError` with a granular exception hierarchy
that differentiates between transient API failures (503) and persistent
configuration issues (500). Error messages are now preserved and returned
to clients with appropriate HTTP status codes.

**Motivation**
The current implementation had several issues:
- Single `ClusterIDUnavailableError` for all failure scenarios
- Error context lost when converting to generic 500 responses
- Always returned HTTP 500, even for transient K8s API failures (should be 503)
- Broad `except Exception` catches masked specific errors
- Hard to debug production issues due to generic error messages

Note: This PR does not introduce any breaking changes. The changes are internal to the
K8s authentication module and the public API __call__ method signature remains unchanged.

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
@anik120 anik120 force-pushed the better-k8sauth-error-handling branch from d5faea4 to d9ca8dd Compare March 18, 2026 23:23
@anik120
Copy link
Contributor Author

anik120 commented Mar 19, 2026

@tisnik addressed, PTAL, thanks!

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants