Skip to content

observability: new page based on dataplane observability API#2205

Open
nicolaferraro wants to merge 14 commits intomasterfrom
nf/observability-api
Open

observability: new page based on dataplane observability API#2205
nicolaferraro wants to merge 14 commits intomasterfrom
nf/observability-api

Conversation

@nicolaferraro
Copy link
Member

When the feature flag is toggled, console starts showing this and cloud-ui stops showing the old one (different ff for Serverless and Serverful).

This is a first iteration, it allows to select the time range (but only from now back to 6 hours). I need to publish the max lookback time in the API to have a better control bar.

Screenshot 2026-02-01 at 00 13 13

"react-markdown": "^10.1.0",
"react-simple-code-editor": "^0.14.1",
"react-syntax-highlighter": "^15.6.6",
"recharts": "2.15.4",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use recharts v3 shadcn-ui/ui#7669 which will require components to be upgraded.

We can do that in a followup PR


import ObservabilityPage from '../../components/pages/observability/observability-page';

export const Route = createFileRoute('/observability/')({
Copy link
Contributor

@malinskibeniamin malinskibeniamin Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this be required for standalone mode

@sago2k8
Copy link
Contributor

sago2k8 commented Feb 3, 2026

@claude could you review, focus on simplicity and ensure it follows some well known frontend best practices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this component will require updating to support shadcn chart + recharts v3 major release.

Comment on lines 18 to 26
export type TimeRange = '5m' | '15m' | '30m' | '1h' | '3h' | '6h';

const TIME_RANGE_MS: Record<TimeRange, number> = {
'5m': 5 * 60 * 1000,
'15m': 15 * 60 * 1000,
'30m': 30 * 60 * 1000,
'1h': 60 * 60 * 1000,
'3h': 3 * 60 * 60 * 1000,
'6h': 6 * 60 * 60 * 1000,
};
Copy link
Contributor

@malinskibeniamin malinskibeniamin Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weeco implemented this for transcripts/traces UI, can you please adapt a similar approach to follow the same practices?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 45 to 53
// biome-ignore lint/correctness/useExhaustiveDependencies: refreshKey triggers recalculation on refresh
const timeRange = useMemo(() => {
const now = new Date();
const startTime = new Date(now.getTime() - TIME_RANGE_MS[selectedTimeRange]);

return {
start: startTime,
end: now,
};
}, [selectedTimeRange, refreshKey]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need useExhaustiveDependencies?

icon: ActivityIcon,
},
beforeLoad: () => {
const isObservabilityEnabled = isServerless()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this we also need to make sure that this is actually configured in the backend I'd say - specifically thinking of self-hosted community users and customers here. Not sure if we want to disable/hide the sidebar or show a setup / onboarding page in this case.

* by the Apache License, Version 2.0
*/

import { Box, Flex, Text } from '@redpanda-data/ui';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Redpanda UI registry instead of @redpanda-data/ui for this new observability page.

Comment on lines 196 to 160
tickFormatter={(value) => {
// Format large numbers with K, M, B suffixes
if (value >= 1_000_000_000) {
return `${(value / 1_000_000_000).toFixed(1)}B`;
}
if (value >= 1_000_000) {
return `${(value / 1_000_000).toFixed(1)}M`;
}
if (value >= 1000) {
return `${(value / 1000).toFixed(1)}K`;
}
// Show decimals for small values
if (value < 10) {
return value.toFixed(2);
}
if (value < 100) {
return value.toFixed(1);
}
return value.toFixed(0);
}}
tickLine={false}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should filter using storage utils

*/

import { timestampFromMs } from '@bufbuild/protobuf/wkt';
import { Alert, AlertIcon, Box, Heading, Skeleton, Text } from '@redpanda-data/ui';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocker: use redpanda UI registry

@malinskibeniamin
Copy link
Contributor

@claude could you review, focus on simplicity and ensure it follows some well known frontend best practices

@sago2k8 it's not enabled yet AFAIK

@nicolaferraro
Copy link
Member Author

I did a first pass to remove the old components and fixing stuff. About the self-hosted mode, currently the sidebar is only shown for embedded mode only. Don't know if there's a clean way to make this work for self-hosted as well

@malinskibeniamin
Copy link
Contributor

@claude review

@claude
Copy link

claude bot commented Feb 4, 2026

Claude finished @malinskibeniamin's task in 2m 36s —— View job


Code Review

I've reviewed the observability page implementation focusing on simplicity and frontend best practices. Here's my comprehensive feedback:


Summary

Overall, the implementation is well-structured with good test coverage and proper TypeScript typing. However, there are several important issues to address, particularly regarding code consistency with existing patterns and some potential bugs.


Critical Issues (Must Fix)

1. Biome Linting Violations

The github-actions bot has flagged multiple formatting issues that should be addressed. These are primarily spacing/formatting issues in:

  • metric-chart.tsx: Lines 18, 20, 130, 138, 149, 153, 163
  • observability-page.tsx: Lines 19, 22, 83

Fix linting issues →

2. Inconsistent Import Paths

metric-chart.tsx:20 - The import statement doesn't follow the pattern used elsewhere:

import { prettyNumber } from 'utils/utils';

Should be relative like other imports in the file:

import { prettyNumber } from '../../../utils/utils';

This is inconsistent with the other imports in the same file which use relative paths like ../../redpanda-ui/components/.

3. useExhaustiveDependencies Pattern

Both observability-page.tsx:52 and observability-toolbar.tsx:44 use biome-ignore comments for useExhaustiveDependencies. According to the existing review comment from @malinskibeniamin, this pattern should be reconsidered. The refreshKey dependency pattern is valid, but the implementation could be cleaner:

Current approach (observability-page.tsx:42-53):

const refreshData = useCallback(() => {
  setRefreshKey((prev) => prev + 1);
  refetch();
}, [refetch]);

// biome-ignore lint/correctness/useExhaustiveDependencies: refreshKey triggers recalculation on refresh
const timeRange = useMemo(() => calculateTimeRange(selectedTimeRange), [selectedTimeRange, refreshKey]);

Suggested approach: Consider using a timestamp or Date object instead of an incrementing key for better clarity:

const [lastRefreshTime, setLastRefreshTime] = useState(Date.now());

const refreshData = useCallback(() => {
  setLastRefreshTime(Date.now());
  refetch();
}, [refetch]);

const timeRange = useMemo(
  () => calculateTimeRange(selectedTimeRange), 
  [selectedTimeRange, lastRefreshTime]
);

This makes the intent clearer and avoids the biome-ignore.


High Priority Issues

4. Code Duplication in Time Range Calculation

The time range calculation logic is duplicated between observability-page.tsx and observability-toolbar.tsx:

  • observability-page.tsx:53 uses calculateTimeRange(selectedTimeRange)
  • observability-toolbar.tsx:45-53 reimplements the same logic inline

The calculateTimeRange function exists and is exported (observability-toolbar.tsx:111-119), but the toolbar component doesn't use it. This is unnecessary duplication and a maintenance risk.

Fix: Remove the inline calculation in ObservabilityToolbar and use the exported function:

// Current (lines 45-53):
const timeRange = useMemo(() => {
  const now = new Date();
  const startTime = new Date(now.getTime() - TIME_RANGE_MS[selectedTimeRange]);
  return { start: startTime, end: now };
}, [selectedTimeRange, refreshKey]);

// Should be:
const timeRange = useMemo(
  () => calculateTimeRange(selectedTimeRange), 
  [selectedTimeRange, refreshKey]
);

5. Grid Layout Classes

observability-page.tsx:83 - The grid classes appear correct but differ from the Biome suggestion. Current:

<div className="grid gap-6 grid-cols-1 lg:grid-cols-2">

Biome suggests:

<div className="grid grid-cols-1 gap-6 lg:grid-cols-2">

The suggested order follows Tailwind's recommended class ordering (layout → spacing → responsive).

6. Error Handling Could Be More Specific

metric-chart.tsx:135-142 - The error state treats all errors the same way. Consider differentiating between:

  • Network errors (suggest retry)
  • 404 (metric not found)
  • Permission errors
  • Timeout errors

This would provide better UX and debugging information.


Medium Priority Issues

7. Magic Numbers in Color Array

metric-chart.tsx:109-115 - The chart colors are hardcoded. Consider:

// Current:
const colors = [
  'var(--color-chart-1)',
  'var(--color-chart-2)',
  // ...
];

// Better: Define as constant outside component
const CHART_COLORS = [
  'var(--color-chart-1)',
  'var(--color-chart-2)',
  'var(--color-chart-3)',
  'var(--color-chart-4)',
  'var(--color-chart-5)',
] as const;

This prevents recreation on every render and makes the colors reusable.

8. Potential NaN Timestamp

metric-chart.tsx:237-244 - Multiple checks for NaN timestamp, but the flow could be simplified:

const timestamp = payload?.[0]?.payload?.timestamp;
if (!timestamp || typeof timestamp !== 'number') return '';
const date = new Date(timestamp);
if (!date.getTime()) return ''; // getTime() returns 0 for invalid dates

9. Timezone Hardcoded to UTC

observability-toolbar.tsx:104 and metric-chart.tsx:179, 250 - UTC is hardcoded. Consider:

  • Making timezone configurable via user preferences
  • Or at minimum, document why UTC is required (e.g., backend always returns UTC)

10. Type Safety Could Be Improved

metric-chart.tsx:40-60 - The addDataPoint function accepts point with optional properties but doesn't validate the structure deeply. Consider using a more specific type or adding runtime validation.


Low Priority / Nice to Have

11. Chart Height Hardcoded

metric-chart.tsx:168 - Height is hardcoded to 250px. Consider making this configurable via props if different charts need different heights in the future.

12. Loading State Could Show Skeleton Count

observability-page.tsx:55-62 - Shows 3 skeleton cards regardless of how many charts will actually load. Consider showing the correct number if queries is available from cache.

13. Test Coverage Gaps

observability-page.test.tsx - Tests are good but missing:

  • Test for the refresh functionality (appGlobal.onRefresh)
  • Test for time range changes
  • Test for toolbar interactions

14. Accessibility Improvements

  • observability-toolbar.tsx:73-108 - The toolbar could benefit from ARIA labels
  • metric-chart.tsx - Consider adding aria-label to charts for screen readers
  • The refresh button uses global state but should ideally have keyboard support

Performance Observations

Good:

  • Proper use of useMemo to prevent unnecessary recalculations
  • useCallback for event handlers
  • Chart data transformation is memoized

⚠️ Could be improved:

  • observability-page.tsx:84-86 - The .map() creates new <MetricChart> components on every render. Consider adding React.memo() to MetricChart if queries list is large.

Code Style & Consistency

Good:

  • Consistent TypeScript typing throughout
  • Good separation of concerns (separate files for chart, toolbar, page)
  • Proper copyright headers
  • Clean component structure

⚠️ Inconsistencies:

  • Import path styles (relative vs absolute) - see issue Add prometheus metrics #2
  • Some components use functional declarations while others might benefit from consistency

Security

✅ No security issues identified. The code:

  • Uses proper protobuf message creation
  • Doesn't execute arbitrary code
  • Doesn't expose sensitive data
  • Uses proper XSS protection (React's built-in escaping)

Testing

Good test coverage with proper mocking and protobuf usage. Tests cover:

  • Empty state
  • Loading state
  • Error state
  • Success with data

Addressing Existing Review Comments

From previous reviewers:

  1. @malinskibeniamin: "use redpanda UI registry" ✅ - All components correctly use ../../redpanda-ui/components/ imports. No @redpanda-data/ui imports found.

  2. @malinskibeniamin: "filter using storage utils" ⚠️ - I couldn't find storage utils in the codebase. If this refers to persisting filter state, that's not currently implemented.

  3. @malinskibeniamin: "useExhaustiveDependencies" - Addressed in issue Create helm chart #3 above.

  4. @weeco: "similar approach to transcripts/traces UI" - The current implementation does follow a similar pattern with toolbar + data grid layout.

  5. @weeco: "self-hosted mode" - The route guard (routes/observability/index.tsx:23-33) only checks feature flags. No backend configuration check is present. This could lead to showing an empty page if the backend doesn't support the API.


Recommendations Priority

Must fix before merge:

  1. Fix all Biome linting issues (automated suggestions available)
  2. Fix inconsistent import path (issue Add prometheus metrics #2)
  3. Remove duplicated time range calculation (issue Add dark theme  #4)

Should fix before merge:
4. Consider alternative to useExhaustiveDependencies pattern (issue #3)
5. Add backend configuration check for self-hosted mode (issue from @weeco)
6. Fix grid class ordering (issue #5)

Can be addressed in follow-up:
7. Improve error handling specificity
8. Extract color constants
9. Make timezone configurable
10. Add missing test coverage
11. Performance optimization with React.memo()


Conclusion

This is a solid first iteration of the observability page. The core functionality is well-implemented with good separation of concerns and test coverage. The main areas for improvement are:

  1. Code consistency - Import paths and React patterns should be uniform
  2. Simplification - Remove code duplication in time range calculations
  3. Robustness - Add backend configuration checks for self-hosted scenarios
  4. Polish - Address linting issues and class ordering

The implementation follows good React practices overall and the component architecture is clean. Once the critical issues are addressed, this will be ready to merge.


@nicolaferraro nicolaferraro force-pushed the nf/observability-api branch 2 times, most recently from a89aa76 to c826332 Compare February 5, 2026 14:15
Comment on lines 146 to 159
tickFormatter={(value) => {
// Use prettyNumber for large values (handles K, M, B formatting)
if (value >= 1000) {
return prettyNumber(value).toUpperCase();
}
// Show decimals for small values
if (value < 10) {
return value.toFixed(2);
}
if (value < 100) {
return value.toFixed(1);
}
return value.toFixed(0);
}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use storage utils to get proper formatting for these units

icon: ActivityIcon,
visibilityCheck: routeVisibility(
() =>
isEmbedded() &&
Copy link
Contributor

@malinskibeniamin malinskibeniamin Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be gated by isAdpEnabled() too, right? Or will we have observability in self-hosted too 1 day, somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it's possible to provide it in self-hosted, by configuring a local prom server. There's an internal discussion about it. For the moment, I think this condition only enables it in embedded mode with a specific feature flag.

Comment on lines +34 to +37
return useQuery(listQueries, listQueriesRequest, {
...opts,
retry: fastFailRetry,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to poll this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not wire auto-polling since the page where charts are used has an auto-refresh button, that also moves the clock forward.

Polling without applying a time shift does nothing (the API uses an absolute time range).

Copy link
Contributor

@malinskibeniamin malinskibeniamin Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rely on work from @weeco for transcripts? and use the same approach for the time ranges

@nicolaferraro
Copy link
Member Author

Done another pass, I hope I got what you meant :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants