@yashmayya yashmayya commented Jan 15, 2026

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 82.32044% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.18%. Comparing base (7a96b42) to head (e642147).
⚠️ Report is 27 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ntroller/helix/core/rebalance/TableRebalancer.java | 60.52% | 11 Missing and 4 partials ⚠️ |
| ...ntroller/helix/core/PinotHelixResourceManager.java | 75.60% | 6 Missing and 4 partials ⚠️ |
| ...not/common/assignment/InstancePartitionsUtils.java | 91.11% | 2 Missing and 2 partials ⚠️ |
| ...ker/routing/instanceselector/InstanceSelector.java | 0.00% | 1 Missing ⚠️ |
| ...ing/instanceselector/SegmentInstanceCandidate.java | 75.00% | 1 Missing ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 95.65% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17515      +/-   ##
============================================
- Coverage     63.25%   63.18%   -0.07%     
+ Complexity     1477     1476       -1     
============================================
  Files          3170     3172       +2     
  Lines        189469   189919     +450     
  Branches      28988    29063      +75     
============================================
+ Hits         119840   120002     +162     
- Misses        60339    60590     +251     
- Partials       9290     9327      +37     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.17% <82.32%> (-0.05%) ⬇️ |
| java-21 | 55.51% <89.28%> (-7.68%) ⬇️ |
| temurin | 63.18% <82.32%> (-0.07%) ⬇️ |
| unittests | 63.18% <82.32%> (-0.07%) ⬇️ |
| unittests1 | 55.54% <89.28%> (-0.05%) ⬇️ |
| unittests2 | 34.06% <77.34%> (+0.02%) ⬆️ |


@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch 2 times, most recently from 2065108 to b95aee0 Compare January 15, 2026 18:56
@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch from b95aee0 to 5117d78 Compare January 16, 2026 01:29
@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch from 820e6a0 to e642147 Compare January 23, 2026 01:18

  public TableRebalancer(HelixManager helixManager) {
-   this(helixManager, null, null, null, null, null);
+   this(helixManager, null, null, null, null, null, true);

Should this be true or false?

"Cannot rebalance disabled table without downtime", null, null, null, null, null);
}

// Wipe out ideal state instance partitions metadata

We shouldn't wipe it until a rebalance is actually required.
E.g. when segmentAssignmentUnchanged, we should check whether the instance partitions changed and modify them accordingly.
If we wipe it here and the code that follows throws an exception, we might end up with an IS without instance partitions.

Map<String, List<String>> idealStateListFields = currentIdealState.getRecord().getListFields();
InstancePartitionsUtils.replaceInstancePartitionsInIdealState(currentIdealState, instancePartitionsList);

return HelixHelper.updateIdealState(_helixManager, tableNameWithType, is -> {

We shouldn't retry here. This needs to be a version-checked update to ensure consistency of the IS.


We should wipe the IP with the first IS change, and restore it with the last IS change. Replacing the IP as a separate step can cause inconsistency.
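
To illustrate what a version-checked update without retry could look like, here is a minimal sketch. `VersionedStoreSketch` and its methods are hypothetical in-memory stand-ins, not Helix or Pinot APIs; the real change would go through whatever versioned write the ideal state store offers.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for a versioned ideal state store; not a Helix or Pinot API. */
final class VersionedStoreSketch {
  /** Immutable snapshot: the payload plus the version it was written at. */
  static final class Versioned {
    final String _data;
    final int _version;

    Versioned(String data, int version) {
      _data = data;
      _version = version;
    }
  }

  private final AtomicReference<Versioned> _current;

  VersionedStoreSketch(String initialData) {
    _current = new AtomicReference<>(new Versioned(initialData, 0));
  }

  Versioned read() {
    return _current.get();
  }

  /**
   * Writes only if the store is still at expectedVersion. Returns false on a version mismatch
   * instead of re-reading and retrying, so the caller decides how to handle the conflict.
   */
  boolean writeIfVersionMatches(int expectedVersion, UnaryOperator<String> updater) {
    Versioned snapshot = _current.get();
    if (snapshot._version != expectedVersion) {
      return false;
    }
    Versioned next = new Versioned(updater.apply(snapshot._data), snapshot._version + 1);
    // compareAndSet also fails if another writer replaced the snapshot in the meantime.
    return _current.compareAndSet(snapshot, next);
  }
}
```

With this shape, a write based on a stale read fails instead of being silently reapplied on top of someone else's change, which is the consistency property asked for above.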

if (_updateIdealStateInstancePartitions) {
// Rebalance completed successfully, so we can update the instance partitions in the ideal state to reflect
// the new set of instance partitions.
List<InstancePartitions> instancePartitionsList = new ArrayList<>(instancePartitionsMap.values());

Consider making the order of this list deterministic, so that we can check if it is identical to the existing one
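
One possible way to get a deterministic order is to sort by name before writing, sketched below. `DeterministicOrderSketch` and `sortedByName` are hypothetical helpers for illustration; the only assumption about the real types is that each element exposes a name, as `getInstancePartitionsName()` does in the snippet above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper, not part of the PR. */
final class DeterministicOrderSketch {
  /** Copies the items into a new list ordered by the name each item reports. */
  static <T> List<T> sortedByName(Collection<T> items, Function<T, String> nameOf) {
    List<T> sorted = new ArrayList<>(items);
    sorted.sort(Comparator.comparing(nameOf));
    return sorted;
  }
}
```

In the rebalancer this might be called as `sortedByName(instancePartitionsMap.values(), InstancePartitions::getInstancePartitionsName)`, so that two runs over the same instance partitions produce lists that compare equal.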

Map<String, List<String>> idealStateListFields = idealState.getRecord().getListFields();
for (InstancePartitions instancePartitions : instancePartitionsList) {
String instancePartitionsName = instancePartitions.getInstancePartitionsName();
for (String partitionReplica : instancePartitions.getPartitionToInstancesMap().keySet()) {

(minor) We can iterate over entrySet() to avoid the extra map lookup per key
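
For reference, a small sketch of the `entrySet()` form. `EntrySetSketch` and `flatten` are hypothetical names, and the `Map<String, List<String>>` shape is an assumption based on the "0_0" style keys used elsewhere in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical example, not part of the PR. */
final class EntrySetSketch {
  /** Flattens a partition-replica to instances map into "key=instance" strings in one pass. */
  static List<String> flatten(Map<String, List<String>> partitionToInstances) {
    List<String> out = new ArrayList<>();
    // Each entry carries both key and value, so no get(key) call is needed inside the loop.
    for (Map.Entry<String, List<String>> entry : partitionToInstances.entrySet()) {
      for (String instance : entry.getValue()) {
        out.add(entry.getKey() + "=" + instance);
      }
    }
    return out;
  }
}
```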

Integer replicaGroup = Integer.parseInt(key.substring(separatorIndex + 1));
listFields.getValue().forEach(value -> {
if (serverToReplicaGroupMap.containsKey(value)) {
LOGGER.warn("Server {} assigned to multiple replica groups ({}, {})", value, replicaGroup,

Is it possible that one server is assigned to multiple replica groups? If so, will this break routing?
Should we consider throwing an exception and falling back when this happens?
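
A rough sketch of the fail-fast variant, in case it helps the discussion. `ReplicaGroupMappingSketch` and `buildServerToReplicaGroup` are hypothetical; the key format (replica group ID after the last `_`, as in "0_0") is an assumption taken from the Javadoc in this PR, and the fallback itself is left to the caller.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example, not part of the PR. */
final class ReplicaGroupMappingSketch {
  /**
   * Builds a server to replica group map from keys that end in "_<replicaGroupId>".
   * Throws if a server shows up under two different replica groups, so the caller can fall back.
   */
  static Map<String, Integer> buildServerToReplicaGroup(Map<String, List<String>> partitionReplicaToServers) {
    Map<String, Integer> serverToReplicaGroup = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : partitionReplicaToServers.entrySet()) {
      String key = entry.getKey();
      int separatorIndex = key.lastIndexOf('_');
      int replicaGroup = Integer.parseInt(key.substring(separatorIndex + 1));
      for (String server : entry.getValue()) {
        // The same server may repeat across partitions as long as the replica group matches.
        Integer previous = serverToReplicaGroup.putIfAbsent(server, replicaGroup);
        if (previous != null && previous != replicaGroup) {
          throw new IllegalStateException(
              "Server " + server + " assigned to multiple replica groups (" + previous + ", " + replicaGroup + ")");
        }
      }
    }
    return serverToReplicaGroup;
  }
}
```

A caller could catch the `IllegalStateException` and route without replica group information for that table, rather than continuing with an ambiguous mapping.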

private final String _instance;
private final boolean _online;
private final int _pool;
private final int _replicaGroupId;

(minor) Rename it to _replicaId


/// Given a partition ID and replica group ID like "0_0", return the list of instances belonging to that instance
/// partition
public List<String> getInstances(String partitionReplica) {

(minor) This method is intentionally not provided, to reduce map access. Callers should iterate over entrySet() of _partitionToInstancesMap instead of looking up each key

for (InstancePartitions instancePartitions : instancePartitionsMap.values()) {
if (!instancePartitions.equals(
idealStateInstancePartitions.get(instancePartitions.getInstancePartitionsName()))) {
LOGGER.warn(

We should add a table-level gauge that reflects whether the IP is wiped for an IP-enabled table


// Assign instances
- assignInstances(tableConfig, true);
+ assignInstances(tableConfig, idealState, true);

Should we revert the changes for instance assignment?
We should modify the IS when assigning segments, not instances
