@ivoson
Contributor

@ivoson ivoson commented Dec 23, 2025

What changes were proposed in this pull request?

Enable checksum based indeterminate shuffle retry by default.

Increase the JVM memory size to 6g for SQL module tests. The test case "SPARK-48037: Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data" sets the number of shuffle partitions to 16777216, which requires more memory to compute the order-independent shuffle checksum.
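
As a rough illustration of why that partition count is memory-hungry, the sketch below estimates a lower bound on checksum state. It assumes, hypothetically, that a map task keeps one 8-byte checksum value per shuffle partition; the helper name `checksum_state_bytes` and the 8-byte figure are assumptions for illustration, and the real per-partition state in Spark may well be larger.

```python
def checksum_state_bytes(num_partitions: int, bytes_per_checksum: int = 8) -> int:
    """Lower-bound estimate of checksum state for one map task.

    Assumption (not taken from the PR): one fixed-size checksum value
    is held per shuffle partition.
    """
    return num_partitions * bytes_per_checksum


# Partition count used by the SPARK-48037 test case (2**24).
partitions = 16777216
estimate = checksum_state_bytes(partitions)
print(f"{estimate} bytes = {estimate / 2**20:.0f} MiB")
```

Under that assumption the raw checksum values alone come to 128 MiB for a single map task, before any object overhead, which is consistent with the test needing a larger heap.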

Why are the changes needed?

The checksum-based solution detects changes in indeterminate shuffle output more accurately, so this PR proposes enabling it by default to avoid query correctness issues caused by indeterminate shuffle retries.
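
To illustrate the idea of an order-independent checksum, here is a minimal Python sketch. It is not Spark's implementation: the function names, the use of CRC32 per row, and the XOR-plus-sum combination are all assumptions chosen for illustration. The point is only that combining per-row checksums with commutative operations gives the same result when a retried task emits the same rows in a different order, and a different result when the rows themselves change.

```python
import zlib
from typing import Iterable


def order_independent_checksum(rows: Iterable[bytes]) -> int:
    """Combine per-row CRC32 values with commutative operations.

    XOR and modular addition do not depend on row order, so two
    attempts that emit the same rows in a different order agree.
    (Illustrative scheme only, not the one Spark uses.)
    """
    xor_acc = 0
    sum_acc = 0
    for row in rows:
        c = zlib.crc32(row)
        xor_acc ^= c
        sum_acc = (sum_acc + c) & 0xFFFFFFFF
    # Pack both 32-bit accumulators into one 64-bit value.
    return (sum_acc << 32) | xor_acc


def output_changed(first_attempt: Iterable[bytes], retry: Iterable[bytes]) -> bool:
    """Hypothetical helper: True if the retry produced different data."""
    return order_independent_checksum(first_attempt) != order_independent_checksum(retry)


first = [b"k1,v1", b"k2,v2", b"k3,v3"]
same_rows_reordered = [b"k3,v3", b"k1,v1", b"k2,v2"]
different_rows = [b"k1,v1", b"k2,v2", b"k3,v4"]

print(output_changed(first, same_rows_reordered))  # reordering alone is not a change
print(output_changed(first, different_rows))       # changed data is detected
```

In this sketch a retry that only reorders rows is treated as unchanged, while a retry whose data differs is flagged, which is the kind of signal that lets a scheduler decide whether downstream stages need to be recomputed.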

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Dec 23, 2025
@ivoson ivoson force-pushed the SPARK-54556-followup branch from 99dc740 to 4ab697e Compare December 24, 2025 01:44
@github-actions github-actions bot added the BUILD label Dec 24, 2025
@ivoson ivoson changed the title [WIP][SPARK-54556][CORE][FOLLOWUP] Enable checksum based indeterminate shuffle retry by default [SPARK-54556][CORE][FOLLOWUP] Enable checksum based indeterminate shuffle retry by default Dec 24, 2025
Added Java options to increase memory limit for tests.
@ivoson
Contributor Author

ivoson commented Dec 24, 2025

cc @cloud-fan

@cloud-fan
Contributor

can we use a new JIRA ticket? The original ticket is for 4.1 but this change is apparently master branch only.

@ivoson ivoson changed the title [SPARK-54556][CORE][FOLLOWUP] Enable checksum based indeterminate shuffle retry by default [SPARK-54830][CORE] Enable checksum based indeterminate shuffle retry by default Dec 24, 2025
@ivoson
Contributor Author

ivoson commented Dec 24, 2025

can we use a new JIRA ticket? The original ticket is for 4.1 but this change is apparently master branch only.

Thanks @cloud-fan. Created a new ticket for this https://issues.apache.org/jira/browse/SPARK-54830
