Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Jan 20, 2026

Rationale

Closes #2944
Closes #3215

Summary

  • Adds planning-time check in CometExecRule to prevent native write when input data source is not a CometNativeExec (i.e., produces non-Arrow data like OnHeapColumnVector)
  • Improves error message in Utils.scala to explain the issue and suggest workarounds
  • Adds test to verify fallback behavior works correctly

Test plan

  • Added test "native write falls back when scan produces non-Arrow data" in CometParquetWriterSuite
  • Test verifies that when native_comet scan doesn't support complex types, the native write falls back to Spark instead of failing at runtime
  • All existing tests pass

🤖 Generated with Claude Code

andygrove and others added 2 commits January 20, 2026 16:02
This test verifies that when native_comet scan doesn't support complex types,
the native write correctly falls back to Spark instead of failing at runtime
with "Comet execution only takes Arrow Arrays" error.

Also refactors the test helper methods to reduce code duplication.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.00%. Comparing base (f09f8af) to head (176809a).
⚠️ Report is 859 commits behind head on main.

Files with missing lines Patch % Lines
.../scala/org/apache/spark/sql/comet/util/Utils.scala 0.00% 8 Missing ⚠️
...n/scala/org/apache/comet/rules/CometExecRule.scala 85.71% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3227      +/-   ##
============================================
+ Coverage     56.12%   60.00%   +3.88%     
- Complexity      976     1430     +454     
============================================
  Files           119      170      +51     
  Lines         11743    15758    +4015     
  Branches       2251     2604     +353     
============================================
+ Hits           6591     9456    +2865     
- Misses         4012     4982     +970     
- Partials       1140     1320     +180     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove marked this pull request as ready for review January 21, 2026 13:58
@andygrove andygrove modified the milestone: 0.13.0 Jan 22, 2026

case op: DataWritingCommandExec =>
convertToComet(op, CometDataWritingCommand).getOrElse(op)
// Get the actual data source child that will feed data to the native writer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more generic to do this in CometExecRule.convertToComet -

 if (op.children.nonEmpty && !op.children.forall(_.isInstanceOf[CometNativeExec])) {
    // For  operators like writes, require all children to be native
    if (requiresNativeChildren(handler)) {
      return None  // Fallback to Spark
    }
   ...
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comet should gracefully handle OnHeapColumnVector instead of failing Comet fails when local writer enabled

3 participants