Spark Iceberg streaming - checkpoint leverages S3FileIO signer path instead of Hadoop's S3AFileSystem #14762

@koombal

Apache Iceberg version

1.8.1

Query engine

Spark

Please describe the bug 🐞

Spark 3.5.6
Iceberg 1.8.1

We are building a medallion-like architecture on Iceberg, streaming from an Iceberg bronze table to an Iceberg silver table.

This is how we write to the destination table:

        return bronzeStream
                .writeStream()
                .outputMode("append")
                .option("checkpointLocation", ckptDir)
                .queryName("bronze_to_silver_" + silverTableName)
                .trigger(Trigger.ProcessingTime(processingTime))
                .foreachBatch((batch, batchId) -> {
                    if (batch.isEmpty()) {
                        log.info(
                                "Bronze->Silver stream {}: batch {} is empty, skipping write.",
                                silverTableName,
                                batchId);
                        return;
                    }
                    batch.writeTo(StreamUtils.getFullTableName(silverTableName)).append();
                })
                .start();
    }

We are getting the following error in Spark:
Query 20b3208b-35a2-4feb-bce8-ac76abe3df9c terminated with error: org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Table not found or action can_get_metadata forbidden for Anonymous
at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:212)
at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:188)
at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:224)
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:308)
at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:100)
at org.apache.iceberg.aws.s3.signer.S3V4RestSignerClient.sign(S3V4RestSignerClient.java:351)
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.lambda$signRequest$4(SigningStage.java:154)
at software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:63)
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.signRequest(SigningStage.java:153)
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:72)
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)

On the catalog side we can see that a sign request was sent on behalf of the checkpoint directory.
We would expect the checkpoint directory to use its Hadoop credentials (S3AFileSystem) and not go through S3FileIO with an "S3V4RestSignerClient.sign" flow (or an STS flow, for that matter).
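
To make the expectation concrete, here is a minimal sketch of the two credential paths, assuming a REST catalog with S3FileIO remote signing for table I/O and static S3A keys for the checkpoint directory. The class name, catalog name, URI, and key values are hypothetical placeholders and not the configuration from this deployment. Only the property keys themselves are standard Iceberg and Hadoop settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CheckpointCredentialSketch {

    // Builds Spark settings for the two credential paths.
    // All argument values are placeholders supplied by the caller.
    static Map<String, String> sparkConf(String catalog, String restUri,
                                         String s3aAccessKey, String s3aSecretKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        String prefix = "spark.sql.catalog." + catalog;

        // Path 1: table data and metadata go through the Iceberg REST catalog,
        // using S3FileIO with remote request signing.
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        conf.put(prefix + ".type", "rest");
        conf.put(prefix + ".uri", restUri);
        conf.put(prefix + ".io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        conf.put(prefix + ".s3.remote-signing-enabled", "true");

        // Path 2: the streaming checkpoint directory is expected to be served
        // by Hadoop's S3AFileSystem with its own credentials.
        conf.put("spark.hadoop.fs.s3a.access.key", s3aAccessKey);
        conf.put("spark.hadoop.fs.s3a.secret.key", s3aSecretKey);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        sparkConf("lake", "https://catalog.example.com", "<access-key>", "<secret-key>")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With a setup along these lines, the expectation is that only paths belonging to the table trigger sign requests against the catalog, while reads and writes under checkpointLocation use the fs.s3a.* credentials.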

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug
