Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Jan 14, 2026

Rationale for this change

When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.

However, the way it does this is to first re-serializes the deserialized footer using Thrift. This has several issues:

  1. it's inefficient
  2. it's not obvious that it will always produce the same Thrift encoding as the original, leading to spurious signature verification failures
  3. if the original footer deserializes to invalid enum values, attempting to serialize it again will lead to undefined behavior

Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).

This PR switches to reusing the original serialized metadata.

Are these changes tested?

Yes, by existing tests and new fuzz regression file.

Are there any user-facing changes?

No.

@pitrou pitrou requested review from EnricoMi and adamreeve January 14, 2026 15:24
@pitrou pitrou marked this pull request as ready for review January 14, 2026 15:28
@pitrou pitrou requested a review from wgtmac as a code owner January 14, 2026 15:28
Comment on lines 335 to 327
PARQUET_DEPRECATED("Deprecated in 24.0.0. Use the two-argument overload instead.")
bool VerifySignature(const void* signature);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to deprecate this but we might also remove it, as it seems this API is meant for internal use? @adamreeve @EnricoMi

Copy link
Contributor

@adamreeve adamreeve Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can't see why this would need to be part of the public API. In which case, the new method should probably be private or internal too.

If we remove it and make the new method internal then this could block someone from upgrading if they had a legitimate use case for this. Maybe we should deprecate it and suggest users make an issue if they need this functionality? Also OK by me if you just want to remove it though, it seems unlikely this would be used.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 14, 2026
Comment on lines 837 to 838
bool VerifySignature(std::span<const uint8_t> serialized_metadata,
std::span<const uint8_t> signature) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this as a method of FileMetaData but the only member it uses is the FileDecryptor, so perhaps this should be moved elsewhere (or made static?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pitrou
Copy link
Member Author

pitrou commented Jan 14, 2026

Hmm, it looks like we need to wait for #48819 for the R failures.

@rok
Copy link
Member

rok commented Jan 14, 2026

If I remember correctly in rust we verify directly on decrypted buffer (skipping reserialization).

@pitrou
Copy link
Member Author

pitrou commented Jan 15, 2026

If I remember correctly in rust we verify directly on decrypted buffer (skipping reserialization).

I'm not surprised that Rust is saner than C++ :-)

@pitrou pitrou force-pushed the gh48858-plaintext-verify-signature branch from 35d6c51 to e853dee Compare January 15, 2026 08:45
@pitrou
Copy link
Member Author

pitrou commented Jan 15, 2026

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: e853dee

Submitted crossbow builds: ursacomputing/crossbow @ actions-f90353001e

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou force-pushed the gh48858-plaintext-verify-signature branch from e853dee to 1bc9ff5 Compare January 19, 2026 10:05
@pitrou
Copy link
Member Author

pitrou commented Jan 19, 2026

@github-actions crossbow submit fuzz

@github-actions
Copy link

Revision: 1bc9ff5

Submitted crossbow builds: ursacomputing/crossbow @ actions-35c4966c5a

Task Status
test-build-cpp-fuzz GitHub Actions

@pitrou
Copy link
Member Author

pitrou commented Jan 19, 2026

CI failures are unrelated.

@pitrou
Copy link
Member Author

pitrou commented Jan 19, 2026

@adamreeve @EnricoMi @wgtmac Are you available to review this?

Copy link
Contributor

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me thanks @pitrou. I don't feel too strongly either way about how to handle the deprecation and method visibility.

@adamreeve
Copy link
Contributor

If I remember correctly in rust we verify directly on decrypted buffer (skipping reserialization).

Yes, it looks like we started out copying the C++ approach and then switched to encrypting the metadata buffer directly for similar reasons: apache/arrow-rs#7459 (comment)

Comment on lines 837 to 838
bool VerifySignature(std::span<const uint8_t> serialized_metadata,
std::span<const uint8_t> signature) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

auto file_decryptor = std::make_shared<InternalFileDecryptor>(
file_decryption_properties, file_aad, algo.algorithm,
file_metadata_->footer_signing_key_metadata(), properties_.memory_pool());
// set the InternalFileDecryptor in the metadata as well, as it's used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make verify signature as a static function, we can remove set_file_decryptor as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the decryptor is passed down to row group and column metadata, and is necessary to decrypt encrypted column metadata.

std::shared_ptr<Buffer> encrypted_buffer =
AllocateBuffer(file_decryptor_->pool(),
aes_encryptor->CiphertextLength(serialized_metadata.size()));
int32_t encrypted_len = aes_encryptor->SignedFooterEncrypt(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we produce the signature by encrypting the serialized metadata? Can't we compute the signature from the encrypted buffer? Or is the signature a property of the encryption process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, the signature is part of the AEAD process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To quote the spec about plaintext footers:

The footer signing is done by encrypting the serialized FileMetaData structure with the AES GCM algorithm - using a footer signing key, and an AAD constructed according to the instructions of the section 4.4. Only the nonce and GCM tag are stored in the file – as a 28-byte fixed-length array, written right after the footer itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the signature we are talking about is the concatenation of the nonce and the AES-GCM "authentication tag".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we obtain the serialized metadata? Does this come from decrypting the encrypted metadata? In that case, we should not need to compute and compare the signature because decryption should fail in the first place if signature does not match.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is plaintext footer mode, so we encrypt the footer (metadata) but only store the signature, not the encrypted metadata. To verify integrity, we encrypt again to compare the signatures. There is no encrypted metadata along with the signature, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, the footer (i.e. the serialized metadata) is stored in plaintext, not encrypted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you're right, when the footer is encrypted, checking the signature is part of the decryption process and we don't need to do it manually)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, looks like we posted our messages at the same time :) So, yes.

@pitrou pitrou force-pushed the gh48858-plaintext-verify-signature branch from 1bc9ff5 to e8fea7e Compare January 20, 2026 11:48
@pitrou
Copy link
Member Author

pitrou commented Jan 20, 2026

I've made the new method static and private, and updated the deprecation message to ask users to contact us if they need the deprecated functionality.

@pitrou
Copy link
Member Author

pitrou commented Jan 20, 2026

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: e8fea7e

Submitted crossbow builds: ursacomputing/crossbow @ actions-d1830a6afa

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants