HyukjinKwon (Member) commented Jan 4, 2026

Rationale for this change

Commit 4937d9f (ARROW-5102) added a TODO comment requesting a test with valid UTF-8 filenames. Later, commit eb23ea9 (ARROW-5648) introduced the UTF-8 to UTF-16 conversion logic on Windows, which should fix the issue.

Essentially we should add a test for:

Result<NativePathString> StringToNative(std::string_view s) {
#if _WIN32
  return ::arrow::util::UTF8ToWideString(s);
#else
  return std::string(s);
#endif
}
(This is StringToNative().) The new test complements the existing FileNameWideCharConversionRangeException test, which covers invalid UTF-8.
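
For readers who want to see what the Windows branch has to get right, here is a minimal, portable sketch of a UTF-8 to UTF-16 conversion. `Utf8ToUtf16Sketch` is a hypothetical helper written for this illustration; it is not Arrow's `UTF8ToWideString`, whose implementation is not shown in this thread. It returns `std::nullopt` on malformed input, which is an assumption standing in for the error path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (NOT Arrow's UTF8ToWideString): decodes UTF-8 bytes into
// UTF-16 code units, returning std::nullopt if the input is not valid UTF-8.
std::optional<std::u16string> Utf8ToUtf16Sketch(std::string_view s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static constexpr std::uint32_t kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
  std::u16string out;
  std::size_t i = 0;
  while (i < s.size()) {
    const auto lead = static_cast<std::uint8_t>(s[i]);
    std::uint32_t cp = 0;
    std::size_t len = 0;
    if (lead < 0x80) {
      cp = lead; len = 1;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; len = 2;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; len = 3;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; len = 4;
    } else {
      return std::nullopt;  // stray continuation byte or invalid lead byte
    }
    if (i + len > s.size()) return std::nullopt;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const auto cont = static_cast<std::uint8_t>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    // Reject overlong encodings, surrogate code points, and values past U+10FFFF.
    if (cp < kMinCodePoint[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return std::nullopt;
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points outside the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}

// Small demonstration using byte escapes so it does not depend on source encoding.
void DemoUtf8ToUtf16Sketch() {
  // "한국어": three 3-byte sequences, each mapping to one UTF-16 code unit.
  assert(Utf8ToUtf16Sketch("\xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4")->size() == 3);
  // U+1F600: one 4-byte sequence mapping to a surrogate pair.
  assert(Utf8ToUtf16Sketch("\xF0\x9F\x98\x80")->size() == 2);
  // 0xFF can never appear in UTF-8.
  assert(!Utf8ToUtf16Sketch("\xFF").has_value());
}
```

The valid-input cases in `DemoUtf8ToUtf16Sketch` correspond to what this PR tests, while the `\xFF` case corresponds to the invalid-UTF-8 path that FileNameWideCharConversionRangeException already covers.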

What changes are included in this PR?

This PR adds the test described above.

Are these changes tested?

A unit test was added.

Are there any user-facing changes?

No, test-only.

github-actions bot commented Jan 4, 2026

⚠️ GitHub issue #48721 has been automatically assigned in GitHub to PR creator.

// Test that file operations work with valid UTF-8 filenames.
// On Windows, PlatformFilename::FromString() converts UTF-8 strings to wide strings.
// On Unix, filenames are treated as opaque byte strings.
std::string utf8_file_name = "test_file_한국어_😀.txt";
Member Author

FYI, 한국어 is "Korean" in Korean :-)

Member

Is the emoticon Korean too? :)
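
As a rough illustration of the round trip the quoted test comment describes (create a file with a UTF-8 name, then find it again), here is a standalone sketch using only `std::filesystem`. It does not use Arrow's `PlatformFilename` or its test utilities; `PathFromUtf8`, `Utf8FromPath`, `RoundTripUtf8FileName`, and the scratch directory name are all hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 bytes on any platform.
// C++20 libraries take char8_t strings; C++17 provides fs::u8path instead.
fs::path PathFromUtf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
  return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
  return fs::u8path(utf8);
#endif
}

// Hypothetical helper: get the UTF-8 bytes of a path as std::string, whether
// u8string() returns std::string (C++17) or std::u8string (C++20).
std::string Utf8FromPath(const fs::path& p) {
  const auto u8 = p.u8string();
  return std::string(u8.begin(), u8.end());
}

// Creates a file with the given UTF-8 name in a scratch directory, lists the
// directory, and reports whether the same UTF-8 name comes back.
bool RoundTripUtf8FileName(const std::string& utf8_name) {
  std::error_code ec;
  // Assumption: a fixed scratch directory name under the system temp directory.
  const fs::path dir = fs::temp_directory_path() / "utf8_filename_sketch";
  fs::create_directories(dir, ec);
  if (ec) return false;

  bool found = false;
  const fs::path file = dir / PathFromUtf8(utf8_name);
  {
    std::ofstream out(file, std::ios::binary);
    if (out) out << "data";
  }
  for (const auto& entry : fs::directory_iterator(dir, ec)) {
    if (Utf8FromPath(entry.path().filename()) == utf8_name) found = true;
  }
  fs::remove_all(dir, ec);  // best-effort cleanup
  return found;
}

// Demonstration with the filename from the diff, written as byte escapes
// ("test_file_" + 한국어 + "_" + U+1F600 + ".txt") to avoid source-encoding issues.
void DemoRoundTripUtf8FileName() {
  assert(RoundTripUtf8FileName(
      "test_file_\xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4_\xF0\x9F\x98\x80.txt"));
}
```

One caveat: a filesystem that normalizes Unicode may return a name that differs byte for byte from the one passed in, so an exact comparison like this one is only a reasonable check where names are stored as given.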

github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label on Jan 6, 2026
pitrou (Member) left a comment

Thanks for contributing this, @HyukjinKwon. Looks good in general, just one comment.

Also, can you please trim down the PR description?

HyukjinKwon (Member, Author) commented Jan 19, 2026

I made the PR description shorter. Hopefully this one is easier to follow.

HyukjinKwon changed the title from "GH-48721: [C++] Add test for UTF-8 filenames on Windows" to "GH-48721: [C++] Add test for file creation with UTF-8 filenames" on Jan 20, 2026
HyukjinKwon marked this pull request as draft on January 20, 2026 09:52
HyukjinKwon (Member, Author) commented

Ah, the test failure is not related to this change.

HyukjinKwon marked this pull request as ready for review on January 20, 2026 09:56