Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jan 20, 2026

Rationale for this change

The DictionaryArray::dictionary() method has a data race during lazy initialization. When multiple threads concurrently call dictionary(), the check then initialize pattern allows multiple threads to pass the check simultaneously, creating multiple Array objects instead of one shared cached instance.

There were a bit of discussion in #36418 (comment) but I just propose the standard approach with the keep it simple, stupid spirit.

What changes are included in this PR?

This PR makes DictionaryArray::dictionary thread-safe by std::once_flag and std::call_once.

Are these changes tested?

Manually tested with the example below:

#include <arrow/api.h>
#include <thread>
#include <vector>

int main() {
  arrow::StringBuilder dict_builder;
  dict_builder.Append("foo").Append("bar").Append("baz");
  auto dict = dict_builder.Finish().ValueOrDie();
  arrow::Int32Builder indices_builder;
  indices_builder.AppendValues({0, 1, 2, 0, 1, 2});
  auto indices = indices_builder.Finish().ValueOrDie();

  auto dict_array_result = arrow::DictionaryArray::FromArrays(indices, dict);
  auto dict_array = std::static_pointer_cast<arrow::DictionaryArray>(
      dict_array_result.ValueOrDie());
  
  // Spawn 20 threads that concurrently access dictionary
  std::vector<std::thread> threads;
  for (int i = 0; i < 20; ++i) {
    threads.emplace_back([&dict_array, i]() {
      const auto& dictionary = dict_array->dictionary();
      results[i] = dictionary.get();
    });
  }
  
  // Verify all threads got the same cached dictionary
  for (int i = 1; i < 20; ++i) {
    if (results[i] != results[0]) {
      return 1;
    }
  }
}

Are there any user-facing changes?

As demonstrated above, it might return a different instance instead of the same instance.

@github-actions
Copy link

⚠️ GitHub issue #36503 has been automatically assigned in GitHub to PR creator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant