feat: Thread aware cache #115
base: main
Conversation
What is the relation between this and the cache Schuyler is working on? Do they serve different use cases? Also, I don't quite understand how the cache does thread migration; it doesn't seem to implement the …
Codecov Report

@@            Coverage Diff            @@
##              main    #115     +/-   ##
==========================================
- Coverage    100.0%   99.3%    -0.7%
==========================================
  Files           33      72      +39
  Lines         3291    7973    +4682
==========================================
+ Hits          3291    7918    +4627
- Misses           0      55      +55
==========================================

View full report in Codecov by Sentry.
Please see the top-level README.md file for the steps needed to add a new crate. You need to register it in the top-level README and the top-level CHANGELOG, you need to add the requisite logo and favicon, and you need the voodoo lines in the crate-level documentation to pull the logo and icon into docs.rs. See the script scripts\add-crate.ps1 for everything.
geeknoid left a comment
It's cool to see this kind of thing coming online.
@@ -0,0 +1,14 @@
# Changelog
We auto-generate the changelog, so you can leave this blank for starters and it'll be filled in when releasing the crate.
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

//! Benchmarks for the NUMA cache.
Can you add comparative benchmarks against the existing cache crates out there?
//!
//! The Bloom filter is sized for ~1% false positive rate and uses atomic operations for
//! lock-free concurrent access. Note that removals don't clear bits from the filter, so
//! stale positives may occur (safe, just slower), but false negatives never occur.
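For context on the ~1% figure, this is the standard Bloom filter sizing math; a sketch, not code from this PR:

```rust
/// Standard Bloom filter sizing: for `n` expected items and a target
/// false-positive rate `p`, the bit count `m` and hash count `k` are
/// m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
fn bloom_params(n: usize, p: f64) -> (usize, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as usize;
    let k = ((m as f64 / n as f64) * ln2).round().max(1.0) as u32;
    (m, k)
}
```

For n = 1,000,000 items at p = 0.01 this gives m ≈ 9.6M bits (about 1.2 MB) and k = 7 hash functions.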
Does the Bloom filter ever get regenerated over time?
Handling removals in the Bloom filter is tricky because items are removed from the shards independently; think of it like SIEVE eviction per NUMA node. To handle this well we would need a bit per affinity. My thinking was to have this decided at compile time:
- A bit-per-affinity Bloom filter, for caches that will see a large number of deletes.
- A single-bit filter that does not track removals and may become saturated. This would be right for mostly read-only caches.
I guess I was thinking that you could keep track of the number of false positives over time. As the ratio gets over a certain threshold, you'd throw the filter away and rebuild it. I don't know if rebuilding it is possible, though.
I don't know if there is such a thing as a mostly read-only cache. A cache with a TTL will eventually recycle every item, so the Bloom filter will become ineffective.
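A minimal sketch of that bookkeeping, with hypothetical names (none of this is in the PR): count lookups where the filter said "maybe" but every shard missed, and signal a rebuild once the observed ratio crosses a threshold.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: tracks lookups where the Bloom filter reported a
/// possible hit but every shard then missed, i.e. observed false positives.
struct FilterStats {
    lookups: AtomicU64,
    false_positives: AtomicU64,
}

impl FilterStats {
    fn record_lookup(&self) {
        self.lookups.fetch_add(1, Ordering::Relaxed);
    }

    fn record_false_positive(&self) {
        self.false_positives.fetch_add(1, Ordering::Relaxed);
    }

    /// True once the observed false-positive ratio exceeds `threshold`
    /// (with a minimum sample size), signaling a rebuild from live keys.
    fn should_rebuild(&self, threshold: f64) -> bool {
        let lookups = self.lookups.load(Ordering::Relaxed);
        let fps = self.false_positives.load(Ordering::Relaxed);
        lookups > 10_000 && (fps as f64 / lookups as f64) > threshold
    }
}
```

Rebuilding seems possible in principle, since the shards still hold every live key: inserts could go to a fresh filter while a background pass over the shards repopulates it.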
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

//! Lock-free Bloom filter for fast negative lookups.
Unless this code has some magic in it, can we use an existing crate for this, like fastbloom?
If this code does have some magic, then perhaps it should be in a separate crate?
pub use cache::{NumaCache, NumaCacheBuilder};

// Re-export thread_aware types for convenience
@ralfbiedert Do we have a policy on reexports?
@@ -0,0 +1,136 @@
// Copyright (c) Microsoft Corporation.
A couple of high-level comments:

- You use the word 'shard' to describe the different partitions. Given this is explicitly a NUMA cache, when would you expect the shards to differ from the NUMA nodes? If they're always going to be 1:1, can we just use the words 'NUMA node' instead of 'shard' in order to eliminate one concept?
- You call this thread_aware_cache, but it's really a NUMA-aware cache, right? There is no general affordance for thread-specific shards. You can imagine having per-thread L0 caches to completely eliminate contention for common lookups. So should we call this crate numa_aware_cache?
- Would it be possible to support the Borrow pattern like HashMap does? This enables mix-and-match between &str and String, which is mighty handy. (See the sketch after this list.)
- Caches should report some stats about their efficacy, hit and miss rates in particular. Can we add this? Otherwise it's hard for an app to tune the cache size. (Also covered by the sketch after this list.)
- I worry that this won't scale well within a shard, given the very coarse lock that's taken.
- I also worry that a contended lock can have catastrophic effects on overall systemic perf. If N threads are in the middle of reading from a shard and a writer comes along, that writer will block, and in a thread-per-core world the thread's core will go completely idle. Should we be using async mutexes and make cache access be async methods? Is the overhead of async warranted, or would that potentially still suffer from contention inside the async mutex logic?
- I didn't dig enough into the code, but what's the lock behavior when a cache miss occurs and probing occurs in other shards? If the shard's lock is held across the whole operation, this could lead to substantial contention. If the lock is not held, then we have a racy situation where a probe could be occurring while a local thread puts the sought-after data into the shard "behind the scenes". Or 10 threads could potentially be probing for the same state at the same time.
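To make the Borrow and stats points concrete, a toy sketch around std's HashMap (stand-in types, not this PR's API):

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Toy stand-in for one shard, only to show the Borrow-style signature
/// and hit/miss counters; not the crate's real, NUMA-pinned shard type.
struct Shard<K, V> {
    map: Mutex<HashMap<K, V>>,
    hits: AtomicU64,
    misses: AtomicU64,
}

impl<K: Eq + Hash, V: Clone> Shard<K, V> {
    /// With `Shard<String, V>`, `get("key")` works without allocating a
    /// String, exactly like `HashMap::get`.
    fn get<Q>(&self, key: &Q) -> Option<V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        let found = self.map.lock().unwrap().get(key).cloned();
        match &found {
            Some(_) => self.hits.fetch_add(1, Ordering::Relaxed),
            None => self.misses.fetch_add(1, Ordering::Relaxed),
        };
        found
    }

    /// Hit ratio an app can watch to tune the cache size.
    fn hit_ratio(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let misses = self.misses.load(Ordering::Relaxed) as f64;
        if hits + misses == 0.0 { 0.0 } else { hits / (hits + misses) }
    }
}
```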
Good comments. My intention was to start the ball rolling again with previous work. Setting to draft while we mature the ideas here and have some discussions.
/// thread affinity to minimize cross-NUMA traffic. Each shard is explicitly
/// associated with a [`thread_aware::PinnedAffinity`].
///
/// A shared Bloom filter optimizes cross-shard lookups by quickly identifying
Seems like an implementation detail that's not needed in the user docs.
///
/// * `K` - The key type, must implement `Eq + Hash + Clone`.
/// * `V` - The value type, must implement `Clone`.
/// * `S` - The hash builder type, defaults to `DefaultHashBuilder`.
The default hasher is very slow. We might want to default to something else, like RapidHash.
use crate::sieve::{NodeIndex, SieveList};

/// Cache line size for alignment to prevent false sharing.
const CACHE_LINE_SIZE: usize = 64;
Nobody is making 128-byte cache lines? Is there no system-level constant for this we could leverage?
/// A single cache shard containing data and eviction metadata.
///
/// Aligned to the CPU cache line (64 bytes) to prevent cache-line bouncing between locks.
#[repr(align(64))]
Can you use CACHE_LINE_SIZE here, or are you not allowed to use constants in these attribute expressions?
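For reference, `#[repr(align(...))]` only accepts integer literals, so the constant can't be referenced there. One alternative that also addresses the 128-byte question above is `crossbeam_utils::CachePadded`, which picks the alignment per target architecture (e.g. 128 bytes on x86-64 and aarch64, 64 on many others); a sketch assuming the crossbeam-utils crate, which this PR does not currently use:

```rust
use crossbeam_utils::CachePadded;
use std::collections::HashMap;
use std::sync::Mutex;

struct PaddedShard {
    // Each shard's lock and map occupy their own cache line(s),
    // preventing false sharing between adjacent shards in a Vec.
    inner: CachePadded<Mutex<HashMap<u64, u64>>>,
}

fn main() {
    let shard = PaddedShard {
        inner: CachePadded::new(Mutex::new(HashMap::new())),
    };
    // CachePadded derefs to the inner value, so use is transparent.
    shard.inner.lock().unwrap().insert(1, 2);
}
```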
Schuyler told me they did not have an in-memory part to their cache yet. I wanted to contribute this one, which we wrote last year but was parked. The code has been evolved to use thread_aware. Let me set this to draft while we discuss whether that is a good idea or not.
This PR introduces thread_aware_cache, a high-performance in-memory cache designed to maximize read throughput and minimize latency on multi-socket NUMA architectures.

Thread-Affinity Sharding
Unlike traditional caches that shard by key hash, this implementation shards by thread affinity. This ensures data remains physically local to the executing thread, significantly reducing cross-socket interconnect traffic.
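In sketch form, the difference is just where the shard index comes from (hypothetical helpers, not this crate's API):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Traditional sharding: the key's hash picks the shard, so a hot key
/// lands on one fixed shard regardless of which socket reads it.
fn shard_by_key<K: Hash>(key: &K, num_shards: usize, s: &RandomState) -> usize {
    let mut h = s.build_hasher();
    key.hash(&mut h);
    (h.finish() as usize) % num_shards
}

/// Affinity sharding: the reader's pinned affinity (e.g. its NUMA node)
/// picks the shard, so reads touch node-local memory. The affinity value
/// stands in for a real query via thread_aware.
fn shard_by_affinity(current_affinity: usize, num_shards: usize) -> usize {
    current_affinity % num_shards
}
```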
"Read-Through" Replication
This implementation uses a replication strategy that automatically promotes hot data into the local shard upon access. This is supported by a lock-free Bloom filter that efficiently short-circuits negative lookups across shards.
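As a toy sketch of the read path described above (Mutex, HashMap, and HashSet stand-ins; the PR's filter is lock-free and all names here are illustrative):

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

struct Toy {
    shards: Vec<Mutex<HashMap<String, String>>>,
    filter: Mutex<HashSet<String>>, // pretend Bloom filter: no false negatives
}

impl Toy {
    fn get(&self, local: usize, key: &str) -> Option<String> {
        // 1. Node-local lookup: the common, cheap case.
        if let Some(v) = self.shards[local].lock().unwrap().get(key) {
            return Some(v.clone());
        }
        // 2. Filter check: "definitely absent" skips probing every shard.
        if !self.filter.lock().unwrap().contains(key) {
            return None;
        }
        // 3. Probe remote shards; on a hit, promote the value into the
        //    local shard so later reads stay node-local.
        for (i, shard) in self.shards.iter().enumerate() {
            if i == local {
                continue;
            }
            if let Some(v) = shard.lock().unwrap().get(key).cloned() {
                self.shards[local].lock().unwrap().insert(key.to_owned(), v.clone());
                return Some(v);
            }
        }
        None // filter false positive (e.g. stale bits after removals)
    }
}
```

Step 3 is the promotion that keeps later reads node-local; step 2 is what keeps a true miss from paying N shard probes.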
SIEVE Eviction Algorithm
Capacity management is handled by the SIEVE algorithm. This provides scan-resistance and high concurrency by eliminating the need for heavy write locks during cache reads.
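For readers unfamiliar with SIEVE, a minimal single-threaded sketch (the crate's real version is concurrent and avoids the linear scan in `touch`):

```rust
use std::collections::VecDeque;

/// Minimal SIEVE sketch: a FIFO of (key, visited) plus a "hand" that
/// scans from the oldest entry toward the newest, sparing each visited
/// entry once and evicting the first unvisited one.
struct Sieve {
    queue: VecDeque<(u64, bool)>, // front = newest, back = oldest
    hand: Option<usize>,          // index into `queue`, retained across evictions
    capacity: usize,
}

impl Sieve {
    /// On a cache hit, only the visited bit is set; the queue is never
    /// reordered, which is why reads need no heavy write lock.
    fn touch(&mut self, key: u64) {
        if let Some(e) = self.queue.iter_mut().find(|e| e.0 == key) {
            e.1 = true;
        }
    }

    /// Walk the hand from old to new, clearing visited bits, and remove
    /// the first entry that was not visited since the last pass.
    fn evict(&mut self) -> Option<u64> {
        let mut i = match self.hand {
            Some(i) => i,
            None => self.queue.len().checked_sub(1)?, // start at the oldest
        };
        loop {
            if self.queue[i].1 {
                self.queue[i].1 = false; // spare it once
                i = if i == 0 { self.queue.len() - 1 } else { i - 1 };
            } else {
                let (key, _) = self.queue.remove(i)?;
                self.hand = i.checked_sub(1); // hand stays put for next time
                return Some(key);
            }
        }
    }

    fn insert(&mut self, key: u64) {
        if self.queue.len() >= self.capacity {
            let _ = self.evict();
        }
        self.queue.push_front((key, false));
    }
}
```

The key property is that a hit only flips a bit, so the read path never reorders the queue the way an LRU list update would.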