High-performance cache policies and supporting data structures.
Status: design rationale for sharded data structures that exist today and roadmap notes for sharded cache policies. Companion to
concurrency.mdandhashing.md.
Sharding reduces contention by splitting one shared structure into N independent
substructures, each with its own lock and capacity accounting. cachekit already
uses this pattern at the data-structure and store layers. It does not yet
ship a generic ShardedCache<C> or sharded policy wrapper.
| Type | Layer | Purpose |
|---|---|---|
ShardedHashMapStore<K, V, S> |
store | N locked hash maps with global size counter |
ShardedSlotArena<T> |
data structure | N arenas addressed by ShardedSlotId |
ShardedFrequencyBuckets<K> |
data structure | N frequency bucket sets for concurrent LFU-style metadata |
ShardSelector |
helper | keyed hash routing from key to shard |
The sharded primitives are building blocks, not full cache policies. A future
ShardedLruCache would have to compose a sharded key index, per-shard recency
metadata, and global capacity semantics. That composition is where the hard
policy questions live.
A single RwLock wrapper is simple and often fast enough. It fails when:
get on LRU, LFU, Clock);Sharding turns one contended lock into N less-contended locks. The cost is that each shard is now a smaller cache with less global knowledge.
All routing should go through ShardSelector:
let selector = ShardSelector::randomized(16);
let shard = selector.shard_for_key(&key);
Routing requirements:
[1, MAX_SHARDS].Use ShardSelector::randomized unless reproducibility is required. If using
ShardSelector::new(shards, seed), treat seed as secret when keys are
user-controlled.
Two capacity models are possible:
| Model | Behaviour | Pros | Cons |
|---|---|---|---|
| Per-shard capacity | total capacity split across shards | simple, one lock per op | hit rate fragmentation |
| Global capacity | one shared capacity budget | better utilization | cross-shard locking or global victim selection |
The primitives today mostly follow per-shard local state with global gauges:
each shard owns its data; aggregate len is tracked separately where needed.
This keeps operations single-lock. It also means a full shard can evict even if
another shard has spare room.
That is acceptable for stores and metadata primitives. For a full cache policy, it is a hit-rate trade-off and must be documented at the policy level.
Current sharded operations acquire at most one shard lock. This is the most important invariant:
Any future operation that touches two shards must define an ordering rule, for example “lock lower shard index first.” Avoid two-shard operations unless the hit-rate improvement justifies the concurrency risk.
ShardedSlotIdShardedSlotArena<T> cannot use a plain SlotId. A slot id must identify both
the shard and the local slot:
ShardedSlotId = (shard_index, local_slot_id)
This is why sharding lives at the data-structure layer instead of being hidden behind a generic wrapper. Once a policy stores handles, the handle type is part of the policy’s metadata layout.
Sharded types should expose aggregate metrics but record locally when possible. The rule:
len need either an atomic aggregate or a shard scan.This matches the metrics design: observability must not dominate the hot path.
ShardedCache<C>A generic sharded cache wrapper would look roughly like:
pub struct ShardedCache<C, K> {
shards: Vec<RwLock<C>>,
selector: ShardSelector,
capacity_per_shard: usize,
_key: PhantomData<K>,
}
Open questions:
C have to be constructible by CacheFactory, or does the builder own
all construction?DynCache integrate: DynCache::Sharded(Box<...>) or a sibling
DynShardedCache?The conservative first version should use per-shard capacity and one-lock operations. Global victim selection should wait for benchmark evidence.
p).Sharding is a concurrency optimization, not a policy upgrade.