High-performance cache policies and supporting data structures.
Status: design rationale for hasher choices, key interning, and hash-based routing. Companion to
concurrency.md,sharding.md, and the security notes in store/data-structure modules.
cachekit uses hashing in three different roles:
HashMapStore, policy maps, ghost indexes).KeyInterner).ShardSelector).Those roles have different threat models. Some code paths choose FxHash for
speed on trusted keys; others default to RandomState or keyed SipHash because
untrusted keys can create HashDoS or single-shard contention. This document
explains those choices and when callers should override them.
| Component | Default hasher | Why | Caller override? |
|---|---|---|---|
HashMapStore |
RandomState |
public store API, safer default | yes, with_hasher |
ClockRing |
RandomState |
can be keyed by user input | yes, with explicit trust acknowledgement |
KeyInterner |
FxBuildHasher |
hot internal mapping, trusted-key bias | yes, with_hasher |
WeightStore |
FxHashMap |
speed, large-value target | no generic hasher today |
| Policy internals | mostly FxHashMap |
hot metadata paths | generally no |
ShardSelector |
keyed SipHash-1-3 | routing must resist shard pinning | seed or randomized constructor |
The rule: default to DoS-resistant hashing at public boundaries; use faster hashing inside policy metadata when keys are trusted or already admitted.
RandomState: Safe Public DefaultHashMapStore and ClockRing default to
std::collections::hash_map::RandomState. This is the right public default
because callers often pass keys derived from request paths, tenant ids, URLs,
or filenames. Randomized hashing prevents an attacker from precomputing many
keys that collide in one bucket.
The cost is per-hash overhead. For workloads with fully trusted keys (for
example, dense integer ids generated by the process), callers can use
with_hasher to opt into a faster hasher. That opt-in is intentionally explicit:
the call site documents the threat-model decision.
ClockRing goes further by using a KeysAreTrusted acknowledgement for faster
non-randomized hashers. The extra marker makes the security trade visible in
review rather than hidden in a type alias.
FxHash: Hot Internal DefaultMany policy internals use rustc_hash::FxHashMap:
WeightStore’s index.KeyInterner’s default index.FxHash is fast and deterministic. It is also non-cryptographic and not
HashDoS-resistant. The intended use is trusted, already-admitted keys where the
hash map is not directly exposed as an unbounded public endpoint.
The sharp edge is WeightStore: its target use case (variable-size objects
like images, documents, blobs) often has user-derived keys. Its module docs call
this out directly: pre-hash keys with a keyed hash or use HashMapStore if the
key source is adversarial.
KeyInterner: Identity Compression, Not SecurityKeyInterner maps external keys to compact u64 handles:
index: HashMap<K, u64, S> keys: Vec<K>
"user:123" -> 0 keys[0] = "user:123"
The design goals:
Handles are not capability tokens. They are sequential integers. A handle
from one interner can silently resolve to a different key in another interner,
and handles are reused after clear. Callers that store handles externally
must pair them with generation() and reject stale generations.
Security implications:
FxBuildHasher is for trusted input.with_hasher / with_capacity_and_hasher with RandomState when keys
are derived from untrusted input.KeyInterner is append-only until clear, so unique-key attacks can drive
memory growth. Use try_intern and your own admission bound for untrusted
keys.Debug intentionally omits interned keys to avoid leaking URLs, user ids, or
auth material into logs.ShardSelector: Hashing for RoutingShard routing has a different failure mode than lookup maps. A lookup hash collision slows one map; a routing collision pins the whole workload to one shard and defeats concurrency.
ShardSelector therefore uses keyed SipHash-1-3:
ShardSelector::randomized(shards) draws key material from RandomState.
Use this for normal production sharding.ShardSelector::new(shards, seed) is deterministic and reproducible. Treat
seed as secret key material if adversaries can influence keys.The selector reduces hash output to [0, shards) using fast range reduction
rather than %, keeping distribution unbiased and cheap. The shard count is
clamped to [1, MAX_SHARDS] to prevent user-controlled configs from allocating
an unbounded number of locks or vectors.
When adding a hasher parameter to a public type:
RandomState unless the type is clearly internal-only.with_hasher and try_with_hasher if callers have legitimate
trusted-key fast paths.new.ShardSelector over ad hoc
hashing so the keyed-routing contract stays centralized.When using FxHashMap internally:
Do not serialize hash seeds or hasher state unless the type is explicitly a
deterministic routing artifact. ShardSelector::new(shards, seed) is the one
place where reproducible routing is part of the public contract. RandomState
and policy-internal hash maps should be reconstructed on deserialization.
Serializing raw hash-map order is also wrong. Hash-map iteration order changes with seeds and implementation details; serialized cache state should use stable semantic fields (keys, values, policy order) rather than map buckets.
The codebase intentionally mixes RandomState, FxHashMap, and SipHash. That
mix is valid only while every use site has a documented threat model. A useful
future hardening pass:
KeysAreTrusted-style acknowledgement to any public non-randomized path.WeightStore HashDoS caveatsrc/ds/interner.rssrc/ds/shard.rssrc/store/hashmap.rssrc/ds/clock_ring.rs