CacheKit Docs

High-performance cache policies and supporting data structures.

View the Project on GitHub OxidizeLabs/cachekit

Hashing and Key Identity

Status: design rationale for hasher choices, key interning, and hash-based routing. Companion to concurrency.md, sharding.md, and the security notes in store/data-structure modules.

cachekit uses hashing in three different roles:

Those roles have different threat models. Some code paths choose FxHash for speed on trusted keys; others default to RandomState or keyed SipHash because untrusted keys can create HashDoS or single-shard contention. This document explains those choices and when callers should override them.

The Decision Matrix

Component Default hasher Why Caller override?
HashMapStore RandomState public store API, safer default yes, with_hasher
ClockRing RandomState can be keyed by user input yes, with explicit trust acknowledgement
KeyInterner FxBuildHasher hot internal mapping, trusted-key bias yes, with_hasher
WeightStore FxHashMap speed, large-value target no generic hasher today
Policy internals mostly FxHashMap hot metadata paths generally no
ShardSelector keyed SipHash-1-3 routing must resist shard pinning seed or randomized constructor

The rule: default to DoS-resistant hashing at public boundaries; use faster hashing inside policy metadata when keys are trusted or already admitted.

RandomState: Safe Public Default

HashMapStore and ClockRing default to std::collections::hash_map::RandomState. This is the right public default because callers often pass keys derived from request paths, tenant ids, URLs, or filenames. Randomized hashing prevents an attacker from precomputing many keys that collide in one bucket.

The cost is per-hash overhead. For workloads with fully trusted keys (for example, dense integer ids generated by the process), callers can use with_hasher to opt into a faster hasher. That opt-in is intentionally explicit: the call site documents the threat-model decision.

ClockRing goes further by using a KeysAreTrusted acknowledgement for faster non-randomized hashers. The extra marker makes the security trade visible in review rather than hidden in a type alias.

FxHash: Hot Internal Default

Many policy internals use rustc_hash::FxHashMap:

FxHash is fast and deterministic. It is also non-cryptographic and not HashDoS-resistant. The intended use is trusted, already-admitted keys where the hash map is not directly exposed as an unbounded public endpoint.

The sharp edge is WeightStore: its target use case (variable-size objects like images, documents, blobs) often has user-derived keys. Its module docs call this out directly: pre-hash keys with a keyed hash or use HashMapStore if the key source is adversarial.

KeyInterner: Identity Compression, Not Security

KeyInterner maps external keys to compact u64 handles:

index: HashMap<K, u64, S>     keys: Vec<K>
"user:123" -> 0               keys[0] = "user:123"

The design goals:

Handles are not capability tokens. They are sequential integers. A handle from one interner can silently resolve to a different key in another interner, and handles are reused after clear. Callers that store handles externally must pair them with generation() and reject stale generations.

Security implications:

ShardSelector: Hashing for Routing

Shard routing has a different failure mode than lookup maps. A lookup hash collision slows one map; a routing collision pins the whole workload to one shard and defeats concurrency.

ShardSelector therefore uses keyed SipHash-1-3:

The selector reduces hash output to [0, shards) using fast range reduction rather than %, keeping distribution unbiased and cheap. The shard count is clamped to [1, MAX_SHARDS] to prevent user-controlled configs from allocating an unbounded number of locks or vectors.

Custom Hasher Rules

When adding a hasher parameter to a public type:

  1. Default to RandomState unless the type is clearly internal-only.
  2. Expose with_hasher and try_with_hasher if callers have legitimate trusted-key fast paths.
  3. Document the threat model at the constructor, not only at module level.
  4. Never hide a non-randomized hasher behind a harmless-sounding new.
  5. If the hasher affects shard routing, prefer ShardSelector over ad hoc hashing so the keyed-routing contract stays centralized.

When using FxHashMap internally:

  1. Keep it behind the policy or data-structure boundary.
  2. Do not expose arbitrary insertions from untrusted users without a separate capacity/admission guard.
  3. Mention the assumption in the module’s security notes if keys may be user controlled.

Serialization and Hash Seeds

Do not serialize hash seeds or hasher state unless the type is explicitly a deterministic routing artifact. ShardSelector::new(shards, seed) is the one place where reproducible routing is part of the public contract. RandomState and policy-internal hash maps should be reconstructed on deserialization.

Serializing raw hash-map order is also wrong. Hash-map iteration order changes with seeds and implementation details; serialized cache state should use stable semantic fields (keys, values, policy order) rather than map buckets.

Future Direction: Hasher Audit

The codebase intentionally mixes RandomState, FxHashMap, and SipHash. That mix is valid only while every use site has a documented threat model. A useful future hardening pass:

See Also