CacheKit Docs

High-performance cache policies and supporting data structures.

View the Project on GitHub OxidizeLabs/cachekit

Design Overview

Status: top-level design overview for cachekit. Indexes the architectural layers and the substantive design decisions, with the original numbered principles preserved as Appendix A. Every subsystem (concurrency, storage, metrics, TTL, …) has its own companion design doc; this doc names what they collectively decide.

For a worked example that applies every concern below to a single feature, see the TTL design doc. For interface conventions, the Rust API Guidelines checklist is the companion reference; both module-level rustdoc and the design docs themselves follow the doc style guide.

Architecture at a glance

+------------------------------------------------------------------+
|  Integration: CacheBuilder, DynCache, DynExpiringCache           |
+------------------------------------------------------------------+
|  Capability traits (opt-in):                                     |
|   RecencyTracking, FrequencyTracking, HistoryTracking,           |
|   ExpiringCache, EvictingCache, VictimInspectable                |
+------------------------------------------------------------------+
|  Policy kernel: Cache<K, V>                                      |
|   18 implemented policies (LRU, LFU, S3-FIFO, ARC, CAR, ...)     |
+------------------------------------------------------------------+
|  Storage: StoreCore / StoreMut (+ concurrent peers)              |
|   HashMapStore, SlabStore, HandleStore, WeightStore (sibling)    |
+------------------------------------------------------------------+
        ^                 ^                 ^                ^
   +----+----+      +-----+-----+     +-----+-----+    +----+-----+
   | Metrics |      |    TTL    |     |Concurrency|    | Hashing  |
   |(2-layer)|      | Expiring  |     |  RwLock + |    |    +     |
   |         |      | decorator |     |  sharding |    | Sharding |
   +---------+      +-----------+     +-----------+    +----------+
             (cross-cutting concerns, mostly feature-gated)

Read bottom-up: storage owns layout and ownership; policies layer eviction order on top of that; capability traits expose optional signals when the policy has them; the integration layer turns runtime configuration into one concrete cache type. Cross-cutting concerns (metrics, TTL, concurrency, hashing, sharding) are orthogonal to the layer stack — each has its own design doc and attaches to the layer where its choices live.

Layered map

Storage layer

Stores own keys and values. They expose StoreCore/StoreMut for sequential access and ConcurrentStoreRead/ConcurrentStore for concurrent access, signal capacity refusal with StoreFull, and ship one always-on counter struct (StoreMetrics).

Four concrete stores ship: HashMapStore (default), SlabStore (arena with stable EntryId handles), HandleStore (interner-keyed), and WeightStore (byte-aware sibling that deliberately diverges from StoreMut’s contract — see D14). See Storage layer.

Policy layer

Policies decide eviction order. Every policy implements the object-safe Cache<K, V> kernel; 18 ship today, organised in the catalog at docs/policies/. The kernel deliberately separates peek (&self, side-effect-free) from get (&mut self, policy-updating) so concurrent wrappers can take a read lock on peek paths (see D3). See Cache trait hierarchy.

Capability layer

Capabilities are opt-in extension traits that expose signals which some but not all policies own: RecencyTracking, FrequencyTracking, HistoryTracking, ExpiringCache, EvictingCache, VictimInspectable. Each extends Cache<K, V> — generic code requires the capability bound rather than feature-flag- gating methods on the kernel (D2). See Cache trait hierarchy §”Layer 2 — Capability traits”.

Integration layer

CacheBuilder turns a CachePolicy enum + capacity + optional defaults (TTL, hasher) into a DynCache<K, V> (or DynExpiringCache<K, V> when TTL is configured). DynCache dispatches through an internal enum match rather than Box<dyn Cache> — devirtualising the hot path while keeping the user-facing type uniform (D9). See Builder and runtime dispatch.

Cross-cutting concerns

Design decisions

Numbered ADR-style entries. Each entry documents one substantive choice that shaped the public or internal surface; the canonical companion doc carries the full detail.

D1. Policy / storage separation

Context. Combining “what to evict” and “how entries are laid out in memory” into one type couples eviction strategy to storage and prevents policy experimentation.

Decision. Two layers with explicit interfaces: stores own keys and values and expose StoreCore/StoreMut; policies own eviction metadata and consume the store. Capacity refusal is signalled by StoreFull; the policy decides who to evict and retries.

Alternatives considered.

Consequences.

See also. Appendix §7; storage.md, trait-hierarchy.md.

D2. Object-safe Cache<K, V> kernel + opt-in capability traits

Context. Some policies expose signals (recency rank, frequency count, K-distance history) that others don’t. Putting all of them on the kernel forces meaningless defaults and breaks object safety.

Decision. Cache<K, V> carries only what every policy must do (contains / len / capacity / peek / get / insert / remove / clear). Optional signals live in extension traits (RecencyTracking, FrequencyTracking, HistoryTracking, ExpiringCache, EvictingCache, VictimInspectable) which policies implement only when the signal exists in their metadata.

Alternatives considered.

Consequences.

See also. Appendix §7, §13; trait-hierarchy.md.

D3. peek (side-effect-free) vs get (policy-updating) split

Context. A read that updates LRU recency or LFU frequency cannot be served from a shared read lock — it mutates policy state. A single read method forces every lookup through a write lock.

Decision. Two distinct read methods on Cache<K, V>:

Alternatives considered.

Consequences.

See also. Appendix §3; trait-hierarchy.md §”peek vs get”, concurrency.md.

D4. Sequential &V + concurrent owned values (two trait families)

Context. Borrowed references cannot outlive the lock guard they were extracted from. A concurrent cache cannot expose Cache::get’s Option<&V> without holding the lock across the borrow, which serialises readers.

Decision. Two parallel trait families. Sequential (Cache<K, V>, StoreCore, StoreMut) returns &V / owned V. Concurrent stores (ConcurrentStoreRead, ConcurrentStore) return Arc<V> and take &self. Concurrent cache wrappers also take &self, but their concrete APIs may return Arc<V> (LRU) or cloned V (FIFO, S3-FIFO), depending on the underlying policy storage. Concurrent cache wrappers do not implement Cache<K, V>; they expose their own concrete API.

Alternatives considered.

Consequences.

See also. Appendix §3; concurrency.md, storage.md §”Layer 2”.

D5. parking_lot::RwLock for concurrent wrappers

Context. The choice of lock primitive shapes uncontended cost, fairness, and writer-starvation behaviour.

Decision. Every Concurrent* wrapper uses parking_lot::RwLock. Gated behind the concurrency Cargo feature.

Alternatives considered.

Consequences.

See also. Appendix §3; concurrency.md §”Lock primitive choice”.

D6. Slab / arena / intrusive layout over pointer chasing

Context. A cache implemented over Box-allocated nodes with linked-list pointers pays cache misses on every traversal step, fights the borrow checker on every mutation, and fragments memory.

Decision. Build policies over reusable primitives in src/ds/: SlotArena for stable Handle-indexed entries, IntrusiveList for recency lists threading through arena slots, ClockRing for contiguous reference-bit storage, FrequencyBuckets for LFU.

Alternatives considered.

Consequences.

See also. Appendix §2; src/ds/, docs/policy-ds/.

D7. No per-operation allocation in hot paths

Context. Allocation on get / insert dominates flamegraphs and makes tail latency unpredictable.

Decision. Pre-size pools, slabs, and intrusive lists at construction time. Policy and store hot paths avoid Box::new and Vec::push-that-grows; allocation is acceptable on new / with_capacity paths and caller-driven clear / replace paths. The integration layer has narrow exceptions where a policy stores shared values internally: DynCache wraps inserted values in Arc<V> for the LRU, LFU, and Heap-LFU variants.

Alternatives considered.

Consequences.

See also. Appendix §4; src/ds/slot_arena.rs, src/store/slab.rs.

D8. Predictable eviction via direct handles / indices

Context. Eviction is the slow path on a full cache. A policy that scans to find a victim trades constant-time inserts for linear-time evictions, which dominates tail latency.

Decision. Policies targeting interactive hot paths maintain direct indices or Handles (intrusive list head / tail, ClockRing hand, lazy-heap root) to the eviction candidate. Eviction cost should be comparable to lookup cost, not orders of magnitude higher. NruCache is the documented exception: it is intentionally simple and can scan O(n) on eviction; callers that need O(1) eviction should use Clock, LRU, or another direct-victim policy.

Alternatives considered.

Consequences.

See also. Appendix §5; src/store/handle.rs, src/ds/lazy_heap.rs, src/traits.rs (EvictingCache).

D9. Enum dispatch (DynCache) over Box<dyn Cache> for runtime selection

Context. Users want to pick a policy at runtime from configuration without writing one call site per policy. Cache<K, V> is object-safe, but Box<dyn Cache<K, V>> pays a vtable indirection on every method call.

Decision. CacheBuilder returns DynCache<K, V>, a wrapper around a closed CacheInner<K, V> enum with one variant per builder-wired policy. Method calls dispatch through match, which the optimiser devirtualises when the variant is invariant across an inner loop. CAR is implemented as CarCore but is not yet exposed through CachePolicy / DynCache.

Alternatives considered.

Consequences.

See also. Appendix §8, §13; builder-and-dyn-dispatch.md.

D10. Per-policy feature flags + curated default set

Context. Users who only need one or two policies shouldn’t pay compile time and binary size for the other 16.

Decision. Every policy is behind its own policy-* Cargo feature. policy-all enables every policy. The default feature set is a curated subset (policy-s3-fifo, policy-lru, policy-fast-lru, policy-lru-k, policy-clock) chosen to cover the most-recommended workloads. Optional capabilities (metrics, concurrency, serde, ttl) are gated the same way.

Alternatives considered.

Consequences.

See also. Appendix §13; Cargo.toml, builder-and-dyn-dispatch.md §”Send + Sync is conditional”.

D11. Two-layer metrics

Context. Universal counters (hit rate, eviction count) are part of the store trait surface and provide the observability baseline every implementation can report. Per-policy detailed metrics (Clock hand advances, ARC ghost hits, LRU recency-rank reads) are expensive to plumb and unnecessary for many consumers.

Decision. Two parallel metrics surfaces:

Alternatives considered.

Consequences.

See also. Appendix §6; metrics.md.

D12. Three-tier error model

Context. Different failures need different responses. A programmer who passes k = 0 to LRU-K should learn at the call site, not handle a Result. A user-supplied config file with small_ratio = 2.0 should be routed through a fallible constructor such as S3FifoCache::try_with_ratios. The current CacheBuilder::build path treats CachePolicy values as already validated program input and panics on invalid parameters; a future try_build would be the tier-2 surface for external configuration. An internal invariant violation should be diagnosable but not fatal in test runs.

Decision. Three tiers:

  1. Programming error → panic (assert!, panic!, debug_assert!).
  2. User-supplied input / expected resource failureResult<_, ErrorType>, using narrow error types such as ConfigError, StoreFull, LazyMinHeapError, or std::collections::TryReserveError passthrough.
  3. Invariant violationcheck_invariants methods returning Result<(), InvariantError> under debug_assertions / test.

Alternatives considered.

Consequences.

See also. Appendix §12; error-model.md.

D13. TTL as decorator (Expiring<C>) rather than per-policy embedded

Context. TTL is orthogonal to eviction policy. Embedding expires_at into every policy’s node would add 8 bytes per entry to every cache regardless of whether the user wants TTL.

Decision. Phase 1 ships TTL as a generic decorator Expiring<C, K, V, T> that wraps any C: Cache<K, V> with a shared ExpirationIndex (lazy min-heap of deadlines). The builder returns DynExpiringCache<K, V> when .with_default_ttl(...) is configured; otherwise it returns the plain DynCache<K, V>. Phase 2 would profile and selectively embed expires_at into hot policies (LRU, S3-FIFO) only if benchmarks justify it.

Alternatives considered.

Consequences.

See also. ttl.md.

D14. WeightStore as sibling of StoreCore / StoreMut, not subtype

Context. Byte-budgeted caches (images, blobs, variable-size values) need a second budget alongside entry count. Updates can legitimately fail when a larger replacement value exceeds the budget, which StoreMut::try_insert’s “updates always succeed” contract cannot express.

Decision. WeightStore<K, V, F> is a sibling of the store trait family, not a subtype. It returns Arc<V> (even single-threaded) and enforces a dual entry-count + weight cap. ConcurrentWeightStore does implement ConcurrentStoreRead / ConcurrentStore because those traits already use Arc<V> returns.

Alternatives considered.

Consequences.

See also. weighted-eviction.md.

D15. Mixed hasher defaults (RandomState at public boundaries, FxHash internally)

Context. A single hasher choice forces a single trade-off: HashDoS-resistance on public APIs (slower) or speed on internal hot paths (vulnerable when keys are user-controlled).

Decision. Public stores (HashMapStore, ClockRing) default to RandomState. Internal policy maps and WeightStore use FxHashMap. Shard routing (ShardSelector) uses keyed SipHash-1-3. Each public constructor that takes a non-randomised hasher requires explicit caller opt-in (with_hasher, KeysAreTrusted).

Alternatives considered.

Consequences.

See also. hashing.md.

Summary

The decisions above cluster around five concerns that consistently dominate benchmark and review feedback. The themes are not novel; the value in stating them here is naming where each was paid for in concrete code.

The trade against ergonomic Rust idioms — enum dispatch over Box<dyn Cache> (D9), arena handles over Box<Node> (D6), and owned values at concurrent boundaries (D4) — is deliberate. The decisions above name where that cost was paid and where it was refused.

Appendix A: Principles

The original 13 principles that shape cachekit. The decisions above are the implementation of these principles as concrete choices; this appendix preserves the principle statements and their stable section numbers so sibling design docs can cite design.md §N.

1. Workload First, Policy Second

Cache policy only matters relative to workload. Identify access patterns (hot-set, scan-heavy, mixed), measure reuse distance and read/write ratio, and choose accordingly: LRU / Clock for temporal locality, LRU-K / 2Q / SLRU for scan filtering, ARC / CAR for adaptive recency / frequency balance, S3-FIFO / Heap-LFU for strong general-purpose defaults under scans. All 18 policies ship as concrete types (CAR is the one not yet wired into DynCache — see the CAR builder gap). See docs/policies/ for the implemented catalog and docs/policies/roadmap/ for planned policies (LIRS, TinyLFU, SIEVE, GDS / GDSF, …). Design for the workload you expect — not the average of all workloads.

Realised by: the policy catalog itself; see also choosing-a-policy.md.

2. Memory Layout Matters More Than Algorithms

Memory layout often dominates policy. Prefer contiguous storage (Vec, slabs, arenas) and index-based indirection over pointer chasing; avoid excessive Box, Arc, and linked lists with heap-allocated nodes; store metadata in tightly packed structs; separate hot metadata from cold payloads. Cachekit realises this through reusable building blocks under src/ds/: SlotArena hands out stable Handles, IntrusiveList threads recency lists through arena slots without per-node allocation, and ClockRing keeps Clock-style state in a single contiguous array. See docs/policy-ds/ for the full primitive catalog. Cache misses caused by your own data structure are as bad as upstream misses.

Realised by: D6 (slab / arena / intrusive layout).

3. Concurrency Strategy Is Core Design, Not a Wrapper

Locking strategy shapes everything. Options: global lock (simple, dies under contention), sharded caches (per-shard lock), and lock-free (hard in Rust, only worth it if contention dominates). Cachekit ships the global-lock option as Concurrent* wrappers (parking_lot::RwLock around the single-threaded core, gated by the concurrency feature) and partial sharding at the data-structure / store layer (ShardedHashMapStore, ShardedSlotArena, ShardedFrequencyBuckets). A generic Sharded<C: Cache<K, V>> wrapping any policy is not yet shipped; RCU-style read paths and per-thread caches with periodic merge are roadmap. Avoid Arc<Mutex<…>> in hot paths.

Realised by: D4 (&V vs Arc<V> trait families), D5 (parking_lot::RwLock).

4. Avoid Per-Operation Allocation

Allocations kill throughput. Pre-allocate entry pools (see SlotArena and the free-list discipline in src/store/slab.rs) and node arrays (intrusive lists thread through arena slots rather than allocating per-node). Reuse free lists; size slabs once at construction. Use Vec with explicit capacity and rustc-hash’s FxHashMap for cheap key hashing in hot-path lookups (see hashing.md for the threat model). Avoid creating new Arc / String / Vec per lookup and hidden clones of K on the eviction path. If malloc shows up in your flamegraph, your cache is already slow.

Realised by: D7 (no per-operation allocation).

5. Eviction Must Be Predictable and Cheap

Eviction is the critical slow path; O(1) is the goal for hot-path policies. Maintain direct indices / Handles to eviction candidates (src/store/handle.rs; the EvictingCache capability trait carries evict_one), eviction lists or clock hands (intrusive list head, ClockRing hand), and lazy heaps where amortised O(log n) is acceptable (LazyMinHeap; used by Heap-LFU and TTL). NruCache deliberately trades that guarantee for simplicity and has O(n) worst-case eviction. Be careful with background eviction threads and unbounded lazy cleanup; bound it with rebuild thresholds (e.g. LazyMinHeap::with_auto_rebuild). For latency-sensitive caches, eviction cost must be comparable to lookup cost, not orders of magnitude higher.

Realised by: D8 (O(1) eviction via direct handles).

6. Metrics Are Not Optional

You cannot tune what you do not measure. Track at least hit / miss rate, eviction count and reason (capacity vs expiration), and insert / update rate. Cachekit exposes these in two layers: an unconditional store-layer baseline (StoreMetrics, seven counters that ship in every build) and a feature-gated policy-layer hierarchy (per-policy *Metrics recorders, *Snapshot types, Prometheus exporter) behind the metrics Cargo feature. Lightweight counters on the hot path, detailed metrics behind a feature flag — see Metrics design for the recorder / snapshot / exporter split. Metrics should guide design decisions, not justify them afterward.

Realised by: D11 (two-layer metrics).

7. Separate Policy From Storage

Design in layers: storage layer (how entries live in memory, allocation, layout, indexing — src/store/; design rationale in storage.md); policy layer (LRU, FIFO, LFU, LRU-K, 2Q, ARC, CAR, Clock, Clock-PRO, S3-FIFO, … — manipulates metadata and ordering only, src/policy/); capability layer (opt-in extension traits — RecencyTracking, FrequencyTracking, HistoryTracking, ExpiringCache — implemented when the signal exists, which is how Expiring<C> composes over any policy without touching policy code); integration layer (ties application objects to cache entries via CacheBuilder and the DynCache runtime dispatcher).

Realised by: D1 (policy / storage separation), D2 (object-safe kernel + capability traits), D9 (DynCache enum dispatch).

8. Beware of “Nice” Rust APIs in Hot Paths

Ergonomics often cost performance. Avoid in critical loops: heavy generics causing code bloat across many monomorphisations, trait objects for hot dispatch, closures capturing state, and iterator chains where a plain for loop would do. Prefer explicit loops, concrete types and monomorphised fast paths, and enum dispatch over Box<dyn Trait> when polymorphism is needed at the edges — this is exactly the trade DynCache makes (D9). You can wrap fast internals in nice APIs at the edges.

Realised by: D9 (enum dispatch over Box<dyn Cache>).

9. Scans Are the Enemy of Caches

Large sequential reads destroy LRU-style caches. Solutions: scan-resistant policies (LRU-K, 2Q, SLRU, ARC, CAR, Clock-PRO, S3-FIFO, Heap-LFU — all implemented today), explicit “scan mode” hints from the caller or workload layer, and bypass cache for known one-shot reads. If you ignore scans, your cache will look great in microbenchmarks and terrible in production.

Realised by: the scan-resistant policy catalog itself; benchmarked under the workloads in benchmarks/workloads.md.

10. Benchmark Like a System, Not a Library

Do not rely on uniform-random key benchmarks. Use Zipfian distributions, mixed read / write workloads, scan + point lookup mixtures, and time-varying hot sets. Measure throughput, tail latency, memory overhead, and eviction cost. Cachekit’s benchmark harness covers these dimensions — see benchmarking.md for the design (benchmark layers, monomorphic policy registry, JSON artifact schema), docs/benchmarks/workloads.md for the workload catalog, and the runners under benches/. A cache that is 5 % faster on uniform-random keys but 50 % worse under scans is a bad cache.

Realised by: the benchmark harness; design captured in benchmarking.md.

11. Rust Hot-Path Hazards Beyond Allocation

Arc is expensive in hot paths; minimise it and lift Arc::clone out of inner loops. Fight borrow-checker-induced indirection with index-based access (Handles, slot indices) instead of &mut chains; use interior mutability only where unavoidable, preferring Cell<T> over RefCell<T> when T: Copy and atomics when the value lives behind a shared reference. Beware hidden clones (particularly of keys on the eviction path), trait-object dispatch on read / insert (a specific instance of §8), and over-generic designs whose monomorphisation cost dwarfs their benefit. Rust can match C on hot paths, but only when systems-level discipline survives contact with the type system.

Realised by: D6 (slab / arena / intrusive), D9 (enum dispatch).

12. Design for Failure Modes

Ask: what happens under memory pressure, when eviction cannot keep up, and under pathological access patterns? Add backpressure or rejection when full (StoreFull), bypass modes, and emergency eviction strategies. Cross-thread back-pressure semantics across a layered cache stack remain a roadmap topic; today the answer is StoreFull propagation up to the caller. A cache that collapses under stress is worse than no cache.

Realised by: D12 (three-tier error model); see also error-model.md.

13. Compile-Time and Runtime Composition

Cachekit’s externally visible surface is shaped by two composition mechanisms that together let users pay only for what they use. Per-policy feature flags: every policy is behind a Cargo feature (policy-lru, policy-s3-fifo, …), with policy-all for “everything” and a small default of policy-s3-fifo, policy-lru, policy-fast-lru, policy-lru-k, policy-clock; optional capabilities are gated the same way (metrics, concurrency, serde, ttl). One sharp edge worth naming inline: the default feature set includes policy-fast-lru, which is !Send + !Sync, so default-feature DynCache is also !Send + !Sync. Capability traits + runtime dispatch: extension traits keep optional behaviour off the core Cache<K, V> trait; the builder returns a DynCache<K, V> that dispatches via an internal enum match rather than Box<dyn Cache>. When the builder is configured with .with_default_ttl(...), it returns a sibling DynExpiringCache<K, V> that threads the expiry check around each variant’s Cache call — a worked example of capability composition.

Realised by: D2 (capability traits), D9 (enum dispatch), D10 (per-policy feature flags), D13 (TTL as decorator).

See Also

Design docs:

Reference docs: