Status: design rationale for cachekit’s panic-vs-`Result` discipline, the four error types in the public API, and the debug-only invariant checks. Companion to `design.md` and `src/error.rs`.
cachekit treats error handling as a design question, not an ergonomics question. The rule is:
> Panic on programming errors. Return `Result` for user-supplied input. Reserve invariant checks for `debug_assertions`.
This document explains where each side of that rule applies, why the four shipped error types each exist as separate types, and what discipline a new error type needs to follow.
cachekit divides every failure mode into one of three tiers, each with its own response:
| Tier | Cause | Response | Example |
|---|---|---|---|
| 1. Programming error | Bug in the caller’s code, statically detectable in principle | Panic | `LruK::with_k(10, 0)` (k = 0) |
| 2. User-supplied input | Configuration arriving from outside the program | `Result<_, ErrorType>` | `S3FifoCache::try_with_ratios(_, 2.0, _)` |
| 3. Invariant violation | Internal data-structure corruption (cannot reach in normal use) | `debug_assert` + `InvariantError` (test/debug only) | `pop_front` while queue length is zero |
The tiers are not opinions — they map to specific Rust constructs and
runtime behaviours. Mixing them (panicking on tier 2, returning
Result from tier 3) produces APIs that are either ergonomically
heavy or operationally unsafe.
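
As a rough illustration of how the three tiers map to Rust constructs, here is a minimal sketch on a toy type. `TinyQueue` and `RatioError` are hypothetical names invented for this example and are not part of cachekit.

```rust
use std::collections::VecDeque;

/// Hypothetical tier-2 error, standing in for a type like `ConfigError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RatioError(String);

/// Hypothetical bounded queue, used only to illustrate the three tiers.
#[derive(Debug)]
pub struct TinyQueue {
    items: VecDeque<u32>,
    capacity: usize,
}

impl TinyQueue {
    /// Tier 1: a zero capacity is a bug at the call site, so panic.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "TinyQueue: capacity must be greater than 0");
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Tier 2: the ratio may come from a config file, so return `Result`.
    pub fn try_with_ratio(total: usize, ratio: f64) -> Result<Self, RatioError> {
        if total == 0 {
            return Err(RatioError("total must be greater than zero".to_string()));
        }
        // Written so that NaN is rejected as well.
        if !(ratio > 0.0 && ratio <= 1.0) {
            return Err(RatioError("ratio must be in (0.0, 1.0]".to_string()));
        }
        let capacity = ((total as f64) * ratio).ceil() as usize;
        Ok(Self::new(capacity))
    }

    /// Returns `false` when the queue is already full.
    pub fn push(&mut self, value: u32) -> bool {
        if self.items.len() >= self.capacity {
            return false;
        }
        self.items.push_back(value);
        true
    }

    /// Tier 3: an internal invariant, checked only when `debug_assertions` is on.
    pub fn pop_front(&mut self) -> Option<u32> {
        debug_assert!(self.items.len() <= self.capacity, "queue length exceeds capacity");
        self.items.pop_front()
    }
}
```

In this sketch a caller that hard-codes `TinyQueue::new(0)` hits the panic, a caller that forwards a ratio from a config file gets an `Err` it can report, and the `debug_assert!` costs nothing in a release build.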
A “programming error” is a precondition violation the caller could have prevented with an `if` or a type. cachekit panics in this case rather than returning `Result`, because:

- Making the caller handle a `Result<_, "you passed 0 for capacity">` for a bug they could have prevented adds friction without catching anything new.

The shipped examples:
- `CacheBuilder::build` panics on `capacity == 0`, `k == 0` for LRU-K, and `probation_frac > 1.0` for 2Q. The validation is centralised in `validate_policy` (`src/builder.rs`).
- Constructors such as `LruCore::new` and `S3FifoCache::new` panic on invalid arguments. The fallible counterparts (`try_with_ratios`, `try_with_capacity`) exist for tier 2.
- `assert!(*k > 0, "LruK: k must be greater than 0")` in `CacheBuilder::validate_policy` is the canonical shape: a clear message that identifies the parameter and the constraint.

The cost is that a panicking call site terminates under the crate’s default `panic = "abort"` release profile. This is intentional: cachekit’s `panic = "abort"` is documented in the `Cargo.toml` release profile, and the rationale is that a panic in cache code under load is a bug worth surfacing through the supervisor / restart strategy, not through unwinding.
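
One way to keep a panicking constructor and its fallible counterpart in agreement is to route both through the same checks. The sketch below assumes that layout purely for illustration; `Segments` is a hypothetical type, the error is a plain `String` to keep the example short, and nothing here is taken from cachekit’s source.

```rust
/// Hypothetical split of a cache into a small and a main segment.
#[derive(Debug, PartialEq)]
pub struct Segments {
    pub small: usize,
    pub main: usize,
}

impl Segments {
    /// Tier 2 entry point: invalid input becomes an `Err` whose message
    /// names the parameter and the constraint.
    pub fn try_new(capacity: usize, small_ratio: f64) -> Result<Self, String> {
        if capacity == 0 {
            return Err("capacity must be greater than zero".to_string());
        }
        // `contains` is false for NaN, so NaN is rejected too.
        if !(0.0..=1.0).contains(&small_ratio) {
            return Err("small_ratio must be in 0.0..=1.0".to_string());
        }
        // Clamp so float rounding can never make `small` exceed `capacity`.
        let small = (((capacity as f64) * small_ratio) as usize).min(capacity);
        Ok(Self { small, main: capacity - small })
    }

    /// Tier 1 entry point: the same checks, but a violation is treated as a
    /// caller bug and panics with a prefixed message.
    pub fn new(capacity: usize, small_ratio: f64) -> Self {
        match Self::try_new(capacity, small_ratio) {
            Ok(segments) => segments,
            Err(msg) => panic!("Segments: {msg}"),
        }
    }
}
```

Under this assumption the two entry points cannot drift apart: a new constraint added to `try_new` is picked up by `new` automatically.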
## `Result` for user-supplied input

When the failure mode is “user passes us configuration we don’t recognise as valid,” return `Result`. The shipped error types each cover a specific surface:

### `ConfigError`: invalid configuration parameters

    pub struct ConfigError(String);

Defined in `src/error.rs`. Returned by fallible constructors that accept user-tunable knobs:

- `S3FifoCache::try_with_ratios(capacity, small_ratio, ghost_ratio)`
- `try_build` variants on `CacheBuilder`

The contained `String` carries a human-readable description of which parameter failed validation. By convention messages are lowercase, unpunctuated, and identify the parameter: `"capacity must be greater than zero"`, `"small_ratio must be in 0.0..=1.0"`.
ConfigError’s presence on a constructor signals that the parameter
set can legitimately come from outside the program — a config file,
a CLI flag, an HTTP request — and the caller should handle invalid
input gracefully rather than crashing the process.
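
A minimal sketch of what graceful handling can look like on the caller’s side. The `ConfigError` below is a local stand-in with the shape shown above; its `new` constructor, `parse_small_ratio`, and `small_ratio_or_default` are hypothetical helpers written for this example, not cachekit APIs.

```rust
use std::fmt;

/// Local stand-in with the same shape the text gives for `ConfigError`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ConfigError(String);

impl ConfigError {
    /// Hypothetical constructor used by this sketch.
    pub fn new(msg: impl Into<String>) -> Self {
        Self(msg.into())
    }
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for ConfigError {}

/// Hypothetical validator for a ratio that arrives as text
/// (a config file value, a CLI flag, a request parameter).
pub fn parse_small_ratio(raw: &str) -> Result<f64, ConfigError> {
    let ratio = raw
        .trim()
        .parse::<f64>()
        .map_err(|_| ConfigError::new("small_ratio must be a number"))?;
    // `contains` is false for NaN, so "NaN" is rejected here.
    if !(0.0..=1.0).contains(&ratio) {
        return Err(ConfigError::new("small_ratio must be in 0.0..=1.0"));
    }
    Ok(ratio)
}

/// Caller-side handling: report the problem and fall back instead of crashing.
pub fn small_ratio_or_default(raw: &str, default: f64) -> f64 {
    match parse_small_ratio(raw) {
        Ok(ratio) => ratio,
        Err(err) => {
            eprintln!("invalid cache config ({err}); using {default}");
            default
        }
    }
}
```

Whether to fall back, reject the request, or refuse to start is the caller’s decision; the point of the `Result` is that the caller gets to make it.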
### `StoreFull`: capacity-bound failure

    pub struct StoreFull;

Zero-sized type defined in `src/store/traits.rs`. Returned by `StoreMut::try_insert` and `ConcurrentStore::try_insert` when the store is at capacity and the insert would exceed it. The contract:

- `StoreFull` is not a panic. A full store under capacity pressure is the expected outcome of `try_insert`. The caller, typically a policy layered on top, must respond by evicting and retrying.
- `StoreFull` is the signal that says “you, policy, decide who to evict.” This is the core of the policy/storage separation rule from `design.md` §7.
- … `StoreFull` adds nothing useful by retaining it.
- `StoreFull` is not in `src/error.rs` despite being an error type. It lives alongside the trait that returns it because the two are co-evolving and the surface is small enough that co-location aids readability.
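
The evict-and-retry contract can be sketched with a toy store and a toy FIFO policy. `BoundedStore` and `FifoPolicy` are hypothetical types written for this example, and the `try_insert` signature here is an assumption; the real `StoreMut` and `ConcurrentStore` traits may look different.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Zero-sized marker with the shape the text gives for `StoreFull`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct StoreFull;

/// Hypothetical store: it holds entries and never evicts on its own.
pub struct BoundedStore {
    map: HashMap<u64, Arc<str>>,
    capacity: usize,
}

impl BoundedStore {
    pub fn new(capacity: usize) -> Self {
        Self { map: HashMap::with_capacity(capacity), capacity }
    }

    /// Updates always succeed; a new key is refused once the store is full.
    pub fn try_insert(&mut self, key: u64, value: Arc<str>) -> Result<(), StoreFull> {
        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
            return Err(StoreFull);
        }
        self.map.insert(key, value);
        Ok(())
    }
}

/// Hypothetical FIFO policy layered on top of the store.
pub struct FifoPolicy {
    store: BoundedStore,
    order: VecDeque<u64>,
}

impl FifoPolicy {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "FifoPolicy: capacity must be greater than 0");
        Self { store: BoundedStore::new(capacity), order: VecDeque::new() }
    }

    pub fn insert(&mut self, key: u64, value: Arc<str>) {
        let is_new = !self.store.map.contains_key(&key);
        // The error carries no payload, so the policy keeps its own handle
        // on the value for the retry.
        if let Err(StoreFull) = self.store.try_insert(key, Arc::clone(&value)) {
            // The store refused; the policy picks the victim (oldest key).
            if let Some(victim) = self.order.pop_front() {
                self.store.map.remove(&victim);
            }
            self.store
                .try_insert(key, value)
                .expect("a slot was freed by the eviction above");
        }
        if is_new {
            self.order.push_back(key);
        }
    }

    pub fn get(&self, key: &u64) -> Option<&Arc<str>> {
        self.store.map.get(key)
    }

    pub fn len(&self) -> usize {
        self.store.map.len()
    }
}
```

The store never chooses a victim in this sketch. It only reports that it is full, and the eviction decision stays in the policy.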
### `LazyMinHeapError`: ds-layer fallible construction

    pub enum LazyMinHeapError {
        CapacityTooLarge { requested: usize, max: usize },
        Allocation(std::collections::TryReserveError),
    }

Defined in `src/ds/lazy_heap.rs`. Returned by `LazyMinHeap::try_with_capacity` when:

- the requested capacity exceeds the `MAX_CAPACITY` bound, or
- the allocation itself fails.

The enum exposes both failure modes distinctly because a caller may want to retry on `Allocation` (transient memory pressure) but not on `CapacityTooLarge` (a logic bug, or a genuinely too-big request that won’t recover).
The pattern generalises: a future “fallible-construction” error type
on any ds primitive that pre-allocates should distinguish “you
asked for too much” from “we couldn’t get what you asked for.”
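
A sketch of how a caller might act on that distinction. The enum mirrors the definition shown earlier; the `MAX_CAPACITY` value, the free function `try_with_capacity`, and `with_retry` are assumptions made for this example (the text does not give the real bound or the heap’s internals).

```rust
use std::collections::TryReserveError;

/// Mirrors the enum shown in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LazyMinHeapError {
    CapacityTooLarge { requested: usize, max: usize },
    Allocation(TryReserveError),
}

/// Hypothetical bound; the real `MAX_CAPACITY` value is not given in the text.
pub const MAX_CAPACITY: usize = 1 << 20;

/// Hypothetical stand-in for the heap's backing storage.
pub fn try_with_capacity(capacity: usize) -> Result<Vec<(u64, u64)>, LazyMinHeapError> {
    // "You asked for too much": reject before touching the allocator.
    if capacity > MAX_CAPACITY {
        return Err(LazyMinHeapError::CapacityTooLarge {
            requested: capacity,
            max: MAX_CAPACITY,
        });
    }
    // "We couldn't get what you asked for": surface the allocator's answer.
    let mut slots = Vec::new();
    slots
        .try_reserve_exact(capacity)
        .map_err(LazyMinHeapError::Allocation)?;
    Ok(slots)
}

/// Caller-side policy: retry only the failure that might be transient.
pub fn with_retry(capacity: usize, attempts: u32) -> Result<Vec<(u64, u64)>, LazyMinHeapError> {
    let mut last = None;
    for _ in 0..attempts.max(1) {
        match try_with_capacity(capacity) {
            Ok(slots) => return Ok(slots),
            // Retrying cannot help here, so give up immediately.
            Err(err @ LazyMinHeapError::CapacityTooLarge { .. }) => return Err(err),
            // Memory pressure may ease; remember the error and try again.
            Err(err @ LazyMinHeapError::Allocation(_)) => last = Some(err),
        }
    }
    Err(last.expect("the loop body ran at least once"))
}
```

A real retry loop would normally wait or shed load between attempts; the sketch leaves that out to keep the control flow visible.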
### `std::collections::TryReserveError`: passthrough

Some `try_new` constructors (`HashMapStore::try_new`, `ConcurrentHashMapStore::try_new`) return the standard `TryReserveError` directly rather than wrapping it. The reason: the only failure mode is allocator pressure, and `TryReserveError` already says exactly that. Wrapping it would add a layer without adding information.
The shape is: if cachekit has a distinct failure mode of its own
(CapacityTooLarge, StoreFull), wrap or define a new type; if the
only failure mode is “the allocator said no,” return the standard
type and let the caller’s error-handling stack absorb it.
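
The passthrough shape is short enough to show in full. `try_new_map` is a hypothetical function standing in for a constructor like `HashMapStore::try_new`, whose real body is not shown in this document.

```rust
use std::collections::{HashMap, TryReserveError};

/// Hypothetical constructor in the passthrough style: the only thing that
/// can fail is the reservation, so the standard error is returned as-is.
pub fn try_new_map(capacity: usize) -> Result<HashMap<u64, u64>, TryReserveError> {
    let mut map = HashMap::new();
    map.try_reserve(capacity)?;
    Ok(map)
}
```

Because the return type is the standard library’s own error, a caller that already handles `TryReserveError` from `Vec` or `HashMap` needs no extra conversion.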
    pub struct InvariantError(String);

Defined in `src/error.rs`. Returned by `check_invariants` methods on internal data structures:

    impl<K, V> S3FifoCache<K, V> {
        #[cfg(any(debug_assertions, test))]
        pub fn check_invariants(&self) -> Result<(), InvariantError> {
            if self.small.len() + self.main.len() != self.map.len() {
                return Err(InvariantError::new("queue length mismatch"));
            }
            // …
            Ok(())
        }
    }
Three properties define the tier:

- `check_invariants` is called from tests, fuzz harnesses, and `debug_assertions` paths. It is never called from normal insert / get / evict.
- It returns `Result`, not panics. This is counter-intuitive given the tier-1 rule. The reason: `check_invariants` is called by diagnostic code that wants to report the violation (in a test failure message, a fuzz reproducer, a debug-mode assertion’s output) rather than crash. Returning `Result` lets the caller format the failure; if they want to panic, they `unwrap()`.
- `InvariantError` carries the same `String`-message shape as `ConfigError`, by the same convention: lowercase, unpunctuated, identifying the specific invariant.
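
To make the “report rather than crash” point concrete, here is a small sketch of a diagnostic caller. `IndexedQueue` and `corrupt_for_test` are hypothetical, `InvariantError` is a local stand-in, and the `#[cfg(any(debug_assertions, test))]` gate from the snippet above is left off so the example compiles in any profile.

```rust
use std::collections::{HashSet, VecDeque};
use std::fmt;

/// Local stand-in with the shape the text gives for `InvariantError`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InvariantError(String);

impl InvariantError {
    pub fn new(msg: impl Into<String>) -> Self {
        Self(msg.into())
    }
}

impl fmt::Display for InvariantError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for InvariantError {}

/// Hypothetical structure: every queued key must also be in the index.
#[derive(Default)]
pub struct IndexedQueue {
    queue: VecDeque<u64>,
    index: HashSet<u64>,
}

impl IndexedQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queues the key unless it is already present.
    pub fn push(&mut self, key: u64) {
        if self.index.insert(key) {
            self.queue.push_back(key);
        }
    }

    /// Diagnostic only: reports a violation instead of panicking.
    pub fn check_invariants(&self) -> Result<(), InvariantError> {
        if self.queue.len() != self.index.len() {
            return Err(InvariantError::new("queue length mismatch"));
        }
        Ok(())
    }

    /// Hypothetical test hook that bypasses the index to simulate corruption.
    pub fn corrupt_for_test(&mut self) {
        self.queue.push_back(u64::MAX);
    }
}
```

A test can assert on the returned message, a fuzz harness can log it next to the failing input, and a caller that prefers a hard stop can call `unwrap()` on the result.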
A single CachekitError enum could in principle subsume all four.
cachekit doesn’t ship one, deliberately. Three reasons:
- `StoreFull` means “evict and retry”; `ConfigError` means “fix your config”; `LazyMinHeapError::Allocation` means “back off and retry”; `InvariantError` means “we have a bug, capture state.” A unified enum forces every caller to either match exhaustively (on variants, most of which can’t happen at their call site) or use a catch-all that loses information.
- `StoreFull` lives in `src/store/traits.rs`; `LazyMinHeapError` lives in `src/ds/lazy_heap.rs`; `ConfigError` and `InvariantError` live in `src/error.rs`. Co-location helps maintenance: adding a new failure mode to one surface doesn’t ripple through the others.

The cost is that downstream code wanting to catch “any cachekit error” has to enumerate all four. The mitigation is that no realistic downstream code wants that: each call site touches one surface at a time and handles that surface’s error.
The crate’s release profile sets `panic = "abort"`:

    [profile.release]
    panic = "abort"

Two implications worth naming:

- A panic inside `ConcurrentWeightStore` (see `weighted-eviction.md`) kills the process; a `parking_lot` lock-poisoning concern is moot under `panic = "abort"` because the process is gone before any observer can read poisoned state.
- Builds that use `panic = "unwind"` get unwind safety up to the documented invariants. The `weighted-eviction.md` clear-ordering rule and the `concurrency.md` panic-safety notes apply only to this mode. Cargo reads profile settings only from the workspace root manifest, so an application that depends on cachekit chooses its own panic strategy.

The interplay matters for error model design: under abort, tier 1 panics are terminal and need to be debugged at development time; under unwind, they are catchable but should still be treated as bugs because the cache may be in an unspecified-but-not-corrupt state.
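
The difference between the two modes can be observed with the standard library alone. In the sketch below, `checked_capacity` is a hypothetical tier-1 check and `capacity_or_none` a hypothetical wrapper; neither is a cachekit function.

```rust
use std::panic;

/// Hypothetical tier-1 check in the style described earlier.
pub fn checked_capacity(capacity: usize) -> usize {
    assert!(capacity > 0, "capacity must be greater than 0");
    capacity
}

/// Returns `None` if the call panicked.
///
/// This only works when the final binary unwinds. Under `panic = "abort"`
/// the process ends inside the closure and this function never returns.
pub fn capacity_or_none(capacity: usize) -> Option<usize> {
    panic::catch_unwind(|| checked_capacity(capacity)).ok()
}

/// Reports which strategy the current build was compiled with.
pub fn unwinding_enabled() -> bool {
    cfg!(panic = "unwind")
}
```

Catching the panic here is for illustration only. In line with the text above, a caught tier 1 panic still points at a bug in the calling code and should be fixed rather than handled.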
## What `Result` does not cover

Three failure modes are deliberately not represented as `Result`:

- Allocation failure outside the `try_*` constructors. `LruCore::new(huge)` aborts on allocator failure. Use `try_with_capacity` to get a `Result` surface (where available).
- … `check_invariants` or by the policy’s tests.
- `parking_lot::RwLock` doesn’t poison, doesn’t time out by default, and doesn’t return `Result`. A contended cache blocks until it can proceed. Callers who need timeouts wrap the cache themselves with a wider locking discipline.

Checklist for a new failure mode:
- For a tier 1 programming error, use `assert!` / `debug_assert!` / `panic!`. No new type needed.
- For tier 2 user-supplied input, return `ConfigError` (with a clear message) or pass through `TryReserveError`.
- For a tier 3 invariant violation, add a `check_invariants` method on the affected type that returns `Result<(), InvariantError>`.
- Types returned by a trait live with the trait (`StoreFull` in `src/store/traits.rs`). Types specific to a primitive live with the primitive (`LazyMinHeapError`). Cross-cutting types (`ConfigError`, `InvariantError`) live in `src/error.rs`.
- Implement `Display` and `Error`. Both are required for `?` interop with `Box<dyn Error>`. The convention is:

      impl fmt::Display for MyError { … }
      impl std::error::Error for MyError {}

  `Display` writes the message; `Error` is empty unless the type wraps another error (then `source` returns the inner error).
- Be `Send + Sync + Clone`. All existing error types satisfy this. The convention is `#[derive(Debug, Clone, PartialEq, Eq, Hash)]` for value types and matching impls for enums. Errors that flow between threads must be `Send + Sync`; errors that get cloned into snapshots / test fixtures must be `Clone`.

## `?` and `anyhow`/`thiserror`

The cachekit error types are intentionally plain types, not
thiserror-derived, to avoid forcing a thiserror dependency on
downstream users. They implement std::error::Error directly, so
they work with ?, Box<dyn Error>, and any error-aggregation
crate (including anyhow and thiserror::Error in user code).
A downstream thiserror-derived enum that includes a #[from]
cachekit::ConfigError works. A downstream anyhow::Result<_> that
absorbs cachekit errors via ? works. The choice not to bundle
either crate keeps the error layer dependency-free and gives
downstream the standard From and Display shape they expect.
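
A dependency-free sketch of that interop from the downstream side. `ConfigError` is again a local stand-in, `AppError`, `validate_capacity`, `load`, and `load_boxed` are hypothetical, and the hand-written `From` impl plays the role that `#[from]` would play in a `thiserror`-derived enum.

```rust
use std::error::Error;
use std::fmt;

/// Local stand-in for a plain, hand-implemented error type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ConfigError(String);

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl Error for ConfigError {}

/// Hypothetical downstream error that absorbs the cache error.
#[derive(Debug)]
pub enum AppError {
    Cache(ConfigError),
    MissingSetting(&'static str),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Cache(err) => write!(f, "cache configuration rejected: {err}"),
            AppError::MissingSetting(name) => write!(f, "missing setting: {name}"),
        }
    }
}

impl Error for AppError {
    /// A wrapping type exposes the inner error through `source`.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Cache(err) => Some(err),
            AppError::MissingSetting(_) => None,
        }
    }
}

impl From<ConfigError> for AppError {
    fn from(err: ConfigError) -> Self {
        AppError::Cache(err)
    }
}

/// Hypothetical validator that returns the plain error type.
pub fn validate_capacity(capacity: usize) -> Result<usize, ConfigError> {
    if capacity == 0 {
        return Err(ConfigError("capacity must be greater than zero".to_string()));
    }
    Ok(capacity)
}

/// `?` converts `ConfigError` into `AppError` through the `From` impl.
pub fn load(capacity: Option<usize>) -> Result<usize, AppError> {
    let raw = capacity.ok_or(AppError::MissingSetting("capacity"))?;
    Ok(validate_capacity(raw)?)
}

/// `?` also converts into `Box<dyn Error>` because `Error` is implemented.
pub fn load_boxed(capacity: usize) -> Result<usize, Box<dyn Error>> {
    Ok(validate_capacity(capacity)?)
}
```

The same two conversions are what `anyhow` and `thiserror` rely on, which is why a plain `std::error::Error` implementation is enough for them.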
- `parking_lot` non-poisoning, atomic check-and-act, lock-acquisition failure modes
- `build` validation, `try_build`-deliberately-absent rationale
- `StoreFull`’s role and unwind-safety in `clear`
- `src/error.rs`: `ConfigError`, `InvariantError`
- `src/store/traits.rs`: `StoreFull`
- `src/ds/lazy_heap.rs`: `LazyMinHeapError`