CacheKit Docs

High-performance cache policies and supporting data structures.

View the Project on GitHub OxidizeLabs/cachekit

Workloads

Synthetic workload generators for cache benchmarking. These distributions model real-world access patterns to evaluate cache policy effectiveness beyond simple micro-benchmarks.

Located in benches/common/workload.rs.


Implemented Workloads

Uniform

Workload::Uniform

All keys in [0, universe) have equal probability. Baseline distribution that doesn’t favor any caching strategy. Useful for measuring raw overhead.

Models: Random access patterns, hash table lookups with no locality.


Hotset

Workload::Hotset { hot_fraction: 0.1, hot_prob: 0.9 }

Explicit hot/cold split. A fraction of keys (hot_fraction) receives most accesses (hot_prob).

Parameter Description Typical Values
hot_fraction Fraction of keys that are “hot” 0.1 (10%)
hot_prob Probability of accessing a hot key 0.8-0.95

Models: Database tables with popular rows, CDN with viral content.


Scan

Workload::Scan

Sequential access through keys 0, 1, 2, ..., universe-1, then wraps. Tests scan resistance - policies that protect the cache from being flushed by sequential scans.

Models: Table scans, batch processing, backup operations.


Zipfian

Workload::Zipfian { exponent: 1.0 }

Power-law distribution where rank-k item has probability proportional to 1/k^exponent. Models the natural skew in most real-world access patterns.

Parameter Description Typical Values
exponent Skew factor (higher = more skewed) 1.0 (standard Zipf)

Models: Web page popularity, word frequencies, social media engagement.


ScrambledZipfian

Workload::ScrambledZipfian { exponent: 1.0 }

Zipfian distribution with FNV-1a hash applied to keys. YCSB’s default distribution. Prevents hardware prefetching from skewing benchmark results by eliminating sequential locality in hot keys.

Parameter Description Typical Values
exponent Skew factor 1.0

Models: Same as Zipfian but more accurate for benchmarking since it avoids prefetch effects.


Latest

Workload::Latest { exponent: 1.0 }

Recently inserted keys are more likely to be accessed. Samples an offset from the most recent insert using Zipfian distribution, then accesses insert_counter - offset.

Parameter Description Typical Values
exponent How quickly interest falls off 1.0-1.5

Models: Social media feeds, news sites, activity logs, chat applications.


ShiftingHotspot

Workload::ShiftingHotspot { shift_interval: 10_000, hot_fraction: 0.1 }

The popular keys change periodically. Every shift_interval operations, the hotspot region moves. Tests how quickly policies adapt to changing access patterns.

Parameter Description Typical Values
shift_interval Operations between hotspot shifts 10,000-100,000
hot_fraction Fraction of keys that are hot 0.05-0.2

Models: Trending topics, seasonal products, time-based popularity shifts.


Exponential

Workload::Exponential { lambda: 0.05 }

Exponential decay distribution favoring lower-numbered keys. Popularity drops exponentially with key distance from 0.

Parameter Description Typical Values
lambda Decay rate (higher = steeper dropoff) 0.01-0.1

Models: Time-series data, event logs, recency-biased access.


Usage

use crate::common::workload::{Workload, WorkloadSpec, run_hit_rate};

let spec = WorkloadSpec {
    universe: 100_000,          // Key space size
    workload: Workload::ScrambledZipfian { exponent: 1.0 },
    seed: 42,                   // For reproducibility
};

let mut generator = spec.generator();
let hit_rate = run_hit_rate(&mut cache, &mut generator, 1_000_000, |k| Arc::new(k));
println!("Hit rate: {:.2}%", hit_rate.hit_rate() * 100.0);

Choosing a Workload

Goal Recommended Workload
General benchmarking ScrambledZipfian { exponent: 1.0 }
Scan resistance testing Scan or mix with Zipfian
Temporal locality Latest { exponent: 1.0 }
Adaptation testing ShiftingHotspot
Worst-case baseline Uniform
Explicit hot/cold Hotset

Roadmap

Future workloads organized by category. Priority indicated as P1 (high) to P4 (low).


Category 1: Statistical Distributions

Mathematical distributions for modeling different popularity curves.

Pareto (P2)

The 80/20 rule distribution. Different tail behavior than Zipf - models cases where a small percentage of items receive the vast majority of accesses.

Workload::Pareto { shape: 1.16 }  // shape=1.16 gives 80/20 split

Models: Wealth distribution, file sizes, resource usage patterns.

LogNormal (P3)

Log-normal distribution. Common for quantities that are products of many independent factors.

Workload::LogNormal { mu: 0.0, sigma: 1.0 }

Models: Response times, session lengths, file sizes, latency distributions.

Geometric (P4)

Memoryless distribution. Each key has geometrically decreasing probability.

Workload::Geometric { p: 0.01 }  // p = success probability

Models: Retry patterns, number of attempts until success, waiting times.

ZipfWithCutoff (P3)

Zipfian but caps maximum popularity to prevent extreme skew. More realistic for bounded systems.

Workload::ZipfWithCutoff { exponent: 1.0, max_freq_ratio: 100.0 }

Models: Rate-limited systems, fair-share caches, bounded popularity.

Weibull (P3)

Flexible shape parameter allows modeling various failure/reliability patterns. Generalizes exponential distribution.

Workload::Weibull { shape: 1.5, scale: 1000.0 }

Models: Failure times, component lifetimes, reliability analysis.

Beta (P4)

Bounded distribution on [0, 1]. Flexible shape for modeling ratios and probabilities.

Workload::Beta { alpha: 2.0, beta: 5.0 }

Models: Conversion rates, click-through ratios, bounded popularity scores.

Gamma (P4)

Generalization of exponential. Sum of exponential random variables. Models waiting times for multiple events.

Workload::Gamma { shape: 2.0, scale: 1.0 }

Models: Aggregate latencies, queue waiting times, multi-stage processes.

PowerLawCutoff (P3)

Power law with exponential cutoff. More realistic than pure power law - prevents unbounded popularity in finite systems.

Workload::PowerLawCutoff { exponent: 1.0, cutoff: 10_000.0 }

Models: Real-world popularity with natural limits, network degree distributions.

DoublePower (P4)

Different exponents for head vs tail. Models systems where very popular items behave differently from moderately popular ones.

Workload::DoublePower { head_exponent: 0.8, tail_exponent: 1.5, breakpoint: 100 }

Models: Celebrity effect, viral vs normal content, tiered popularity.


Category 2: Spatial Locality

Patterns where accessing one key makes nearby keys more likely.

Correlated / Stride (P1)

Access to key K makes K+1, K+2, … more likely. Fundamental pattern in sequential data processing.

Workload::Correlated {
    stride: 1,           // Step between correlated accesses
    burst_len: 8,        // Number of sequential accesses in burst
    burst_prob: 0.3,     // Probability of starting a burst
}

Models: Array traversals, database sequential scans, file system reads, B-tree leaf scans.

Loop (P1)

Repeatedly cycles through a fixed-size working set. Critical test for cache sizing - behavior changes dramatically when working set exceeds cache.

Workload::Loop { working_set_size: 1000 }

Models: Iterative algorithms, hash table resizing, repeated batch jobs.

RangeScan (P2)

Random starting point, then sequential burst of N keys. Models range queries in databases.

Workload::RangeScan { burst_len: 100 }

Models: Database range queries, pagination, time-range lookups.

RandomWalk (P3)

Current key ± small random delta. Models local exploration patterns where accesses stay near previous access.

Workload::RandomWalk { step_size: 10, wrap: true }

Models: Search exploration, cursor navigation, spatial queries.

LevyFlight (P3)

Random walk with heavy-tailed jump sizes. Occasional large jumps mixed with local exploration. Models foraging and search patterns.

Workload::LevyFlight { local_prob: 0.9, jump_exponent: 1.5 }

Models: Search engine crawling, user browsing, exploration algorithms.


Category 3: Multi-Region Patterns

Workloads with multiple distinct hotspots or regions.

Bimodal (P2)

Two separate hotspots with configurable split. Models systems with multiple independent popular categories.

Workload::Bimodal {
    region_a_fraction: 0.1,
    region_b_fraction: 0.1,
    region_a_prob: 0.4,
    region_b_prob: 0.4,
}

Models: E-commerce categories, multi-tenant systems, geographic regions.

Multimodal (P3)

Generalization of bimodal to N hotspots. Each region has its own size and access probability.

Workload::Multimodal {
    regions: vec![
        Region { fraction: 0.05, prob: 0.3 },
        Region { fraction: 0.10, prob: 0.4 },
        Region { fraction: 0.15, prob: 0.2 },
    ]
}

Models: Complex multi-category systems, hierarchical popularity.

Gaussian (P3)

Normal distribution centered on a point. Localized access with gradual falloff rather than sharp hot/cold boundary.

Workload::Gaussian { center: 0.5, std_dev: 0.1 }

Models: Spatial data, location-based access, smooth popularity gradients.


Category 4: Temporal Patterns

Workloads where access patterns change over time.

Bursty / SelfSimilar (P1)

Traffic arrives in bursts at multiple time scales. Exhibits long-range dependence - quiet periods followed by intense bursts.

Workload::Bursty {
    hurst: 0.8,          // Hurst parameter (0.5=random, 1.0=max correlation)
    base: Box::new(Workload::Zipfian { exponent: 1.0 }),
}

Models: Network traffic, web requests, API calls, any real-time system.

Periodic (P3)

Access pattern repeats with a fixed period. Models cyclical workloads.

Workload::Periodic {
    period: 1000,        // Operations per cycle
    phases: vec![
        Phase { workload: Workload::Zipfian { exponent: 1.0 }, duration: 800 },
        Phase { workload: Workload::Scan, duration: 200 },
    ]
}

Models: Daily patterns, scheduled jobs, maintenance windows.

ZipfianDrift (P3)

Skew parameter slowly changes over time. Models gradual shifts in access pattern characteristics.

Workload::ZipfianDrift {
    start_exponent: 0.8,
    end_exponent: 1.2,
    drift_over: 100_000,  // Operations to complete drift
}

Models: Evolving user behavior, seasonal changes in popularity distribution.

Diurnal (P4)

Different distributions at different simulated times. Models real-world day/night patterns.

Workload::Diurnal {
    day_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
    night_workload: Box::new(Workload::Uniform),
    day_fraction: 0.6,
}

Models: Business hours vs off-hours, time-zone effects.

WorkingSetChurn (P1)

Fixed-size working set that slowly drifts over time. More realistic than ShiftingHotspot for modeling gradual popularity changes.

Workload::WorkingSetChurn {
    working_set_size: 1000,
    churn_rate: 0.001,  // Fraction of working set replaced per operation
}

Models: Cache warmup, gradual content rotation, evolving popular items.

FlashCrowd (P1)

Sudden spike in traffic to specific keys. Models viral content or breaking news scenarios where popularity explodes suddenly.

Workload::FlashCrowd {
    base: Box::new(Workload::Zipfian { exponent: 1.0 }),
    flash_prob: 0.001,        // Probability of flash event starting
    flash_duration: 1000,     // Operations during flash
    flash_keys: 10,           // Number of keys affected
    flash_intensity: 100.0,   // Multiplier on access probability
}

Models: Viral content, breaking news, product launches, celebrity posts.

ThunderingHerd (P2)

Many simultaneous requests for the same key after a cache miss. Models the stampede effect when cached item expires.

Workload::ThunderingHerd {
    base: Box::new(Workload::Zipfian { exponent: 1.0 }),
    herd_size: 100,           // Concurrent requests on miss
    herd_prob: 0.01,          // Probability of herd event
}

Models: Cache expiration stampedes, cold start scenarios, failover recovery.

OnOff (P3)

Alternating active/idle periods per key or globally. Models intermittent access patterns.

Workload::OnOff {
    on_duration: 100,
    off_duration: 1000,
    on_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
}

Models: Batch jobs, scheduled tasks, user session patterns.

MMPP (P4)

Markov-Modulated Poisson Process. Request rate varies according to a hidden Markov chain. Standard model for network traffic.

Workload::MMPP {
    states: vec![
        State { rate: 1.0, workload: Workload::Zipfian { exponent: 1.0 } },
        State { rate: 10.0, workload: Workload::Zipfian { exponent: 0.8 } },
    ],
    transition_matrix: [[0.99, 0.01], [0.05, 0.95]],
}

Models: Network traffic, variable load patterns, hidden state systems.

RateVariation (P3)

Request rate changes over time independent of key distribution. Models varying load levels.

Workload::RateVariation {
    base: Box::new(Workload::Zipfian { exponent: 1.0 }),
    rate_pattern: RatePattern::Sinusoidal { period: 10_000, amplitude: 0.5 },
}

Models: Daily traffic patterns, load testing, capacity planning.


Category 5: Composite / Mixed Workloads

Combinations of multiple patterns or operation types.

ScanResistance (P1)

Mix of point lookups (Zipfian) and periodic sequential scans. The key benchmark for demonstrating S3-FIFO, 2Q, and Clock-PRO advantages over LRU.

Workload::ScanResistance {
    scan_fraction: 0.2,      // 20% of ops are scans
    scan_length: 1000,       // Each scan touches 1000 keys
    point_exponent: 1.0,     // Zipfian for point lookups
}

Models: OLTP + analytics mixed workloads, caches serving batch + interactive.

Interleaved (P2)

K independent streams interleaved. Models multi-tenant or multi-user systems where each stream has its own access pattern.

Workload::Interleaved {
    streams: 10,
    stream_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
    stream_universe_fraction: 0.1,  // Each stream uses 10% of key space
}

Models: Multi-tenant caches, connection pooling, user sessions.

ProducerConsumer (P3)

Writes follow one distribution, reads follow another. Models systems where data ingestion differs from data retrieval.

Workload::ProducerConsumer {
    write_workload: Box::new(Workload::Latest { exponent: 1.0 }),
    read_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
    write_fraction: 0.2,
}

Models: Message queues, event sourcing, write-behind caches.

ReadModifyWrite (P2)

Simulates read-then-update patterns. After reading a key, high probability of writing the same key.

Workload::ReadModifyWrite {
    base: Box::new(Workload::Zipfian { exponent: 1.0 }),
    rmw_prob: 0.3,  // 30% of reads followed by write
}

Models: Counters, sessions, shopping carts, any read-update cycle.

ReadWriteMix (P2)

Configurable read/write ratio with separate distributions for each.

Workload::ReadWriteMix {
    read_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
    write_workload: Box::new(Workload::Uniform),
    read_fraction: 0.95,
}

Models: General CRUD applications, configurable read-heavy/write-heavy.

Mixture (P1)

Weighted combination of N distributions. Meta-workload for creating complex realistic patterns from simple components.

Workload::Mixture {
    components: vec![
        (0.7, Box::new(Workload::Zipfian { exponent: 1.0 })),
        (0.2, Box::new(Workload::Scan)),
        (0.1, Box::new(Workload::Uniform)),
    ],
}

Models: Any complex real-world pattern, configurable benchmarks.

TimeVaryingMixture (P2)

Mixture weights change over time. Models gradual shifts between different access patterns.

Workload::TimeVaryingMixture {
    components: vec![
        Box::new(Workload::Zipfian { exponent: 1.0 }),
        Box::new(Workload::Scan),
    ],
    weight_function: WeightFunction::Linear {
        start: vec![0.9, 0.1],
        end: vec![0.5, 0.5],
        over_ops: 100_000,
    },
}

Models: Gradual workload evolution, A/B test transitions, migration patterns.

Conditional (P3)

Different distribution based on key properties. Models heterogeneous access patterns within same key space.

Workload::Conditional {
    condition: |key| key % 100 == 0,  // Every 100th key
    if_true: Box::new(Workload::Zipfian { exponent: 0.5 }),
    if_false: Box::new(Workload::Zipfian { exponent: 1.5 }),
}

Models: Tiered data, index vs data pages, metadata vs content.


Category 6: Adversarial / Stress Tests

Workloads designed to stress-test cache policies or expose weaknesses.

CacheThrash (P2)

Access pattern sized to exactly overflow cache by a small margin. Tests worst-case eviction behavior.

Workload::CacheThrash { overflow_factor: 1.1 }  // 10% larger than cache

Models: Worst-case scenarios, capacity planning edge cases.

Adversarial / AntiLRU (P2)

Cyclic pattern designed to defeat LRU. Accesses keys 0..N in cycle where N > cache_size. Every access is a miss for pure LRU.

Workload::Adversarial { cycle_size: cache_size + 1 }

Models: Sequential scans larger than cache, worst-case for recency policies.

AntiLFU (P3)

One-time accesses to many unique keys. Defeats frequency counting by never accessing keys more than once.

Workload::AntiLFU { unique_fraction: 0.8 }  // 80% of accesses are unique keys

Models: Crawlers, one-shot requests, high-cardinality workloads.

FlipFlop (P2)

Alternates between two completely disjoint working sets. Tests adaptation speed when access pattern changes completely.

Workload::FlipFlop {
    set_a_size: 1000,
    set_b_size: 1000,
    flip_interval: 10_000,
}

Models: Failover scenarios, A/B deployments, drastic workload changes.

Belady / OPT (P4)

Theoretical optimal - always evicts the item used furthest in the future. Requires pre-computed access trace. Used as upper bound for hit rate.

Workload::Belady { trace: precomputed_trace }

Models: Theoretical maximum, used for comparing policy effectiveness.


Category 7: Application-Specific Patterns

Workloads modeling specific application domains.

WebCrawler (P3)

Correlated bursts of requests to same domain/prefix. Models web crawling or API scraping patterns.

Workload::WebCrawler {
    domain_count: 1000,
    pages_per_domain: 100,
    crawl_depth: 10,
}

Models: Search engine crawlers, site scrapers, link checkers.

DatabaseIndex (P2)

B-tree traversal patterns. Root/internal nodes accessed more frequently than leaves. Models index lookups.

Workload::DatabaseIndex {
    fanout: 100,
    levels: 4,
    point_query_prob: 0.8,
}

Models: B-tree indexes, database buffer pools, file system metadata.

LSMCompaction (P4)

Models LSM-tree compaction access patterns. Sequential reads of sorted runs, writes to new levels.

Workload::LSMCompaction {
    levels: 4,
    size_ratio: 10,
    compaction_prob: 0.01,
}

Models: RocksDB, LevelDB, Cassandra compaction workloads.

SessionBased (P2)

Keys grouped by session. Sessions have Zipfian popularity, keys within session accessed with temporal locality.

Workload::SessionBased {
    session_count: 10_000,
    keys_per_session: 50,
    session_exponent: 1.0,
}

Models: Web sessions, user activity, shopping carts.

GraphTraversal (P3)

BFS/DFS patterns on graph structures. Accesses follow edges, exhibiting locality based on graph structure.

Workload::GraphTraversal {
    node_count: 100_000,
    avg_degree: 10,
    traversal: TraversalType::BFS,
}

Models: Social graph queries, recommendation systems, knowledge graphs.


Category 8: Size-Aware Workloads

For caches with weighted/sized eviction.

VariableSize (P3)

Key access distribution combined with size distribution. Tests size-aware eviction policies.

Workload::VariableSize {
    access: Box::new(Workload::Zipfian { exponent: 1.0 }),
    size_distribution: SizeDistribution::Pareto { min: 100, shape: 1.5 },
}

Models: HTTP caches, object stores, CDNs with variable object sizes.

SmallHotLargeCold (P3)

Small objects are hot, large objects are cold. Common in practice - popular items tend to be smaller.

Workload::SmallHotLargeCold {
    small_size: 1_000,
    large_size: 100_000,
    small_fraction: 0.1,
    small_access_prob: 0.9,
}

Models: Image thumbnails vs full images, summary vs detail data.

CostBenefit (P4)

Each key has an associated cost (fetch latency). Tests GDSF-style policies that consider cost/size tradeoffs.

Workload::CostBenefit {
    access: Box::new(Workload::Zipfian { exponent: 1.0 }),
    cost_distribution: CostDistribution::Exponential { mean: 10.0 },
}

Models: Multi-tier storage, remote fetches with varying latency.


Category 9: Trace Replay

Replay real-world access patterns from trace files.

FileTrace (P3)

Replay from trace file. Supports common formats (MSR, Twitter, YCSB).

Workload::FileTrace {
    path: "traces/twitter_cluster52.trace",
    format: TraceFormat::Twitter,
}

Trace sources:

Models: Production workloads, real-world validation.


Category 10: Markov / State-Based

Workloads where next access depends on current state or previous accesses.

MarkovChain (P2)

Next key depends on current key with transition probabilities. Models dependent access patterns with memory.

Workload::MarkovChain {
    states: 1000,
    transition_matrix: sparse_matrix,  // Or generator function
    initial_state: 0,
}

Models: User navigation, page sequences, stateful protocols.

StackDistance (P2)

Sample directly from LRU stack distance distribution. Gives precise control over reuse behavior and hit rate.

Workload::StackDistance {
    distribution: vec![
        (0.3, 1..10),      // 30% within last 10 accesses
        (0.4, 10..100),    // 40% within last 100
        (0.3, 100..10000), // 30% older
    ],
}

Models: Precise hit rate targeting, stack distance analysis validation.

ReuseDistance (P2)

Specify exact reuse distance distribution directly. Number of unique keys accessed between repeated accesses to same key.

Workload::ReuseDistance {
    distribution: ReuseDistribution::Exponential { mean: 100.0 },
}

Models: Working set analysis, cache sizing studies.

StateMachine (P3)

General finite state machine with per-state workloads and transition rules.

Workload::StateMachine {
    states: vec![
        State { name: "browsing", workload: Workload::Zipfian { exponent: 1.0 } },
        State { name: "checkout", workload: Workload::Loop { working_set_size: 20 } },
    ],
    transitions: vec![
        Transition { from: "browsing", to: "checkout", prob: 0.05 },
        Transition { from: "checkout", to: "browsing", prob: 0.8 },
    ],
}

Models: User journeys, workflow patterns, protocol states.


Category 11: ML/AI Workloads

Patterns specific to machine learning and AI systems.

EmbeddingLookup (P2)

Very large embedding tables with extreme power-law access. Models recommendation systems and NLP embeddings.

Workload::EmbeddingLookup {
    table_size: 10_000_000,
    exponent: 1.5,           // Steeper than typical Zipf
    batch_size: 64,          // Lookups come in batches
    batch_locality: 0.3,     // Correlation within batch
}

Models: RecSys embeddings, word2vec, transformer vocabularies.

FeatureStore (P2)

Feature access patterns for ML inference. Features grouped by entity with temporal patterns.

Workload::FeatureStore {
    entity_count: 1_000_000,
    features_per_entity: 100,
    entity_exponent: 1.0,
    feature_correlation: 0.8,  // Features accessed together
}

Models: ML feature stores, real-time inference, feature serving.

BatchInference (P3)

Batched requests with locality within batch. Models ML serving patterns.

Workload::BatchInference {
    batch_size: 32,
    intra_batch_locality: 0.5,
    inter_batch_workload: Box::new(Workload::Zipfian { exponent: 1.0 }),
}

Models: Model serving, batch prediction, GPU inference batching.

TrainingDataAccess (P4)

Patterns for training data access. Sequential epochs with shuffling.

Workload::TrainingDataAccess {
    dataset_size: 1_000_000,
    batch_size: 256,
    shuffle: true,
    prefetch: 10,
}

Models: ML training data loaders, dataset caching.


Category 12: Hierarchy / Structure

Workloads with hierarchical or structured key relationships.

Hierarchical (P2)

Tree-structured keys where parent nodes accessed before children. Models hierarchical data access.

Workload::Hierarchical {
    levels: 4,
    fanout: 10,
    level_access_prob: vec![0.4, 0.3, 0.2, 0.1],  // Root accessed most
}

Models: File systems, URL hierarchies, organizational data.

Namespace (P2)

Keys organized in namespaces with locality within namespace. Models multi-tenant or categorized data.

Workload::Namespace {
    namespace_count: 100,
    keys_per_namespace: 10_000,
    namespace_exponent: 1.0,
    intra_namespace_locality: 0.8,
}

Models: Multi-tenant caches, database schemas, API endpoints.

PrefixLocality (P3)

Keys with common prefixes accessed together. Models key naming conventions with hierarchical structure.

Workload::PrefixLocality {
    prefix_levels: 3,
    prefix_branching: 10,
    locality_decay: 0.7,  // Locality decreases with prefix distance
}

Models: Key-value stores with hierarchical keys, file paths, URLs.

TreeTraversal (P3)

Specific tree traversal patterns (preorder, inorder, postorder, level-order).

Workload::TreeTraversal {
    tree_size: 10_000,
    fanout: 4,
    traversal: TraversalOrder::LevelOrder,
    subtree_prob: 0.3,  // Probability of full subtree traversal
}

Models: Index scans, DOM traversal, syntax tree processing.


Category 13: Edge Cases / Degenerate

Extreme or pathological workloads for stress testing.

SingleKey (P4)

Only one key ever accessed. 100% hit rate after first access. Tests minimal overhead.

Workload::SingleKey { key: 0 }

Models: Baseline overhead measurement, singleton caches.

AllUnique (P4)

Every access is to a new, never-before-seen key. 0% hit rate. Tests miss handling and eviction overhead.

Workload::AllUnique

Models: Worst-case scenarios, cache-hostile workloads, unique ID generation.

TwoKeys (P4)

Alternates between exactly two keys. Tests minimal working set handling.

Workload::TwoKeys { key_a: 0, key_b: 1, prob_a: 0.5 }

Models: Binary state systems, toggle patterns.

RepeatedBurst (P3)

Same key accessed N times consecutively, then moves to next key. Tests frequency counting and burst handling.

Workload::RepeatedBurst { burst_size: 100, key_selection: Workload::Uniform }

Models: Batch processing per key, repeated retries, polling patterns.

Pathological (P3)

Specifically designed to expose worst-case behavior for a given policy. Parameterized by target policy.

Workload::Pathological { target_policy: Policy::LRU, cache_size: 1000 }

Models: Policy stress testing, worst-case analysis.


Category 14: Protocol-Specific

Workloads modeling specific protocols and systems.

DNS (P3)

DNS query patterns with TTL effects and hierarchical resolution.

Workload::DNS {
    domain_count: 100_000,
    query_exponent: 1.0,
    ttl_distribution: TTLDistribution::Realistic,
    recursive_prob: 0.1,
}

Models: DNS caches, resolver caches, CDN DNS.

HTTP (P3)

HTTP cache patterns including conditional requests, Vary headers, and range requests.

Workload::HTTP {
    url_distribution: Box::new(Workload::Zipfian { exponent: 1.0 }),
    conditional_get_prob: 0.3,
    not_modified_prob: 0.8,
    range_request_prob: 0.1,
}

Models: HTTP proxy caches, browser caches, CDN edge caches.

CDN (P3)

CDN access patterns with geographic distribution, long tail, and edge effects.

Workload::CDN {
    content_count: 1_000_000,
    popularity_exponent: 1.2,
    geographic_regions: 10,
    regional_locality: 0.7,
    long_tail_fraction: 0.9,
}

Models: CDN edge caches, video streaming, static asset delivery.

Memcached (P2)

Memcached-style patterns with GET/SET/DELETE/INCR operation mix.

Workload::Memcached {
    key_distribution: Box::new(Workload::Zipfian { exponent: 1.0 }),
    get_fraction: 0.85,
    set_fraction: 0.10,
    delete_fraction: 0.03,
    incr_fraction: 0.02,
}

Models: Memcached deployments, session stores, rate limiters.

Redis (P3)

Redis-style patterns with data structure operations and pub/sub.

Workload::Redis {
    key_distribution: Box::new(Workload::Zipfian { exponent: 1.0 }),
    string_ops: 0.5,
    hash_ops: 0.2,
    list_ops: 0.15,
    set_ops: 0.1,
    sorted_set_ops: 0.05,
}

Models: Redis caches, leaderboards, real-time analytics.


Implementation Priority Summary

P1 - Critical (Implement First)

Workload Category Rationale
ScanResistance Composite Key differentiator for scan-resistant policies
Correlated Spatial Fundamental pattern, very common
Loop Spatial Critical edge case for cache sizing
WorkingSetChurn Temporal Realistic drift modeling
Bursty Temporal Real-world traffic is bursty
FlashCrowd Temporal Critical real-world scenario (viral content)
Mixture Composite Meta-workload, combines others flexibly

P2 - High Value

Workload Category Rationale
Pareto Statistical Important alternative to Zipf
RangeScan Spatial Database workloads
Bimodal Multi-Region Multi-category systems
Interleaved Composite Multi-tenant scenarios
ReadModifyWrite Composite Common CRUD pattern
ReadWriteMix Composite Configurable benchmarks
TimeVaryingMixture Composite Gradual workload evolution
CacheThrash Adversarial Stress testing
Adversarial Adversarial Policy robustness
FlipFlop Adversarial Adaptation testing
ThunderingHerd Temporal Concurrency stress test
DatabaseIndex Application B-tree workloads
SessionBased Application Web applications
MarkovChain Markov Models dependent access patterns
StackDistance Markov Direct control over reuse behavior
ReuseDistance Markov Precise working set analysis
EmbeddingLookup ML/AI ML systems are huge cache users
FeatureStore ML/AI Real-time ML inference
Hierarchical Hierarchy File systems, URLs
Namespace Hierarchy Multi-tenant systems
Memcached Protocol Common deployment pattern

P3 - Medium Value

Workload Category Rationale
LogNormal Statistical Response time modeling
ZipfWithCutoff Statistical Bounded systems
Weibull Statistical Reliability modeling
PowerLawCutoff Statistical Realistic tail behavior
Multimodal Multi-Region Complex scenarios
Gaussian Multi-Region Smooth gradients
RandomWalk Spatial Search/exploration patterns
LevyFlight Spatial Foraging patterns
Periodic Temporal Scheduled workloads
ZipfianDrift Temporal Evolving patterns
OnOff Temporal Intermittent access
RateVariation Temporal Variable load levels
ProducerConsumer Composite Async systems
Conditional Composite Heterogeneous access
AntiLFU Adversarial LFU stress test
RepeatedBurst Edge Cases Burst handling
Pathological Edge Cases Policy stress testing
WebCrawler Application Crawler patterns
GraphTraversal Application Graph databases
StateMachine Markov User journeys
BatchInference ML/AI Model serving
PrefixLocality Hierarchy Key naming conventions
TreeTraversal Hierarchy Index scans
DNS Protocol DNS caching
HTTP Protocol HTTP proxy caches
CDN Protocol Edge caching
Redis Protocol Redis patterns
VariableSize Size-Aware Object caches
SmallHotLargeCold Size-Aware Common size pattern
FileTrace Trace Real-world validation

P4 - Low Priority

Workload Category Rationale
Geometric Statistical Niche use cases
Beta Statistical Bounded distributions
Gamma Statistical Multi-stage processes
DoublePower Statistical Complex popularity curves
Diurnal Temporal Complex to model well
MMPP Temporal Network modeling, complex
Belady Adversarial Theoretical only
SingleKey Edge Cases Trivial baseline
AllUnique Edge Cases Zero hit rate baseline
TwoKeys Edge Cases Minimal working set
LSMCompaction Application Very specific
TrainingDataAccess ML/AI Training-specific
CostBenefit Size-Aware Requires cost model

Workload Count Summary

Category Implemented Roadmap Total
Core (Implemented) 8 - 8
Statistical Distributions - 9 9
Spatial Locality - 5 5
Multi-Region Patterns - 3 3
Temporal Patterns - 11 11
Composite / Mixed - 7 7
Adversarial / Stress - 5 5
Application-Specific - 5 5
Size-Aware - 3 3
Trace Replay - 1 1
Markov / State-Based - 4 4
ML/AI - 4 4
Hierarchy / Structure - 4 4
Edge Cases / Degenerate - 5 5
Protocol-Specific - 5 5
Total 8 71 79

References

Benchmarks & Traces

Cache Papers

Distributions

Traffic Modeling

ML/AI Systems

Stack Distance Analysis