Chapter 2 — Exercises & Labs (Application Mode)¶
Measure theory meets sampling: every probabilistic definition in Chapter 2 has a concrete simulator counterpart. Use these labs to validate the \(\sigma\)-algebra intuition numerically.
Lab 2.1 — Segment Mix Sanity Check¶
Objective: verify that empirical segment frequencies converge to the segment distribution \(\mathbf{p}_{\text{seg}}\) from [DEF-2.2.6].
This lab is implemented in scripts/ch02/lab_solutions.py (see ch02_lab_solutions.md for a full transcript).
from scripts.ch02.lab_solutions import lab_2_1_segment_mix_sanity_check
_ = lab_2_1_segment_mix_sanity_check(seed=21, n_samples=10_000, verbose=True)
Output (actual):
======================================================================
Lab 2.1: Segment Mix Sanity Check
======================================================================
Sampling 10,000 users from segment distribution (seed=21)...
Theoretical segment mix (from config):
price_hunter : p_seg = 0.350
pl_lover : p_seg = 0.250
premium : p_seg = 0.150
litter_heavy : p_seg = 0.250
Empirical segment frequencies (n=10,000):
price_hunter : p_hat_seg = 0.335 (Δ = -0.015)
pl_lover : p_hat_seg = 0.254 (Δ = +0.004)
premium : p_hat_seg = 0.153 (Δ = +0.003)
litter_heavy : p_hat_seg = 0.258 (Δ = +0.008)
Deviation metrics:
L∞ (max deviation): 0.015
L1 (total variation): 0.030
L2 (Euclidean): 0.018
[!] L∞ deviation (0.015) exceeds 3$\sigma$ (0.014)
Tasks
1. Repeat the experiment with different seeds and report the \(\ell_\infty\) deviation \(\|\hat{\mathbf{p}}_{\text{seg}} - \mathbf{p}_{\text{seg}}\|_\infty\); relate the result to the law of large numbers discussed in Chapter 2.
2. Run scripts/ch02/lab_solutions.py::lab_2_1_degenerate_distribution and interpret each test case in terms of positivity/overlap from §2.6 (support coverage for Radon–Nikodym derivatives).
Lab 2.2 — Query Measure and Base Score Integration¶
Objective: link the click-model measure \(\mathbb{P}\) defined in §2.6 to simulator code paths, and verify square-integrability predicted by [PROP-2.8.1].
from scripts.ch02.lab_solutions import lab_2_2_base_score_integration
_ = lab_2_2_base_score_integration(seed=3, verbose=True)
Output (actual):
======================================================================
Lab 2.2: Query Measure and Base Score Integration
======================================================================
Generating catalog and sampling users/queries (seed=3)...
Catalog statistics:
Products: 10,000 (simulated)
Categories: ['dog_food', 'cat_food', 'litter', 'toys']
Embedding dimension: 16
User/Query samples (n=100):
Sample 1:
User segment: litter_heavy
Query type: brand
Query intent: litter
Sample 2:
User segment: price_hunter
Query type: category
Query intent: litter
...
Base score statistics across 100 queries × 100 products each:
Score mean: 0.098
Score std: 0.221
Score min: -0.558
Score max: 0.933
Score percentiles:
5th: -0.258
25th: -0.057
50th: 0.095
75th: 0.248
95th: 0.466
[OK] Scores are square-integrable (finite variance) as required by Proposition 2.8.1
[OK] Score std $\approx 0.22$ (finite second moment)
[!] Scores NOT bounded to [0,1]---Gaussian noise makes them unbounded
Tasks
1. Examine the score distribution: compute mean, std, min, max, and selected quantiles (5%, 95%). Note that scores are not bounded to \([0,1]\) but are square-integrable with finite variance, as predicted by [PROP-2.8.1]. What empirical distribution do we observe? Do any scores fall outside \([-1, 2]\)?
2. Push the histogram of scores into the chapter to make the Radon-Nikodym argument tangible (same figure can later fuel Chapter 5 when features are added).
Lab 2.3 — Textbook Click Model Verification¶
Objective: verify that toy implementations of PBM ([DEF-2.5.1], [EQ-2.1]) and DBN ([DEF-2.5.2], [EQ-2.3]) match their theoretical predictions exactly.
from scripts.ch02.lab_solutions import lab_2_3_textbook_click_models
_ = lab_2_3_textbook_click_models(seed=42, verbose=True)
Output (actual):
======================================================================
Lab 2.3: Textbook Click Model Verification
======================================================================
Verifying PBM [DEF-2.5.1] and DBN [DEF-2.5.2] match theory exactly.
--- Part A: Position Bias Model (PBM) ---
Configuration:
Positions: 10
theta_k (examination): exponential decay with lambda=0.3
rel(p_k) (relevance): linear decay from 0.70 to 0.25
Theoretical prediction [EQ-2.1]:
P(C_k = 1) = rel(p_k) * theta_k
Simulating 50,000 sessions...
Position | theta_k | rel(p_k) | CTR theory | CTR empirical | Error
----------------------------------------------------------------------
1 | 0.900 | 0.70 | 0.6300 | 0.6305 | 0.0005
2 | 0.667 | 0.65 | 0.4334 | 0.4300 | 0.0034
3 | 0.494 | 0.60 | 0.2964 | 0.2957 | 0.0007
4 | 0.366 | 0.55 | 0.2013 | 0.2015 | 0.0002
5 | 0.271 | 0.50 | 0.1355 | 0.1376 | 0.0020
6 | 0.201 | 0.45 | 0.0904 | 0.0888 | 0.0015
7 | 0.149 | 0.40 | 0.0595 | 0.0587 | 0.0008
8 | 0.110 | 0.35 | 0.0386 | 0.0387 | 0.0001
9 | 0.082 | 0.30 | 0.0245 | 0.0250 | 0.0005
10 | 0.060 | 0.25 | 0.0151 | 0.0148 | 0.0003
Max absolute error: 0.0034
checkmark PBM: Empirical CTRs match [EQ-2.1] within 1% tolerance
--- Part B: Dynamic Bayesian Network (DBN) ---
Configuration:
rel(p_k) * s(p_k) (relevance * satisfaction):
[0.14, 0.12, 0.11, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]
Theoretical prediction [EQ-2.3]:
P(E_k = 1) = prod_{j<k} [1 - rel(p_j) * s(p_j)]
Simulating 50,000 sessions...
Position | P(E_k) theory | P(E_k) empirical | Error
-------------------------------------------------------
1 | 1.0000 | 1.0000 | 0.0000
2 | 0.8600 | 0.8580 | 0.0020
3 | 0.7538 | 0.7536 | 0.0002
4 | 0.6724 | 0.6714 | 0.0010
5 | 0.6095 | 0.6081 | 0.0014
6 | 0.5608 | 0.5595 | 0.0012
7 | 0.5229 | 0.5238 | 0.0009
8 | 0.4936 | 0.4953 | 0.0017
9 | 0.4712 | 0.4728 | 0.0017
10 | 0.4542 | 0.4565 | 0.0023
Max absolute error: 0.0023
checkmark DBN: Examination probabilities match [EQ-2.3] within 1% tolerance
--- Part C: PBM vs DBN Comparison ---
Examination probability at position 5:
PBM: P(E_5) = theta_5 = 0.271 (fixed by position)
DBN: P(E_5) = 0.610 (depends on cascade)
Key insight:
DBN predicts HIGHER examination at later positions because users
who reach position 5 are 'unsatisfied browsers' who continue scanning.
PBM's fixed theta_k is simpler but ignores this selection effect.
Tasks
1. Verify that the DBN simulation in scripts/ch02/lab_solutions.py::simulate_dbn implements [EQ-2.3]: \(P(E_k = 1) = \prod_{j < k} [1 - \text{rel}(p_j) \cdot s(p_j)]\), then vary satisfaction probabilities and re-run.
2. Compare PBM and DBN examination probabilities at position 5. Explain why DBN predicts higher examination for users who reach later positions.
Lab 2.4 — Nesting Verification ([PROP-2.5.4])¶
Objective: verify PROP-2.5.4: under a parameter specialization, the Utility-Based Cascade Model (Section 2.5.4) reproduces the PBM per-position marginal factorization.
from scripts.ch02.lab_solutions import lab_2_4_nesting_verification
_ = lab_2_4_nesting_verification(seed=42, verbose=True)
Output (actual):
======================================================================
Lab 2.4: Nesting Verification ([PROP-2.5.4])
======================================================================
Goal: Verify [PROP-2.5.4](a): under the parameter specialization,
the Utility-Based Cascade reproduces PBM's per-position marginal factorization.
--- Configuration ---
Full Utility-Based Cascade:
alpha_price = 0.8
alpha_pl = 1.2
sigma_u = 0.8
satisfaction_gain = 0.5
abandonment_threshold = -2.0
PBM-like Configuration:
alpha_price = 0.0
alpha_pl = 0.0
sigma_u = 0.0
satisfaction_gain = 0.0
abandonment_threshold = -100.0
Simulating 5,000 sessions for each configuration...
--- Results ---
Position | Full CTR | PBM-like CTR | Difference
--------------------------------------------------
1 | 0.4168 | 0.5096 | -0.0928
2 | 0.2394 | 0.3620 | -0.1226
3 | 0.1376 | 0.2342 | -0.0966
4 | 0.0726 | 0.1502 | -0.0776
5 | 0.0448 | 0.0872 | -0.0424
6 | 0.0272 | 0.0444 | -0.0172
7 | 0.0078 | 0.0246 | -0.0168
8 | 0.0068 | 0.0128 | -0.0060
9 | 0.0024 | 0.0064 | -0.0040
10 | 0.0004 | 0.0034 | -0.0030
--- Stop Reason Distribution ---
Reason | Full Config | PBM-like
---------------------------------------------
exam_fail | 94.6% | 99.3%
abandonment | 5.1% | 0.0%
purchase_limit | 0.2% | 0.0%
end | 0.2% | 0.7%
--- Interpretation ---
Key observations:
1. PBM-like config has no satisfaction-based abandonment (threshold = -100)
2. PBM-like config has no purchase-limit stopping
3. PBM-like CTR matches PBM marginal factorization: CTR_k ≈ theta_k × rel(p_k)
max |CTR_empirical - theta_k×rel(p_k)| = 0.0047
4. Full config CTR varies with utility (price, PL, noise)
This verifies [PROP-2.5.4](a): under the specialization,
the Utility-Based Cascade reproduces PBM's per-position marginal factorization.
Tasks 1. Re-run the lab with different seeds and verify that the PBM factorization error stays small in PBM-like mode (the printed max error should remain near 0 up to Monte Carlo noise). 2. Progressively re-enable utility terms (\(\alpha_{\text{price}}\), then \(\alpha_{\text{pl}}\)) and observe how CTR by position and stopping reasons change relative to the PBM-like specialization.
Lab 2.5 — Utility-Based Cascade Dynamics ([DEF-2.5.3])¶
Objective: verify the three key mechanisms of the production click model from Section 2.5.4: position decay, satisfaction dynamics, and stopping conditions.
from scripts.ch02.lab_solutions import lab_2_5_utility_cascade_dynamics
_ = lab_2_5_utility_cascade_dynamics(seed=42, verbose=True)
Output (actual):
======================================================================
Lab 2.5: Utility-Based Cascade Dynamics ([DEF-2.5.3])
======================================================================
Verifying three key mechanisms:
1. Position decay (pos_bias)
2. Satisfaction dynamics (gain/decay)
3. Stopping conditions
Configuration:
Positions: 20
satisfaction_gain: 0.5
satisfaction_decay: 0.2
abandonment_threshold: -2.0
pos_bias (category, first 5): [1.2, 0.9, 0.7, 0.5, 0.3]
Simulating 2,000 sessions...
--- Part 1: Position Decay ---
Position | Exam Rate | CTR|Exam | pos_bias
--------------------------------------------------
1 | 0.767 | 0.387 | 1.20
2 | 0.520 | 0.563 | 0.90
3 | 0.349 | 0.401 | 0.70
4 | 0.197 | 0.353 | 0.50
5 | 0.100 | 0.485 | 0.30
6 | 0.052 | 0.533 | 0.20
7 | 0.025 | 0.353 | 0.20
8 | 0.015 | 0.600 | 0.20
9 | 0.005 | 0.400 | 0.20
10 | 0.002 | 1.000 | 0.20
Observation: Examination rate decays with position, matching pos_bias pattern.
--- Part 2: Satisfaction Dynamics ---
Sample satisfaction trajectories (first 5 sessions):
Session 1: 0.00 -> -0.20 (exam_fail)
Session 2: 0.00 -> -0.20 -> 0.22 -> 0.02 -> -1.75 (exam_fail)
Session 3: 0.00 -> -0.20 -> 0.18 -> -0.29 (exam_fail)
Session 4: 0.00 -> -0.20 -> 0.23 -> 0.03 -> -0.44 -> -0.64 -> -0.33 -> -0.53 ... (exam_fail)
Session 5: 0.00 -> -0.20 (exam_fail)
Final satisfaction statistics:
Mean: -0.49
Std: 0.71
Min: -3.47
Max: 1.79
--- Part 3: Stopping Conditions ---
Stop Reason | Count | Percentage
---------------------------------------------
exam_fail | 1900 | 95.0%
abandonment | 98 | 4.9%
purchase_limit | 2 | 0.1%
end | 0 | 0.0%
Session length statistics:
Mean: 2.0 positions
Std: 1.9
Median: 2
Clicks per session:
Mean: 0.90
Max: 7
--- Verification Summary ---
checkmark Position decay: Examination rate follows pos_bias pattern
checkmark Satisfaction dynamics: Trajectories show gain on click, decay on no-click
checkmark Stopping conditions: All three mechanisms observed (exam, abandon, limit)
Tasks
1. Plot the engagement trajectory \(H_k\) (called satisfaction in code) for 10 representative sessions. Identify sessions that ended due to: (a) examination failure, (b) threshold crossing, (c) purchase limit.
2. Verify that the mean examination decay matches the position bias vector pos_bias used in the model.
3. Modify satisfaction_gain and satisfaction_decay parameters. Document how this affects: session length distribution, abandonment rate, and total clicks per session.