Importance Sampling

Why smart sampling beats brute force when you need to estimate expected values in enormous game trees — illustrated with heads-up poker.

You Can't Search the Whole Tree

Pick a concrete question: you have two heads-up poker strategies and you want to know which one makes more money. In theory, the answer is just arithmetic. Deal out nine cards — two for each player, five for the board — and build the full game tree. At every decision node each strategy says “fold with probability 0.12, call with probability 0.55, raise with probability 0.28, all-in with probability 0.05,” or whatever its policy dictates. Walk every branch, weight each leaf by the product of probabilities along its path, and sum. That gives you the expected value (EV) for that particular deal.

For a single deal the tree is manageable — roughly a thousand nodes when players start with 50 BB and may fold, call, raise pot, or go all-in. But you don't care about one deal. You care about the average EV across every possible deal. There are C(52, 9) ≈ 3.7 billion distinct nine-card draws, each spawning its own tree. Exhaustive evaluation is hopeless.

Uniform Sampling: Fast but Noisy

The obvious shortcut is Monte Carlo sampling. Shuffle the deck, peel off nine cards, trace one random trajectory through the game tree (choosing each action according to the strategy’s own probabilities), record the payoff, and repeat. Average the payoffs. By the law of large numbers this converges to the true EV eventually.

The trouble is how long “eventually” takes. To see why, think about what most sampled hands look like. The majority end quickly: one player folds preflop, the payoff is +0.5 or −1 BB, and you learn almost nothing. Once in a while both players hit a strong hand, the pot balloons to 100 BB, and that one sample yanks the running average by several BB. Those rare, large-magnitude outcomes dominate the variance of the estimator.

Concrete example

Suppose 90% of trajectories end in a ±1 BB fold, and 10% end in a ±50 BB all-in. The variance of a single sample is roughly 0.9·1² + 0.1·50² = 250.9. After 50 uniform samples, the standard error of the mean is √(250.9/50) ≈ 2.2 BB — a huge error bar relative to typical strategy-EV differences of a fraction of a BB.

Importance Sampling: Spend Attention Where It Matters

The insight is simple: if the all-in hands are what make the estimate noisy, sample them more often. Instead of drawing trajectories from the strategy’s natural probability distribution p, draw from a proposal distribution q that puts extra weight on branches where a lot of chips move. To compensate for this deliberate bias, multiply each sample’s payoff by the likelihood ratio p/q:

E[X] ≈ (1/N) Σi Xi · p(Xi) / q(Xi)

This is still an unbiased estimator of the same EV — the p/q ratio exactly cancels the sampling bias — but its variance can be much lower. The ideal proposal is q ∝ p·|payoff|, which in the limit gives zero variance (every sample contributes equally to the sum).

Same example, importance-sampled

Go back to the 90/10 fold/all-in split. Set the proposal to allocate half the samples to folds and half to all-ins (instead of 90/10). Each fold sample now carries weight 0.9/0.5 = 1.8 and payoff ±1 BB, contributing ±1.8 BB. Each all-in sample carries weight 0.1/0.5 = 0.2 and payoff ±50 BB, contributing ±10 BB. The variance of a single weighted sample drops to roughly 0.5·1.8² + 0.5·10² = 51.6 — about a fifth of the uniform-sampling variance. After 50 importance-sampled draws, the standard error is √(51.6/50) ≈ 1.0 BB instead of 2.2.

With a better-tuned proposal (closer to the optimal q ∝ p·|payoff|) the improvement is larger still.

In practice you rarely know the ideal proposal, because it requires knowing the payoffs you're trying to estimate. But you don't need the ideal — any reasonable approximation helps. A fast heuristic evaluation (say, a simplified equity calculator) can tell you which branches are likely to be big pots, and that's enough to build a proposal that materially cuts variance.

When to Reach for Importance Sampling

The pattern generalises beyond poker. Whenever you are estimating an expectation and a small fraction of outcomes accounts for most of the signal — rare defaults in credit risk, tail events in option pricing, high-reward trajectories in reinforcement learning — uniform sampling wastes most of its budget on the uneventful majority. Importance sampling lets you redirect that budget toward the outcomes that actually move the needle, then correct for the uneven sampling via reweighting.

The only real danger is choosing a proposal that is worse than uniform — one that systematically avoids high-impact outcomes, so the rare times they do appear they carry enormous weights and blow up the variance. The rule of thumb is conservative: if your proposal at least covers the support of the original distribution and concentrates some extra mass on the tails, you almost certainly come out ahead.

You Can't Search the Whole Tree

Suppose you want to know how much a poker strategy is worth. In principle, the answer is straightforward: build the full game tree for a given deal, assign a probability to every action at every decision node (fold, call, raise, all-in), and compute the probability-weighted average payoff. That weighted average is the expected value (EV).

For a single deal this is tractable — the tree below has about 900 nodes. But a strategy's true EV is an average over all possible deals. There are C(52, 9) ≈ 3.7 billion ways to choose 9 cards, and each generates its own tree. Exhaustive search is out of the question.

Full tree search via DFS on a cherry-picked deal (J♠T♠ vs 9♦8♦), then repeated faster on 5 random deals. Each node is explored and EVs are propagated back to the root.
Key idea: A single deal's game tree is searchable, but averaging over all ≈3.7 billion possible deals is not. We need to sample.

Uniform Sampling: Fast but Noisy

The simplest escape is Monte Carlo sampling: draw deals at random, play out a single trajectory through each tree according to the strategies' action probabilities, and average the outcomes. By the law of large numbers, the average converges to the true EV.

The catch is variance. Most randomly-sampled hands are uninteresting — one player folds preflop and the payoff is ±1 BB. The occasional all-in pot worth 100 BB dominates the estimate, so convergence is slow. After 50 samples you might still be far from the truth.

50 randomly sampled trajectories traced through their game trees, accumulating a running EV estimate. Note how the estimate jumps around as high-stakes hands land.
Key idea: Uniform sampling is unbiased but high-variance. Rare, high-impact outcomes dominate the error.

Importance Sampling: Spend Attention Where It Matters

Importance sampling replaces the uniform distribution over trajectories with a proposal distribution that deliberately over-samples the high-impact branches — the ones where a lot of chips move. To stay unbiased, each sample is re-weighted by the ratio of the original probability to the proposal probability:

E[X] = Σi Xi · p(Xi) / q(Xi) · (1/N)

If the proposal q is proportional to p · |payoff|, the high-variance outcomes are sampled more often but down-weighted proportionally, and the boring folds are sampled less often but up-weighted. The net effect is a dramatically tighter estimate for the same number of samples.

Left: 100 uniform-probability trajectories and their EV estimate. Right: 100 importance-sampled trajectories (proposal ∝ p·|EV|) and their EV estimate. Both are compared against the true EV.
Key idea: By sampling branches in proportion to their impact — not just their probability — importance sampling achieves lower variance for the same computational budget. The estimate converges faster because attention is concentrated where it matters.

When to Reach for Importance Sampling

Importance sampling shines whenever you need to estimate an expectation and the quantity you're averaging has high variance under the natural distribution. Poker EV is a textbook case: the natural action-probability distribution puts most weight on cheap folds, while the signal lives in the rare large pots. But the same principle applies to option pricing, rare-event simulation, reinforcement-learning returns, and any other setting where a few trajectories carry most of the information.

The core trade-off is simple: you exchange the problem of reducing sample variance for the problem of choosing a good proposal distribution. A perfect proposal (proportional to p · |f|) gives zero variance; a bad one can make things worse. In practice, even a rough approximation — like using a fast heuristic evaluation to guide sampling — usually beats uniform sampling by a wide margin.