Backtesting Risk Models

Backtesting is the process of comparing a risk model's predictions against actual outcomes. It answers a fundamental question: is the model working? For VaR models, backtesting checks whether losses exceed the predicted VaR threshold about as often as the confidence level implies. Backtesting is heavily tested on the FRM exam and is a core regulatory requirement under the Basel market risk framework.

The Backtesting Framework

The basic VaR backtesting procedure is:

  1. Record the daily VaR estimate at a chosen confidence level (e.g., 99%)
  2. Observe the actual P&L outcome the following day
  3. Count exceptions — days where the actual loss exceeds the VaR estimate
  4. Compare the exception count against the expected count over the observation window

For a 99% VaR model over 250 trading days, the expected number of exceptions is 0.01 × 250 = 2.5. Significantly more exceptions suggest the model understates risk; significantly fewer suggest it is overly conservative.
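The arithmetic above, and the chance of seeing a given number of exceptions from a correctly calibrated model, can be sketched with the binomial distribution. This is a minimal illustration that assumes exceptions are independent with a constant daily probability; the function names are hypothetical, not from any library.

```python
from math import comb


def expected_exceptions(confidence, n_days):
    """Expected exception count for a VaR model at the given confidence level."""
    return (1.0 - confidence) * n_days


def prob_at_least(k, n_days, p):
    """P(X >= k) for X ~ Binomial(n_days, p), computed from the exact pmf."""
    below_k = sum(comb(n_days, i) * p**i * (1 - p) ** (n_days - i) for i in range(k))
    return 1.0 - below_k


# 99% VaR over 250 days: expected count, and chance of 5 or more exceptions
print(round(expected_exceptions(0.99, 250), 2))
print(round(prob_at_least(5, 250, 0.01), 4))
```

Under these assumptions a correct 99% model still produces five or more exceptions in roughly one year out of nine, which is why a handful of exceptions alone is weak evidence against a model.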

Key Statistical Tests

Kupiec's POF Test (Proportion of Failures)

The simplest backtest is a likelihood-ratio test, based on the binomial distribution, of whether the observed exception rate matches the expected rate:

  • Null hypothesis: The true exception rate equals the model's predicted rate (1% for 99% VaR)
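  • Test statistic: Likelihood ratio comparing the observed failure proportion with the expected one; under the null it is asymptotically chi-squared with one degree of freedom
  • Limitation: Tests unconditional coverage only; ignores whether exceptions cluster in time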
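A minimal sketch of the POF likelihood ratio, LR = -2 ln[(1-p)^(T-x) p^x / ((1-x/T)^(T-x) (x/T)^x)], where x is the exception count, T the number of days, and p the expected exception rate. The function name `kupiec_pof` and the helper `_xlogy` are illustrative; the p-value uses the identity that a chi-squared(1) tail probability equals erfc(sqrt(LR/2)), so only the standard library is needed.

```python
from math import erfc, log, sqrt


def _xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def kupiec_pof(n_exceptions, n_days, p_expected):
    """Return (LR statistic, asymptotic p-value) for Kupiec's POF test."""
    x, t = n_exceptions, n_days
    p_hat = x / t  # observed exception rate
    ll_null = _xlogy(t - x, 1 - p_expected) + _xlogy(x, p_expected)
    ll_alt = _xlogy(t - x, 1 - p_hat) + _xlogy(x, p_hat)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)  # guard against tiny negative rounding
    p_value = erfc(sqrt(lr / 2.0))  # chi-squared survival function, 1 d.o.f.
    return lr, p_value


# Hypothetical example: 10 exceptions in 250 days against a 99% VaR
lr, p_value = kupiec_pof(10, 250, 0.01)
print(f"LR = {lr:.2f}, p-value = {p_value:.4f}")
```

At the 5% significance level the null is rejected when LR exceeds about 3.84, the chi-squared(1) critical value.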

Christoffersen's Conditional Coverage Test

This extends Kupiec by testing both coverage and independence:

  • Coverage component — Are there the right number of exceptions? (same as Kupiec)
  • Independence component — Are exceptions randomly distributed, or do they cluster?
  • Why it matters — Clustered exceptions suggest the model fails to adapt to regime changes, which is worse than random exceptions
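The independence component can be sketched as a likelihood-ratio test on a first-order Markov chain of exception flags: it compares a single exception probability against separate probabilities conditional on whether the previous day was an exception. The code below is an illustrative sketch (the function names are hypothetical) that takes a list of 0/1 flags. The conditional coverage statistic is then formed by adding the coverage statistic to the independence statistic and referring the sum to a chi-squared distribution with two degrees of freedom.

```python
from math import erfc, exp, log, sqrt


def _xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def christoffersen_independence(hits):
    """Return (LR_ind, p-value) from a sequence of 0/1 exception flags."""
    if len(hits) < 2:
        raise ValueError("need at least two observations")
    n = [[0, 0], [0, 0]]  # n[i][j]: transitions from state i to state j
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[prev][curr] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)  # unconditional exception rate
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0  # P(exception | none yesterday)
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0  # P(exception | exception yesterday)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, erfc(sqrt(lr / 2.0))  # chi-squared survival function, 1 d.o.f.


def conditional_coverage_pvalue(lr_pof, lr_ind):
    """p-value of LR_cc = LR_pof + LR_ind under chi-squared with 2 d.o.f."""
    return exp(-(lr_pof + lr_ind) / 2.0)


# Hypothetical data: five exceptions on consecutive days vs. five spread out
clustered = [0] * 100 + [1] * 5 + [0] * 145
spread = [1 if i in (20, 70, 120, 170, 220) else 0 for i in range(250)]
print(christoffersen_independence(clustered))
print(christoffersen_independence(spread))
```

Both hypothetical series have the same exception count, so a pure coverage test treats them identically; only the independence statistic separates the clustered series from the spread one.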

Basel Traffic Light Approach

The Basel Committee uses a practical classification system for internal models:

Zone     Exceptions (250 days, 99%)   Consequence
Green    0–4                          No action required
Yellow   5–9                          Increased scrutiny; possible capital multiplier increase
Red      10+                          Presumption model is inadequate; significant capital surcharge

The capital multiplier (applied to VaR for market risk capital) is 3.0 in the green zone, rises in steps with the exception count through the yellow zone, and reaches 4.0 in the red zone, directly impacting regulatory capital requirements.
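The zone boundaries can be expressed as a small lookup. In the sketch below, the yellow-zone add-ons (0.40, 0.50, 0.65, 0.75, 0.85 for 5 to 9 exceptions) are taken from the Basel Committee's 1996 backtesting framework and are not stated in the text above, so treat them as an assumption to verify against the current rule text. The function name is hypothetical.

```python
# Yellow-zone add-ons to the base multiplier of 3.0
# (assumption: values from the 1996 Basel backtesting framework)
YELLOW_PLUS_FACTORS = {5: 0.40, 6: 0.50, 7: 0.65, 8: 0.75, 9: 0.85}


def traffic_light(n_exceptions):
    """Map an exception count (250 days, 99% VaR) to (zone, capital multiplier)."""
    if n_exceptions < 0:
        raise ValueError("exception count cannot be negative")
    if n_exceptions <= 4:
        return "green", 3.0
    if n_exceptions <= 9:
        return "yellow", 3.0 + YELLOW_PLUS_FACTORS[n_exceptions]
    return "red", 4.0


for count in (2, 7, 12):
    print(count, traffic_light(count))
```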

Challenges in Backtesting

Several practical issues complicate backtesting:

  • Sample size — 250 observations per year is a small sample for testing a 99% quantile; statistical power is low
  • P&L definition — Should you use hypothetical P&L (positions frozen) or actual P&L (including intraday trading)?
  • Changing portfolios — VaR was computed for yesterday's portfolio; today's portfolio may differ
  • Model risk — A model may pass backtests in calm periods but fail spectacularly in stress periods
  • Expected Shortfall backtesting — ES is harder to backtest than VaR because it involves the average of tail losses, not just a single quantile
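The sample size problem can be made concrete by asking how often a miscalibrated model would still record four or fewer exceptions in a year. The sketch below assumes independent exceptions with a constant true rate; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def prob_four_or_fewer(true_rate, n_days=250):
    """Chance that a model with the given true exception rate shows <= 4 exceptions."""
    return binom_cdf(4, n_days, true_rate)


# A correct 99% model (1% rate) vs. models whose true rate is 2% or 3%
for rate in (0.01, 0.02, 0.03):
    print(f"true rate {rate:.0%}: P(<= 4 exceptions) = {prob_four_or_fewer(rate):.3f}")
```

Under these assumptions a model whose true exception rate is 2%, twice the stated rate, still records four or fewer exceptions in more than 40% of years, which illustrates how little power a single year of data provides.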

Beyond VaR Backtesting

Modern risk management extends backtesting beyond VaR:

  • ES backtesting — Tests like the Acerbi-Szekely test compare realized tail losses to ES predictions
  • Stress test validation — Comparing scenario predictions to actual crisis outcomes
  • Credit model backtesting — Comparing predicted default rates to realized defaults across rating grades
  • Model comparison — Using backtesting to rank competing models (e.g., historical simulation vs. Monte Carlo)
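As an illustration of ES backtesting, the sketch below computes one of the statistics proposed by Acerbi and Székely, usually labelled Z2: Z2 = sum over days of X_t I_t / (T α ES_t) + 1, where X_t is the daily P&L (losses negative), I_t flags days with X_t + VaR_t < 0, and α is the tail probability (for example 0.025 for a 97.5% ES). This is a sketch from the published formula, not a full test: judging significance generally requires simulating the statistic under the model's own P&L distribution, which is omitted here. The function name and sample data are hypothetical.

```python
def acerbi_szekely_z2(pnl, var, es, alpha):
    """Z2 statistic: about 0 if ES forecasts are correct, negative if risk is understated.

    pnl   : daily P&L, losses negative
    var   : daily VaR forecasts, quoted as positive numbers
    es    : daily ES forecasts, quoted as positive numbers
    alpha : tail probability of the VaR/ES level
    """
    if not pnl or not (len(pnl) == len(var) == len(es)):
        raise ValueError("pnl, var and es must be non-empty and of equal length")
    t = len(pnl)
    total = 0.0
    for x, v, e in zip(pnl, var, es):
        if x + v < 0:  # exception: loss larger than the VaR forecast
            total += x / (t * alpha * e)
    return total + 1.0


# Hypothetical data: 40 days, one exception, VaR = 5 and ES = 10 every day
days = 40
var_forecast = [5.0] * days
es_forecast = [10.0] * days
as_predicted = [1.0] * (days - 1) + [-10.0]  # tail loss equals the ES forecast
worse_than_predicted = [1.0] * (days - 1) + [-20.0]  # tail loss is twice the ES forecast
print(acerbi_szekely_z2(as_predicted, var_forecast, es_forecast, 0.025))
print(acerbi_szekely_z2(worse_than_predicted, var_forecast, es_forecast, 0.025))
```

The statistic reacts to both the frequency and the size of tail losses, so too many exceptions or exceptions that are larger than the ES forecast both push it below zero.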

FRM Exam Focus Areas

For FRM Part 1 and Part 2, know:

  • How to calculate expected exceptions for a given confidence level and sample size
  • Kupiec's POF test setup and interpretation
  • Christoffersen's independence and conditional coverage tests
  • The Basel traffic light framework and capital multiplier implications
  • Limitations of backtesting (low power, Type I/II error tradeoffs)
  • Why ES backtesting is more challenging than VaR backtesting

Backtesting is the cornerstone of model validation. A risk model that cannot be backtested cannot be trusted.