CUPED

CUPED - Controlled-experiment Using Pre-Existing Data

CUPED (short for Controlled-experiment Using Pre-Existing Data) is a technique that uses user information from before an experiment to reduce the variance of experimental metrics, which increases confidence in the results. It can also help debias experiments that have meaningful pre-exposure bias (e.g. the groups happened to differ before any treatment was applied). Our Cloud product uses a 7-day window for CUPED calculation. For Warehouse Native customers, a 7-day window is recommended, but you can customize it to any length.

The methodology for simple aggregations is described in the original Microsoft paper, as well as our in-depth article on the technique.
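For a simple mean metric, the adjustment can be sketched as below. This is a minimal illustration using only the Python standard library, not Statsig's implementation; the helper names `covariance` and `cuped_adjust` and the simulated data are assumptions made for the example.

```python
import random
from statistics import fmean, variance


def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def cuped_adjust(post, pre):
    """Return (theta, adjusted values) for one metric.

    post: in-experiment value per unit
    pre:  pre-experiment value for the same units, in the same order
    """
    theta = covariance(post, pre) / variance(pre)
    pre_mean = fmean(pre)
    # Subtract the part of each unit's value that is explained by its pre-period value.
    adjusted = [y - theta * (x - pre_mean) for y, x in zip(post, pre)]
    return theta, adjusted


# Simulated (hypothetical) data: the in-experiment metric tracks the pre-period metric.
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [x + rng.gauss(0.5, 1) for x in pre]

theta, adjusted = cuped_adjust(post, pre)
print(f"theta={theta:.3f}")
print(f"variance before={variance(post):.3f}, after={variance(adjusted):.3f}")
```

The adjustment leaves the metric's mean unchanged and removes the variance that the pre-experiment data can explain, so the stronger the correlation between the two periods, the larger the reduction.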

CUPED for Ratio Metrics

The Microsoft paper also gives details on how to implement CUPED for metrics with a different analysis unit (Appendix B). On Statsig, we extend it to work for our ratio metrics, where each experiment unit is represented by a numerator and a denominator. The variance reduction is performed by finding the variance of the experiment data, the variance of the pre-experiment data, and the covariance between the two. Denote the numerator, denominator, pre-experiment numerator, and pre-experiment denominator of a unit as $Y$, $N$, $X$, and $M$, respectively. Using the CUPED-reduced variance formula,

Var(\frac{Y_{cv}}{N_{cv}})=Var(\frac{Y}{N})+\theta^2 Var(\frac{X}{M})-2\theta Cov(\frac{Y}{N}, \frac{X}{M})

where the optimal $\theta$ is found as

\theta = \frac{Cov(\frac{Y}{N}, \frac{X}{M})}{Var(\frac{X}{M})}

which, after a first-order (delta method) linearization of each ratio, expands to

\frac{Cov(\frac{Y}{\mu_N}-\frac{\mu_Y N}{\mu^2_N}, \frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M})}{Var(\frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M})}
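The expanded formula can be sketched in code as below. This is an illustrative sketch under stated assumptions, not Statsig's implementation: the helper names (`linearize`, `ratio_theta`, `cuped_variance`) and the simulated per-unit data are hypothetical. Each unit's ratio is replaced by its linearized term, and $\theta$ and the reduced variance are computed from those terms. The variances here are unit-level; the variance of a group's ratio estimate is approximately this value divided by the number of units.

```python
import random
from statistics import fmean, variance


def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def linearize(num, den):
    """Per-unit linearized ratio term: num/mu_den - mu_num * den / mu_den^2."""
    mu_num, mu_den = fmean(num), fmean(den)
    return [p / mu_den - mu_num * q / mu_den ** 2 for p, q in zip(num, den)]


def ratio_theta(y, n, x, m):
    """Theta from the linearized experiment (y, n) and pre-experiment (x, m) data."""
    post_lin, pre_lin = linearize(y, n), linearize(x, m)
    return covariance(post_lin, pre_lin) / variance(pre_lin)


def cuped_variance(y, n, x, m, theta):
    """Var(post) + theta^2 * Var(pre) - 2 * theta * Cov(post, pre) on linearized terms."""
    post_lin, pre_lin = linearize(y, n), linearize(x, m)
    return (variance(post_lin) + theta ** 2 * variance(pre_lin)
            - 2 * theta * covariance(post_lin, pre_lin))


# Simulated (hypothetical) units: each has its own underlying rate in both periods.
rng = random.Random(7)
rates = [rng.uniform(0.1, 0.5) for _ in range(2000)]
m = [rng.randint(1, 20) for _ in rates]                    # pre-experiment denominator
n = [rng.randint(1, 20) for _ in rates]                    # experiment denominator
x = [r * k + rng.gauss(0, 0.3) for r, k in zip(rates, m)]  # pre-experiment numerator
y = [r * k + rng.gauss(0, 0.3) for r, k in zip(rates, n)]  # experiment numerator

theta = ratio_theta(y, n, x, m)
unadjusted = variance(linearize(y, n))
adjusted = cuped_variance(y, n, x, m, theta)
print(f"theta={theta:.3f}, unadjusted={unadjusted:.5f}, adjusted={adjusted:.5f}")
```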

At this point, we have

\frac{\hat{Y_{c}}}{\hat{N_{c}}}=\frac{Y_{c}}{N_{c}}-\theta( \frac{X_{c}}{M_{c}} - \mathbb{E}[R])

\frac{\hat{Y_{t}}}{\hat{N_{t}}}=\frac{Y_{t}}{N_{t}}-\theta( \frac{X_{t}}{M_{t}} - \mathbb{E}[R])

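While $\mathbb{E}[R]$, the expected pre-experiment ratio, is hard to estimate directly, the expectation term is the same for both groups, so it cancels when the two groups are compared. We therefore substitute $\mathbb{E}[R]$ with $\frac{X_{c}}{M_{c}}$, the control group's pre-experiment ratio, which transforms the formulas above into the following two: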

\frac{Y_{cv}(control)}{N_{cv}(control)}=\frac{Y(control)}{N(control)}

\begin{aligned}
\frac{Y_{cv}(test)}{N_{cv}(test)} &:= \frac{Y(control)}{N(control)} - \left(\frac{Y(control)}{N(control)} - \theta \frac{X(control)}{M(control)}\right) + \left(\frac{Y(test)}{N(test)} - \theta\frac{X(test)}{M(test)}\right) \\
&= \frac{Y(test)}{N(test)} - \theta\frac{X(test)}{M(test)} + \theta \frac{X(control)}{M(control)}
\end{aligned}
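A small worked example of these two formulas, with made-up numbers. The function name `adjusted_group_ratios`, the dictionary layout, the group totals, and the value of $\theta$ are all assumptions for illustration; each ratio is taken as the group's total numerator over its total denominator.

```python
def adjusted_group_ratios(control, test, theta):
    """Return (control ratio, CUPED-adjusted test ratio).

    control / test: dicts of group totals with keys
      "Y" (numerator), "N" (denominator),
      "X" (pre-experiment numerator), "M" (pre-experiment denominator).
    """
    ratio_control = control["Y"] / control["N"]
    ratio_test = test["Y"] / test["N"]
    pre_control = control["X"] / control["M"]
    pre_test = test["X"] / test["M"]
    # The control ratio is left unchanged; the test ratio is shifted by the
    # pre-experiment gap between the groups, scaled by theta.
    adjusted_test = ratio_test - theta * pre_test + theta * pre_control
    return ratio_control, adjusted_test


# Hypothetical totals: the test group already had a higher ratio before the experiment.
control = {"Y": 500, "N": 2000, "X": 480, "M": 2000}  # 0.25 now, 0.24 before
test = {"Y": 560, "N": 2000, "X": 520, "M": 2000}     # 0.28 now, 0.26 before

ratio_control, adjusted_test = adjusted_group_ratios(control, test, theta=0.5)
raw_delta = test["Y"] / test["N"] - ratio_control
adjusted_delta = adjusted_test - ratio_control
print(f"raw delta={raw_delta:.3f}, adjusted delta={adjusted_delta:.3f}")
```

With these numbers the raw difference of 0.03 shrinks to about 0.02, because half of the 0.02 pre-experiment gap (scaled by the assumed theta of 0.5) is attributed to the groups having differed before exposure.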

Using the optimal $\theta$, we aim to reduce group-level variance by plugging the parameter back in to calculate the adjustment. Note that a $\theta$ computed across groups does not necessarily reduce the variance of any single group, or the sum of the variances of all groups, but in most cases it does: in our simulation, CUPED decreased variance for 98.3% of metrics. Statsig will use the CUPED variance when all of the following are met:

- Core assumptions of the CUPED model are satisfied (these can be violated by rounding error or other data artifacts):
  - E(X_hat) = E(X)
  - The pooled variance of the adjusted population across groups is less than the variance of the unadjusted population
- Enough units have pre-experiment values (> 100)
- A large enough percentage of units have pre-experiment values (> 5%)
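The conditions above could be expressed as a single check along these lines. This is a hedged sketch, not Statsig's actual logic: the function name, parameter names, and inputs are hypothetical, and the expectation check E(X_hat) = E(X) is left out because the text does not spell out how it is evaluated.

```python
def should_use_cuped(units_with_pre, total_units,
                     pooled_adjusted_variance, unadjusted_variance,
                     min_units=100, min_fraction=0.05):
    """Return True when the listed thresholds and the variance condition all hold.

    units_with_pre:           number of units that have pre-experiment values
    total_units:              number of units in the experiment
    pooled_adjusted_variance: pooled variance of the CUPED-adjusted population
    unadjusted_variance:      variance of the unadjusted population
    """
    if total_units <= 0:
        return False
    enough_units = units_with_pre > min_units
    enough_share = units_with_pre / total_units > min_fraction
    variance_reduced = pooled_adjusted_variance < unadjusted_variance
    return enough_units and enough_share and variance_reduced


print(should_use_cuped(5000, 20000, 0.8, 1.0))  # all conditions hold
print(should_use_cuped(150, 10000, 0.8, 1.0))   # only 1.5% of units have pre-period data
```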
