Use Cases
Count distinct metrics have two primary use cases:- Measuring interactions and surface area. For example, you might to count the number of entities a user has engaged with on a video streaming platform, or measure if your new recommendation engine increases the diversity of products clicked
- As a denominator in ratio metrics, especially common when you want to normalize by a unit other than your experiment’s unit of analysis. For example, a B2B experiment might run an experiment at the company level, but measure “Clicks per USER” by making a ratio metric of COUNT(clicks)/COUNT_DISTINCT(user_id).
Calculation
At the unit level, count distinct metrics use COUNT_DISTINCT on their input column. At the group level, the mean is calculated as the SUM of the unit-level COUNT_DISTINCTs, divided by the count of UNIQUE UNITS exposed to the experiment. This would look like the SQL below:Methodology Notes
In the metrics page view, we use APPROX_COUNT_DISTINCT (or equivalent) to avoid massive compute jobs on analytical count distinct, and because the approximate error becomes acceptably small for the topline estimate. For experiment result loads, the calculation is analytical and exact to avoid jitter or bias from approximation error.Options
- Metric Breakdowns
- You can configure Metadata Columns to group results by, getting easy access to dimensional views in pulse results
- Winsorization
- Specify a lower and/or upper percentile bound to winsorize at. All values below the lower threshold, or above the upper threshold, will be clamped to that threshold to reduce the outsized impact of outliers on your analysis
- CUPED
- Specify if you want to calculate CUPED, and the lookback window for CUPED’s pre-experiment data inputs
- Thresholding
- Turn this metric into a 1/0 unit count metric counting if the unit’s total count equals to or surpasses (>=) a given threshold
- Cohort Windows
- You can specify a window for data collection after a unit’s exposure. For example, a 0-1 day cohort window would only count actions from days 0 and 1 after a unit was exposed to an experiment
- Only include units with a completed window can be selected to remove units out of pulse analysis for this metric until the cohort window has completed
- You can specify a window for data collection after a unit’s exposure. For example, a 0-1 day cohort window would only count actions from days 0 and 1 after a unit was exposed to an experiment