This page gives a high-level overview of the pipeline that Statsig Warehouse Native runs on your warehouse.

Main Steps

The main steps in the pipeline are:
  • Identifying users’ first exposures
  • Annotating Metric Sources with exposure data
  • Creating metric-user-day level staging data
  • Running intermediate rollups for better performance
  • Calculating group-level summary statistics
Pipeline View
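
As a rough illustration of the steps above, the following Python sketch deduplicates exposures down to each user's first exposure, keeps only metric events on or after that exposure, aggregates them to user-day rows, and then computes a group-level mean. The column names, sample data, and function names are hypothetical, and the real pipeline runs on your warehouse rather than in memory, so treat this as a conceptual sketch and not Statsig's implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exposure events: (user_id, group_name, exposure_date)
exposures = [
    ("u1", "Control", date(2024, 1, 2)),
    ("u1", "Control", date(2024, 1, 5)),  # repeat exposure, dropped
    ("u2", "Test", date(2024, 1, 3)),
]

# Hypothetical metric events: (user_id, event_date, value)
events = [
    ("u1", date(2024, 1, 1), 5.0),  # before first exposure, excluded
    ("u1", date(2024, 1, 2), 1.0),
    ("u1", date(2024, 1, 2), 2.0),
    ("u2", date(2024, 1, 4), 4.0),
]


def first_exposures(rows):
    """Keep the earliest exposure per user."""
    first = {}
    for user_id, group, day in rows:
        if user_id not in first or day < first[user_id][1]:
            first[user_id] = (group, day)
    return first


def unit_day_metrics(first, rows):
    """Sum metric values per (user, day), counting only post-exposure events."""
    totals = defaultdict(float)
    for user_id, day, value in rows:
        if user_id in first and day >= first[user_id][1]:
            totals[(user_id, day)] += value
    return dict(totals)


def group_means(first, unit_days):
    """Average each exposed user's total within their group."""
    per_user = defaultdict(float)
    for (user_id, _day), value in unit_days.items():
        per_user[user_id] += value
    sums, counts = defaultdict(float), defaultdict(int)
    for user_id, (group, _day) in first.items():
        sums[group] += per_user[user_id]  # users with no events count as 0
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


first = first_exposures(exposures)
unit_days = unit_day_metrics(first, events)
summary = group_means(first, unit_days)
print(summary)
```

Exposed users with no metric events still count toward the group denominator here, which is one common convention for per-user means; the sketch assumes it only for illustration.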

Types of DAGs

Statsig lets you run your pipeline in a few different ways:
  • A Full Refresh restates the experiment’s data and recalculates it from scratch. This is useful when starting an experiment, or when the underlying data has changed.
  • An Incremental Refresh appends new data to your existing experiment data. This reduces the cost of running scheduled updates to your results.
  • A Metric Refresh updates a specific metric, for example after you change its definition or when you want to add new metrics to your analysis.
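
To make the difference between these refresh types concrete, here is a minimal sketch that models the experiment's staging data as a dictionary keyed by (metric, user, day). The function names, the integer day values, and the data shape are all hypothetical; the real jobs operate on warehouse tables, and this is not a description of how Statsig schedules or implements them.

```python
def full_refresh(source_rows):
    """Rebuild everything from scratch, discarding prior staging data."""
    return {(m, u, d): v for m, u, d, v in source_rows}


def incremental_refresh(staging, source_rows, last_loaded_day):
    """Append only rows newer than the last loaded day."""
    updated = dict(staging)
    for m, u, d, v in source_rows:
        if d > last_loaded_day:
            updated[(m, u, d)] = v
    return updated


def metric_refresh(staging, source_rows, metric):
    """Restate a single metric and leave the others untouched."""
    updated = {k: v for k, v in staging.items() if k[0] != metric}
    for m, u, d, v in source_rows:
        if m == metric:
            updated[(m, u, d)] = v
    return updated


# Day 1: start the experiment with a full refresh.
source = [("purchases", "u1", 1, 2.0), ("clicks", "u1", 1, 5.0)]
staging = full_refresh(source)

# Day 2: new data arrives, so only the new day is appended.
source = source + [("purchases", "u1", 2, 1.0)]
staging = incremental_refresh(staging, source, last_loaded_day=1)

# Later: the "clicks" definition changes, so only that metric is restated.
source = [("purchases", "u1", 1, 2.0), ("clicks", "u1", 1, 7.0),
          ("purchases", "u1", 2, 1.0)]
staging = metric_refresh(staging, source, "clicks")
```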

Artifacts and Entity Relationships

The following tables are generated and stored in your warehouse per experiment. You have full access to these data sources for your own analysis, models, or visualizations. For experiments, experiment_id is the name of the experiment; for Feature Gates, experiment_id is the name of the gate along with the specific rule ID (e.g. chatbot_llm_model_switch_31e9jwlgO1bSSznKntb2gp_exposures_summary).

This is not an exhaustive list, but it includes most of the core result and staging tables that you might want to use for your own analysis.

Note: These are internal tables and will change as the product evolves. Changes will be documented here.
| Table | Description | Notes |
| --- | --- | --- |
| first_exposures_<experiment_id> | Deduplicated and stitched (for experiments with ID resolution) first exposure events | Useful for ad-hoc analysis |
| exposures_summary_<experiment_id> | Timeseries of exposures per group for display in Pulse | |
| unit_day_metrics_<experiment_id> | User-day level metric aggregations table | Useful for ad-hoc analysis |
| unit_covariate_metrics_<experiment_id> | User-level pre-experiment aggregations for regression adjustment/CUPED | |
| funnel_events_<experiment_id> | Staging table for running funnel analysis | |
| percentile_values_<experiment_id> | Staging table for running percentile analysis | |
| distinct_values_<experiment_id> | Staging table for running count distinct analysis | |
| windowed_metrics_<experiment_id> | Staging table for generating running totals when restating Pulse | |
| ratio_aggregations_<experiment_id> | Staging table for generating running totals when restating Pulse | |
| results_<rollup>_<experiment_id> | Outputs of statistical analysis for different rollups (e.g. daily, days-since-exposure, cumulative, 7-day). Exported to Statsig | Pulse inputs; useful for replicating statistical analysis |
| ratio_results_<rollup>_<experiment_id> | Outputs of statistical analysis for ratio metrics in different rollups (e.g. daily, days-since-exposure, cumulative, 7-day). Exported to Statsig | Pulse inputs; useful for replicating statistical analysis |

The high-level relationships and contents of these tables are represented below. Refer to the Main Steps image above for scheduling details.
WHN ER Diagram

Other Jobs

Alongside and inside this main flow, Statsig will also:
  • Run Health Checks and a Summary View for exposures
  • Calculate top dimensions for dimensional metrics
  • Calculate funnel steps
  • Run CUPED and Winsorization procedures during the group-level summaries to reduce variance and outlier influence
  • Calculate inputs to the Delta Method to avoid bias on Ratio and Mean metrics
Statsig generates experiment-level tables, which makes it easy to run your own follow-up analyses on specific experiments.
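
For readers unfamiliar with the variance-reduction and ratio-metric steps listed above, the sketch below shows textbook versions of winsorization, CUPED, and the delta-method variance for a ratio of means. It uses only the Python standard library. The function names, the fixed cap value, and the sample data are illustrative assumptions, not Statsig's implementation or defaults.

```python
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def winsorize(values, cap):
    """Clamp values above `cap` to limit the influence of outliers."""
    return [min(v, cap) for v in values]


def cuped_adjust(y, x):
    """CUPED: subtract the part of y explained by pre-experiment covariate x."""
    theta = covariance(x, y) / variance(x)
    x_bar = mean(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]


def delta_ratio_variance(num, den):
    """Delta-method variance of mean(num) / mean(den) over n units."""
    n = len(num)
    mn, md = mean(num), mean(den)
    return (
        variance(num) / md**2
        - 2 * mn * covariance(num, den) / md**3
        + mn**2 * variance(den) / md**4
    ) / n


# Hypothetical per-user data: pre-experiment and in-experiment metric values.
pre = [1.0, 2.0, 3.0, 4.0]
post = [2.0, 4.0, 6.0, 100.0]

capped = winsorize(post, cap=10.0)   # the 100.0 outlier is clamped to 10.0
adjusted = cuped_adjust(capped, pre)  # same mean as `capped`, lower variance

# Hypothetical ratio metric: per-user numerator and denominator totals.
ratio_var = delta_ratio_variance([1.0, 2.0, 3.0], [2.0, 2.0, 3.0])
```

The CUPED adjustment leaves the group mean unchanged and only shrinks variance, which is why it can be applied during the group-level summary without shifting the point estimate.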

Visibility

Click the history icon on your Pulse results to see the jobs and IDs that ran for each Pulse reload, along with information on compute time and cost. The same activity is fully visible in your own warehouse’s history and usage management, but having the costs in the console is helpful for the cross-functional experimentation teams running the analysis.

Exposure Export Table

Statsig dedupes and records production exposures into the forwarded exposures table configured in your warehouse Data Connection. This table contains each user’s first exposure to an experiment. For feature gates, we dedupe and record exposures for partial rollouts only (e.g. 5% or 50% rollouts, but not 0% or 100% rollouts).
| Column Name | Data Type | Description |
| --- | --- | --- |
| experiment_id | string | The identifier for the gate/experiment |
| group_id | string | groupID for experiments; ruleID+Pass/Fail for gates |
| group_name | string | Name of the experiment group (e.g. Control vs Test) |
| user_id | string | The ID passed in as the Statsig userID |
| stable_id | string | Statsig Client SDK managed stable device identifier |
| [your custom ids] | string | One column for every custom unitID you use on Statsig |
| timestamp | timestamp | Timestamp of the first exposure |
| user_dimensions | object | Warehouse specific object with all the user dimensions |

user_dimensions is populated in the daily deduplicated export. Fast-forwarded exposure rows can omit some fields in this object until the next daily load.

Common Fields in user_dimensions

user_dimensions contains user attributes captured alongside the first exposure. The exact shape can vary by SDK and project configuration, but these are some of the most common fields you may see:
| Field | Description | Notes |
| --- | --- | --- |
| os | Normalized operating system name | Canonical OS field on the exposure side. |
| os_version | Operating system version | Derived from SDK metadata or user agent parsing. |
| browser_name | Browser name | Derived from SDK metadata or user agent parsing. |
| browser_version | Browser version | Derived from SDK metadata or user agent parsing. |
| device_model | Device model | Forwarded or inferred when available. |
| ip | IP address | Present when available from the SDK or request context. |
| country | Country | Derived from request context or IP lookup. |
| locale | Locale | Forwarded or inferred when available. |
| language | Language | Forwarded or inferred when available. |
| appVersion | Application version | Forwarded when present on the SDK user object. |
| sessionID | Session identifier | Forwarded when present on the SDK user object. |
| appIdentifier | Application identifier | Forwarded when present on the SDK user object. |

Additional non-null fields from the SDK user object may also appear in user_dimensions. Custom IDs are typically exported as dedicated top-level columns in the exposure table rather than being queried from this object. Input fields such as deviceOS and systemName are used to derive the exported os field. If you want to analyze operating system on forwarded exposures, query user_dimensions.os.
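
As an illustration of reading user_dimensions.os, the sketch below counts first exposures by group and operating system from rows that have already been loaded into Python dictionaries. The sample rows and the count_by_os helper are hypothetical. The lookup tolerates a missing os key because fast-forwarded rows can omit fields until the next daily load.

```python
from collections import Counter

# Hypothetical rows shaped like the exposure export table.
rows = [
    {"experiment_id": "exp_a", "group_name": "Control", "user_id": "u1",
     "user_dimensions": {"os": "iOS", "country": "US"}},
    {"experiment_id": "exp_a", "group_name": "Test", "user_id": "u2",
     "user_dimensions": {"os": "Android"}},
    # Fast-forwarded row whose user_dimensions is not yet fully populated.
    {"experiment_id": "exp_a", "group_name": "Test", "user_id": "u3",
     "user_dimensions": {}},
    {"experiment_id": "exp_b", "group_name": "Test", "user_id": "u1",
     "user_dimensions": {"os": "iOS"}},
]


def count_by_os(exposure_rows, experiment_id):
    """Count exposures per (group_name, os) for one experiment."""
    counts = Counter()
    for row in exposure_rows:
        if row["experiment_id"] != experiment_id:
            continue
        dims = row.get("user_dimensions") or {}
        counts[(row["group_name"], dims.get("os", "unknown"))] += 1
    return counts


os_counts = count_by_os(rows, "exp_a")
for (group, os_name), n in sorted(os_counts.items()):
    print(group, os_name, n)
```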

Event Export Table

If you log custom events through a Statsig SDK, Statsig also forwards those events into a configurable table in your warehouse. This is the table used when Warehouse Native customers rely on Statsig SDK logging for outcome events.
  • Use user_object for user fields associated with the event.
  • Use statsig_metadata for SDK and exposure-processing metadata.
  • Use company_metadata for the event metadata payload you logged.
| Column Name | Data Type | Description |
| --- | --- | --- |
| user_id | string | The ID passed in as the Statsig userID |
| stable_id | string | Statsig Client SDK managed stable device identifier |
| [your custom ids] | string | One column for every custom unitID you use on Statsig |
| timestamp | timestamp | Event timestamp |
| event_name | string | Name of the logged custom event |
| event_value | string | Optional event value |
| user_object | object | Warehouse specific object containing user fields associated with the event |
| statsig_metadata | object | Warehouse specific object containing Statsig SDK and exposure metadata |
| company_metadata | object | Event metadata payload logged with the event |

Common Fields in user_object

user_object contains user fields associated with the event. It often includes the same common fields as user_dimensions, plus any additional non-null fields sent on the SDK user object.
| Field | Description | Notes |
| --- | --- | --- |
| os | Normalized operating system name | Canonical OS field on the event-side user object. |
| os_version | Operating system version | Derived from SDK metadata or user agent parsing. |
| browser_name | Browser name | Derived from SDK metadata or user agent parsing. |
| browser_version | Browser version | Derived from SDK metadata or user agent parsing. |
| device_model | Device model | Derived from deviceModel when provided. |
| ip | IP address | Present when available from the SDK or request context. |
| city | City | Added when geographic inference is available. |
| state | State or region | Added when geographic inference is available. |
| country | Country | Derived from request context or IP lookup. |
| locale | Locale | Forwarded or inferred when available. |
| language | Language | Forwarded or inferred when available. |
| appVersion | Application version | Forwarded when present on the SDK user object. |
| sessionID | Session identifier | Forwarded when present on the SDK user object. |
| appIdentifier | Application identifier | Forwarded when present on the SDK user object. |

Common Fields in statsig_metadata

statsig_metadata contains SDK-level and exposure-processing metadata associated with the event. These are some of the most common customer-facing fields:
| Field | Description | Notes |
| --- | --- | --- |
| deviceType | High-level device category | Derived from the normalized OS, for example Desktop or Mobile. |
| targetAppID | Target app identifier | SDK target app metadata. |
| statsigTier | Statsig environment tier | For example prod or staging. |
| keyEnvironment | SDK key environment | Environment associated with the SDK key. |
| keyID | SDK key identifier | Useful for debugging ingestion and environment issues. |
| samplingRate | Event sampling rate | Present when the event is sampled. |
| is_bot | Bot classification | Set when Statsig classifies the event as bot traffic. |
| billing_type | Exposure billing classification | Common on exposure-related events. |
| groupID | Experiment group identifier | Common on exposure-related events. |
| ruleID | Rule identifier | Common on exposure-related events. |
| continuous_rollout_id | Continuous rollout identifier | Present for continuous rollout exposures. |
| configExposureType | Exposure config type | For example dynamic_config or experiment. |
| isSwitchback | Switchback flag | Present when the exposure is associated with a switchback experiment. |
| is_autotune | Autotune flag | Present when the exposure is associated with an autotune experiment. |

Other SDK debugging and exposure-processing metadata may also appear in statsig_metadata.

What Goes in company_metadata

company_metadata stores the metadata payload logged with the event. This object does not have a fixed schema and will vary based on the event and your SDK usage. Most event-specific business context, such as price, currency, category, plan, screen, route, or nested objects like cart, items, and context, will appear here.
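
Because company_metadata has no fixed schema, downstream code generally needs to read it defensively. The sketch below sums a price field per currency across events of one type. It assumes each row has been loaded as a Python dictionary and that the object may arrive either as a parsed dict or as a JSON string, depending on how your warehouse client returns it. The price and currency fields are taken from the examples above; the event names, sample rows, and helper functions are hypothetical.

```python
import json


def parse_metadata(raw):
    """Return company_metadata as a dict, whether it arrives parsed or as JSON text."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}  # None or an unexpected type


def revenue_by_currency(event_rows, event_name):
    """Sum the price field per currency, skipping rows without a usable price."""
    totals = {}
    for row in event_rows:
        if row.get("event_name") != event_name:
            continue
        meta = parse_metadata(row.get("company_metadata"))
        try:
            price = float(meta["price"])
        except (KeyError, TypeError, ValueError):
            continue
        currency = meta.get("currency", "unknown")
        totals[currency] = totals.get(currency, 0.0) + price
    return totals


# Hypothetical rows shaped like the event export table.
events = [
    {"event_name": "purchase",
     "company_metadata": {"price": "9.99", "currency": "USD"}},
    {"event_name": "purchase",
     "company_metadata": '{"price": 5, "currency": "USD"}'},
    {"event_name": "purchase", "company_metadata": {"category": "books"}},
    {"event_name": "page_view", "company_metadata": {"route": "/home"}},
]

totals = revenue_by_currency(events, "purchase")
```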