
Geo Testing for Incrementality: A Step-by-Step Guide

Geo testing measures incremental ad impact by comparing markets where you advertise to markets where you don't. Here's how to run one properly.

Go Funnel Team -- 8 min read

Geo testing is the most trustworthy incrementality method available to most brands

Platform-native conversion lift studies are convenient but limited. They only measure one platform at a time, they rely on the platform's own data, and they can't capture cross-channel effects. Geo testing solves all three problems.

The concept: select geographic markets, turn off advertising in some of them (the holdout), keep advertising in the rest (the treatment), and compare business outcomes. Because the holdout markets receive zero paid media, any difference in conversion rates represents the true incremental impact of your advertising.

Google, Meta, and independent researchers consistently find that geo tests produce the most reliable incrementality estimates. A 2023 meta-analysis of 84 incrementality studies found that geo tests produced results within 5% of the true treatment effect, compared to 12-18% error for user-level holdout tests.

Step 1: Select and match your markets

This is where most geo tests fail. Poor market matching produces unreliable results.

How to match markets properly

Pull 12-26 weeks of historical data for each geographic market (DMA, state, metro area, or postal code depending on your business). Calculate these metrics for each market:

  • Weekly conversion volume
  • Weekly revenue
  • Conversion rate trend (is it growing or declining?)
  • Seasonality pattern
  • Population or addressable market size
  • Historical ad spend per capita

Use Euclidean distance or Pearson correlation to identify market pairs with the most similar historical patterns. Your treatment and holdout groups should have parallel trends in the pre-test period -- this is the critical assumption for valid causal inference.
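A minimal sketch of the matching step in Python, standardizing each metric and greedily pairing the closest markets. The market names and metric values are invented for illustration, and only a subset of the metrics listed above is used:

```python
import numpy as np

# Hypothetical per-market metrics (rows = markets; cols = weekly conversions,
# weekly revenue, conversion trend slope, population in millions). Real inputs
# would come from your 12-26 weeks of historical data.
markets = ["Denver", "Portland", "Austin", "Nashville"]
metrics = np.array([
    [420.0, 31000.0, 0.8, 2.9],
    [415.0, 30500.0, 0.7, 2.5],
    [610.0, 47000.0, 1.4, 2.3],
    [600.0, 46200.0, 1.3, 2.0],
])

# Standardize each metric so no single scale dominates the distance.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)

# Pairwise Euclidean distances between markets.
dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

# Greedy pairing: repeatedly take the closest unmatched pair.
pairs = []
unmatched = set(range(len(markets)))
while len(unmatched) > 1:
    idx = sorted(unmatched)
    sub = dist[np.ix_(idx, idx)]
    i, j = np.unravel_index(np.argmin(sub), sub.shape)
    a, b = idx[i], idx[j]
    pairs.append((markets[a], markets[b]))
    unmatched -= {a, b}

print(pairs)
```

Greedy pairing is fine for a handful of markets; with dozens of DMAs, optimal matching (e.g. the Hungarian algorithm) avoids a bad early pair forcing worse later ones.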

How many markets do you need?

The minimum viable geo test uses 4 markets: 2 treatment, 2 holdout. But more markets improve statistical power:

  • 4-6 markets: Detects 20%+ lift with 90% confidence over 4 weeks
  • 8-12 markets: Detects 10-15% lift with 90% confidence over 4 weeks
  • 16+ markets: Detects 5-10% lift with 95% confidence over 2-3 weeks

For national brands, a common setup is 40 treatment DMAs and 10 holdout DMAs. For regional brands, metro areas or postal codes work better.

Randomization matters

Don't hand-pick holdout markets based on convenience. After matching, randomly assign matched pairs to treatment or holdout. If you pick your smallest markets as holdouts because "it won't hurt revenue much," you introduce selection bias that invalidates the results.
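The within-pair coin flip can be as simple as this sketch (pair names are hypothetical; the fixed seed is only so the assignment can be reproduced and audited later):

```python
import random

# Hypothetical matched pairs from the matching step.
pairs = [("Denver", "Portland"), ("Austin", "Nashville"), ("Tampa", "Charlotte")]

rng = random.Random(42)  # seeded so the assignment is reproducible
treatment, holdout = [], []
for a, b in pairs:
    # Flip a coin within each pair -- never assign by convenience.
    if rng.random() < 0.5:
        treatment.append(a)
        holdout.append(b)
    else:
        treatment.append(b)
        holdout.append(a)

print("Treatment:", treatment)
print("Holdout:  ", holdout)
```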

Step 2: Design the test

Define the treatment clearly

"Turn off ads" seems simple, but you need to decide:

  • Which channels? A full-media blackout (all paid channels) measures total advertising impact. A single-channel blackout (only Meta, only Google) measures that channel's incremental contribution.
  • What about organic? Keep organic social, email, and SEO running in both groups. You're measuring paid media incrementality, not total marketing impact.
  • What about retail? If you sell through Amazon or physical retail in holdout markets, those sales channels should remain active. The test measures advertising impact on your tracked conversion events.

Set the test duration

Longer tests produce more reliable results but cost more in forgone revenue. The minimum duration depends on your conversion volume:

  • High-volume (100+ daily conversions nationally): 2-3 weeks minimum
  • Medium-volume (30-100 daily conversions): 4-6 weeks minimum
  • Low-volume (under 30 daily conversions): 6-8 weeks minimum

Add your average purchase consideration window to the test duration. If customers typically take 14 days from first ad exposure to purchase, a 2-week test misses half the conversions. Run for 4 weeks minimum.

Calculate statistical power

Before starting, verify your test can detect a meaningful effect size. Use a power analysis:

  • Baseline: Average weekly conversions in holdout markets during the pre-test period
  • Expected lift: Your hypothesis (start with 15-20% if you've never tested)
  • Significance level: 0.05 (95% confidence)
  • Power: 0.80 (80% probability of detecting a real effect)

If the power analysis says you need 8 weeks but you can only run 4, you need more holdout markets, not a shorter test.
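A rough version of that power calculation uses the standard two-sample normal approximation. This sketch assumes between-market weekly variation is summarized as a coefficient of variation (CV), which you would estimate from pre-test data; all inputs below are illustrative:

```python
import math
from statistics import NormalDist

def markets_per_group(baseline_weekly, lift, cv, alpha=0.05, power=0.80):
    """Two-sample normal-approximation sample-size estimate.

    baseline_weekly: avg weekly conversions per holdout market (pre-test)
    lift: hypothesized relative lift (e.g. 0.15 for 15%)
    cv: coefficient of variation of weekly conversions across markets
    Returns the number of markets needed in EACH group.
    """
    delta = baseline_weekly * lift   # absolute effect size to detect
    sigma = baseline_weekly * cv     # between-market standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)

# Detecting a 15% lift on 100 weekly conversions/market with 10% CV:
print(markets_per_group(100, 0.15, 0.10))
```

This is a deliberate simplification -- it treats each market as one observation and ignores week-to-week correlation within a market -- but it is enough to sanity-check whether a proposed design is feasible before you commit media budget.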

Step 3: Run the pre-test period

Before changing any media, run a 2-4 week "pre-test" observation period where all markets receive normal advertising. This establishes:

  • Parallel trends: Treatment and holdout markets should track closely during the pre-test. If they diverge before the test starts, your market matching is off.
  • Baseline metrics: You need pre-test conversion rates to calculate lift accurately.
  • Noise estimation: The natural week-to-week variance in each market tells you what fluctuation is normal versus what represents a real treatment effect.

Plot weekly conversions for treatment and holdout groups during the pre-test. The lines should be roughly parallel. If holdout markets are trending up while treatment markets are flat, the test will understate the treatment effect -- and the reverse pattern will overstate it.
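Two quick numeric checks complement the plot -- correlation between the two series, and stability of their ratio (the weekly figures below are invented):

```python
import numpy as np

# Hypothetical pre-test weekly conversions (8 weeks), summed per group.
treat = np.array([980, 1010, 995, 1040, 1025, 1060, 1050, 1075], dtype=float)
hold = np.array([490, 508, 496, 522, 510, 533, 524, 540], dtype=float)

# 1. Correlation: parallel series should move together week to week.
corr = np.corrcoef(treat, hold)[0, 1]

# 2. Ratio stability: treatment/holdout should be roughly constant pre-test.
ratio = treat / hold
ratio_cv = ratio.std() / ratio.mean()

print(f"pre-test correlation: {corr:.3f}")
print(f"ratio CV: {ratio_cv:.2%}")  # a low single-digit CV suggests parallel trends
```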

Step 4: Execute the test

Launch day

  • Turn off paid media in holdout markets at midnight. Don't taper -- a clean start makes analysis easier.
  • Verify that ads are actually not serving in holdout markets (check delivery reports by geography).
  • Document any external events: competitor launches, weather events, local news, or holidays that could differentially affect treatment vs. holdout markets.

During the test

  • Do not adjust budgets or targeting in treatment markets. The budget freed up from holdout markets should be turned off, not reallocated. If you increase spend in treatment markets to compensate, you're testing increased spend, not the presence of advertising.
  • Monitor for contamination. Customers in holdout markets might see your ads on national TV, streaming platforms, or while traveling. Note any national campaigns running during the test that could reduce the observed treatment effect.
  • Track offline effects. If you have physical stores, phone orders, or Amazon sales, track those in both treatment and holdout markets. Some of the advertising impact may shift channels rather than disappear.

Step 5: Analyze the results

The basic calculation

Incremental lift = (Treatment conversion rate - Holdout conversion rate) / Holdout conversion rate

But for geo tests, you need to account for pre-test differences. Use the difference-in-differences (DiD) method:

DiD lift = (Treatment post - Treatment pre) - (Holdout post - Holdout pre)

This removes any pre-existing differences between markets, isolating the effect of the media change.
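A worked example of the additive DiD formula above, with invented weekly averages:

```python
# Hypothetical average weekly conversions per market, pre- and post-launch.
treat_pre, treat_post = 500.0, 560.0
hold_pre, hold_post = 250.0, 265.0

# Difference-in-differences: treatment change minus holdout change.
did = (treat_post - treat_pre) - (hold_post - hold_pre)  # 60 - 15 = 45

# Relative lift over the counterfactual. This assumes the holdout drift is
# additive; a ratio-based variant would scale the drift by treat_pre/hold_pre.
counterfactual = treat_pre + (hold_post - hold_pre)  # 515
lift = did / counterfactual

print(f"DiD (absolute): {did:.0f} conversions/week")
print(f"Relative lift: {lift:.1%}")
```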

Handling the statistical analysis

For geo tests with matched market pairs, a paired t-test on the market-level differences works. For randomized market groups, a two-sample t-test on aggregated conversion rates is appropriate.
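With matched pairs, testing the per-pair DiD values against zero is equivalent to the paired t-test on treatment vs. holdout changes. A stdlib-only sketch with invented pair-level numbers:

```python
import math
from statistics import mean, stdev

# Hypothetical DiD values for 6 matched market pairs (treatment change minus
# holdout change, in weekly conversions). One number per pair.
pair_did = [42.0, 51.0, 38.0, 47.0, 55.0, 40.0]

n = len(pair_did)
std_err = stdev(pair_did) / math.sqrt(n)
t_stat = mean(pair_did) / std_err

# df = n - 1 = 5; the two-sided 5% critical value of the t-distribution is 2.571.
significant = abs(t_stat) > 2.571
print(f"mean DiD: {mean(pair_did):.1f}, t = {t_stat:.2f}, significant: {significant}")
```

In practice you would let scipy.stats compute the exact p-value; the hand-rolled version just makes the mechanics explicit.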

More sophisticated approaches:

  • CausalImpact (by Google): A Bayesian structural time-series model that creates a synthetic control from pre-test data. It estimates what would have happened in treatment markets if the treatment hadn't been applied.
  • Synthetic control method: Creates a weighted combination of holdout markets that best matches the treatment market's pre-test behavior, then measures divergence during the test.

Both methods handle seasonality, trend, and regression to the mean automatically.

Calculating incremental CPA and ROAS

Incremental conversions = Total treatment market conversions - (Holdout conversion rate x Treatment market audience size)

Incremental CPA = Total test-period spend / Incremental conversions

Incremental ROAS = Incremental revenue / Total test-period spend
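Putting the three formulas together with invented test totals:

```python
# Hypothetical 4-week test totals.
treatment_conversions = 4200       # total conversions in treatment markets
holdout_conv_rate = 0.009          # conversions per addressable person, holdout
treatment_audience = 400_000       # addressable audience in treatment markets
spend = 30_000.0                   # total paid media spend during the test
avg_order_value = 85.0

# Counterfactual: what treatment markets would have converted with no ads.
expected_baseline = holdout_conv_rate * treatment_audience   # 3,600
incremental_conversions = treatment_conversions - expected_baseline  # 600

incremental_cpa = spend / incremental_conversions
incremental_roas = (incremental_conversions * avg_order_value) / spend

print(f"iCPA: ${incremental_cpa:.2f}  iROAS: {incremental_roas:.2f}")
```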

Step 6: Validate and apply

Sensitivity checks

  • Remove the single largest and smallest holdout markets and recalculate. If the result changes dramatically, your test is fragile.
  • Check for "leakage" -- did any holdout markets receive impressions from national campaigns or cross-market digital delivery?
  • Compare your geo test results to platform-reported metrics for the treatment markets. The gap between incremental and platform-reported results is your "attribution inflation factor."

Applying results to budget decisions

A geo test tells you the incremental value of your current spend level. It does not tell you what would happen at higher or lower spend levels. Use the result to:

  • Justify current spend (if incremental ROAS exceeds targets)
  • Identify waste (if incremental ROAS is below targets)
  • Set a baseline for future tests (re-test quarterly to track changes)
  • Calibrate your attribution model (adjust platform-reported numbers by the observed inflation factor)

Frequently Asked Questions

How do I handle geo tests for purely online businesses with no physical presence?

Geographic targeting works for online businesses through IP-based geo-targeting. You can restrict ad delivery by DMA, state, or metro area in Meta, Google, and most programmatic platforms. The holdout markets still generate online conversions -- you just measure whether those conversions decrease when ads are paused. One nuance: VPN usage and mobile location inaccuracy can cause 3-5% of impressions to leak into holdout markets. Account for this by applying a small correction factor based on estimated leakage rates.
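One way to apply such a correction, under the assumption that leaked users convert at roughly the treated rate (all numbers below are invented):

```python
# Hypothetical observed rates and an estimated leakage rate (the share of the
# "holdout" audience that actually saw ads via VPNs or location inaccuracy).
treat_rate = 0.0110
hold_rate_observed = 0.0095
leakage = 0.04  # 4% of holdout impressions leaked in

# If leaked users convert like treated users, the observed holdout rate is a
# blend: observed = (1 - leakage) * true_holdout + leakage * treat_rate.
# Solving for the true holdout rate:
hold_rate_true = (hold_rate_observed - leakage * treat_rate) / (1 - leakage)

lift_observed = (treat_rate - hold_rate_observed) / hold_rate_observed
lift_corrected = (treat_rate - hold_rate_true) / hold_rate_true
print(f"observed lift: {lift_observed:.1%}, corrected: {lift_corrected:.1%}")
```

Leakage always biases the observed lift downward, so the corrected estimate is slightly higher than the raw one.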

What's the minimum ad spend needed for a reliable geo test?

The constraint isn't spend -- it's conversions. You need at least 100-200 total conversions across your holdout markets during the test period for the results to be statistically meaningful. Work backward: if your holdout markets represent 10% of your geography and your business generates 1,000 conversions per month nationally, holdout markets should generate roughly 100 conversions per month. A 4-week test would give you about 100 holdout conversions, which is borderline. Either extend to 6 weeks or increase your holdout to 15-20% of markets.
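That back-of-envelope math as a reusable sketch, treating a month as roughly four weeks (the target of 150 conversions sits in the middle of the 100-200 range above):

```python
import math

def holdout_conversions(monthly_national, holdout_share, weeks):
    """Rough expected conversions in holdout markets over the test window."""
    return monthly_national * holdout_share * (weeks / 4)

def weeks_needed(monthly_national, holdout_share, target=150):
    """Weeks required to reach a target holdout conversion count."""
    per_week = monthly_national * holdout_share / 4
    return math.ceil(target / per_week)

# 1,000 conversions/month nationally, 10% of geography held out:
print(holdout_conversions(1000, 0.10, weeks=4))  # ~100: borderline
print(weeks_needed(1000, 0.10))                  # weeks to reach 150
```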

Can geo tests measure brand advertising effectiveness?

Yes, and this is one of geo testing's biggest advantages over platform-based measurement. Brand campaigns (TV, billboards, podcasts, YouTube pre-roll) are difficult to measure with pixel-based attribution because they don't generate direct clicks. Geo tests bypass this limitation entirely. Run brand advertising in treatment markets only and measure the difference in total business outcomes (all-channel conversions, search volume for branded terms, direct traffic, store visits). Several studies have shown that brand campaigns produce incremental lifts of 5-15% in treatment markets, with the effect persisting 2-4 weeks after the campaign ends.


Go Funnel uses server-side tracking and multi-touch attribution to show you which ads actually drive revenue. Book a call to see your real numbers.

Want to see your real ROAS?

Connect your ad accounts in 15 minutes and get attribution data you can actually trust.

Book a Call
