AB Testing Sample Size of 4
Tom Fishburne, marketoonist.com

The goal of performing an A/B test is to get the statistically significant results required to make a sound, data-driven decision. Whether you’re testing a landing page, ad creative or something else, an accurate and useful test must have the proper sample size. Otherwise, you end up making decisions without reliable data to back them up, which is risky and costly.

We know all too well, however, that the answer to “what sample size is big enough for A/B testing?” is often “it depends.” But that type of answer only leads to more confusion. To avoid the complications, and for A/B testing to be a practical solution, all you really need are a simplified approach and a tech tool or two.

Proper Sample Size Is a Requirement for Trustworthy Results

How do you know if and when your A/B tests will reach statistical significance? How do you defend the validity of your test results when concerns arise from stakeholders? You ensure you have the proper testing sample size from the get-go.

Running an experiment with too small a sample size may lead to inaccurate results that can’t be trusted. On the other hand, running a test beyond the threshold required to evaluate an outcome wastes time and resources that could be spent elsewhere. This is why the calculation is so important.

Sample Size Isn’t One-Size-Fits-All

There isn’t one sample size number that fits every single experiment you’ll ever run. Although developing a single testing rule would be convenient and simple, it won’t result in the best outcome. Sample size should differ depending on factors such as traffic, conversion rates and level of confidence.

How to Determine the Proper A/B Testing Sample Size

Determining the proper A/B testing sample size requires some technical math. In its most basic form, the formula breaks down into these specific inputs:

    • Conversion rate: The first input is the current conversion rate of the landing page or element you’re attempting to improve. In A/B testing speak, it’s the “current conversion rate of your control.”
    • Minimum detectable effect: The minimum detectable effect (MDE) is the smallest change in conversion rate you want your test to be able to detect. This input determines how sensitive your A/B test will be: the smaller the MDE, the larger the sample size required. For example, you may wish to detect a 15% lift in conversion rate at the end of your test. This is your MDE.
    • Statistical significance: You need to limit the probability that random chance explains your results. Statistical significance indicates how unlikely it is that the difference between your baseline conversion rate and the conversion rate of the variation is just luck. According to Optimizely, “A result of an experiment is said to have statistical significance, or be statistically significant, if it is likely not caused by chance for a given statistical significance level.” The standard threshold for statistical significance is 95%, which corresponds to accepting a 5% chance of a false positive when there is no real difference.
    • Statistical power: The power of your test is the probability that it will detect a difference when one truly exists. The higher the power, the more likely the test is to detect a real difference at least as large as your MDE. A power of 80% is a common default.

[Figure: example sample size calculation. Source: AB Tasty]

In the above example, you would need a sample size of approximately 30,244 visitors per variation to detect an effect of 10% or higher.
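To make the four inputs concrete, here is a minimal Python sketch of one common textbook approximation for a two-sided test comparing two conversion rates. It is not the formula used by any particular calculator mentioned in this article, so expect its output to differ somewhat from theirs. The function name, the parameter names, the 80% default for power and the 5% baseline in the usage line are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde,
                              significance=0.95, power=0.80):
    """Approximate visitors needed per variation (normal approximation).

    baseline_rate: conversion rate of the control, e.g. 0.05 for 5%
    relative_mde:  minimum detectable effect as a relative lift, e.g. 0.10
    significance:  confidence level, e.g. 0.95 (two-sided test assumed)
    power:         probability of detecting a true effect of size MDE
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate we hope the variation reaches
    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 5% baseline, 10% relative MDE, 95% significance, 80% power
print(sample_size_per_variation(0.05, 0.10))
```

With these hypothetical inputs the sketch lands at roughly 31,000 visitors per variation. Halving the MDE roughly quadruples that number, which is why the MDE is usually the input worth debating most.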

How Long Should You Run Your A/B Test?

A/B test duration depends on these same factors plus your average number of daily visitors and the number of variations you’re testing. Just like sample size, the right duration differs for each experiment. A minimum of 100 conversions is a common rule of thumb.

We typically recommend running an experiment across at least one entire business cycle. For example, consumer shopping activity varies over the course of a week. To run an accurate test, you’ll need at least one full weekly cycle to capture that variation in consumer behavior.

Sample Size Calculation Tools

Technology takes the guesswork out of constructing the detailed formula used to calculate sample size. There are many calculators and tech tools available to help you quickly identify the right number. Here are some of our favorites:

    • AB Tasty: AB Tasty’s A/B Test Sample Size Calculator helps you determine the correct sample size, as well as the advised duration for your experiment. It also includes a Minimum Detectable Effect Calculator to use if the effect of your variation is difficult to predict.
    • Optimizely: Optimizely’s A/B Test Sample Size Calculator uses a “two-tailed sequential likelihood ratio test and false discovery rate controls” to calculate statistical significance. This means you don’t have to use the calculator to ensure the validity of your results. Instead, you can use it to determine how long it will take to see if your results are significant.
    • VWO: VWO’s A/B Test Duration Calculator helps you determine the duration of your test based on estimated existing conversion rate, minimum detectable effect, number of variations and average number of daily visitors.
    • Cro Metrics GrowthMap: Want to review program performance at a glance at any time? Our GrowthMap dashboard gives you a quick view of your completed tests that includes wins, insights and revenue impact. You can also view all live and upcoming tests along with the associated audience and primary metric summaries. We’ve integrated the sample size formula into the dashboard for easy planning and testing.

Troubleshooting A/B Testing Sample Size Problems

Unequal Sample Size

In some tests, you may have an odd number of variations, resulting in unequal traffic allocation. Or a stakeholder may request an experiment that calls for an unequal split. For example, you may need to run a test that allocates 20% of your traffic to the control and 80% to a variation.

This can be done, but there are consequences. For the same amount of total traffic, an unequal split has less statistical power than an even one. You may also need to run the test longer to achieve statistically valid results and ensure that the variation is driving a positive user outcome.
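To see how much an unequal split can cost, the sketch below extends a standard normal-approximation sample size formula so the control receives an arbitrary share of traffic. It is an illustration under stated assumptions (two arms, two-sided test, hypothetical function and parameter names, a hypothetical 5% baseline), not the method of any tool named in this article.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(baseline_rate, relative_mde, control_share=0.5,
                      significance=0.95, power=0.80):
    """Approximate total visitors needed across a control and one variation.

    control_share is the fraction of traffic sent to the control (0 to 1).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z = (NormalDist().inv_cdf(1 - (1 - significance) / 2)
         + NormalDist().inv_cdf(power))
    # Each arm's variance is scaled by the share of traffic it receives
    variance = (p1 * (1 - p1) / control_share
                + p2 * (1 - p2) / (1 - control_share))
    return ceil(z ** 2 * variance / (p2 - p1) ** 2)


even = total_sample_size(0.05, 0.10, control_share=0.5)
uneven = total_sample_size(0.05, 0.10, control_share=0.2)
print(even, uneven, round(uneven / even, 2))
```

With these hypothetical inputs, the 20/80 split needs roughly one and a half times the total traffic of a 50/50 split to reach the same power.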

Small Sample Size

Sometimes, a sample size calculation will require you to have a level of traffic you know you can’t reach. For example, the test may call for 15,000 visitors, but you can only reach 5,000. In this case, you’ll need to assess what level of confidence you’re willing to accept.

With a smaller sample size, you either accept a lower level of confidence or can only detect larger effects. It’s common practice to aim for a 95% confidence level, but this may not be possible with a smaller sample. If it would take too long to reach a significant result, it might be time to table your test or reassess the level of risk you’re willing to accept.
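One way to frame that trade-off is to turn the question around: given the traffic you can actually reach, what is the smallest lift you could reliably detect at each confidence level? The Python sketch below uses a simplified normal approximation that assumes both arms share the baseline's variance. The function name, the 5% baseline and the reading of 5,000 as visitors per variation are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def detectable_relative_lift(baseline_rate, visitors_per_variation,
                             significance=0.95, power=0.80):
    """Approximate smallest relative lift detectable with a fixed sample."""
    z = (NormalDist().inv_cdf(1 - (1 - significance) / 2)
         + NormalDist().inv_cdf(power))
    variance = 2 * baseline_rate * (1 - baseline_rate)  # both arms at baseline
    absolute_change = z * sqrt(variance / visitors_per_variation)
    return absolute_change / baseline_rate


# Compare confidence levels for a hypothetical 5% baseline and 5,000 visitors
for level in (0.95, 0.90, 0.80):
    lift = detectable_relative_lift(0.05, 5000, significance=level)
    print(f"{level:.0%} confidence: about {lift:.1%} relative lift")
```

Under these assumptions, dropping from 95% to 80% confidence lowers the detectable lift from roughly 24% to roughly 19%. Whether that gain is worth the added risk of a false positive is a business decision, not a statistical one.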

If you’ve already tested, you can also consider the conversion lift. Is the increase considerably higher than the control? If it’s not, there probably isn’t enough proof behind the result to make the change. Rest assured, data doesn’t go to waste—use what you learned to inform your next decision.
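If you want a quick check on whether an observed lift is distinguishable from chance, a pooled two-proportion z-test is one standard option. The sketch below is a minimal Python version with hypothetical names and numbers. It assumes a fixed sample size decided in advance, so it is not equivalent to the sequential methods some testing platforms use.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates.

    conv_a, n_a: conversions and visitors for the control
    conv_b, n_b: conversions and visitors for the variation
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate if there were no difference
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical result: 5.0% vs 6.0% conversion on 2,500 visitors each
p = two_proportion_p_value(125, 2500, 150, 2500)
print(f"p-value: {p:.3f}")
```

In this hypothetical case the variation shows a 20% relative lift, yet the p-value is about 0.12, well short of the 0.05 needed for 95% confidence. A lift that looks large on a small sample can still be within the range of chance.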

Minimal Test Performance

In this scenario, you may calculate a sample size based on a 5% minimum detectable effect. However, when you run the test, you find that the effect is actually just 1%. In this case, you must rerun the sample size calculation with the smaller effect to estimate the new test duration. Then decide whether the test is worth running to the necessary sample size or whether to pivot.
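A useful rule of thumb when rerunning the numbers: under the usual normal approximation, required sample size scales roughly with the inverse square of the effect size. The short Python sketch below (a hypothetical helper, and an approximation rather than an exact recalculation) estimates how much larger the sample needs to be.

```python
def sample_size_multiplier(planned_mde, observed_effect):
    """Approximate factor by which the required sample size grows
    when the real effect is smaller than the planned MDE."""
    return (planned_mde / observed_effect) ** 2


# Planned for a 5% effect, but the test is showing about 1%
print(round(sample_size_multiplier(0.05, 0.01)))  # 25
```

A test sized for a 5% effect would need roughly 25 times the traffic to confirm a 1% effect, which is often the point at which pivoting becomes the better option.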

What To Do If You Can’t Run a Test You Want To Run

Any experience can be tested—it just comes down to compromise. Time and resources are always limited. It’s important to rigorously evaluate your ideas and assess the potential, level of effort, complexity and sample size necessary to estimate test duration.

Remember, even if you continue with your test and can’t reach the ideal sample size you calculated, you can still use the data gathered to make a more informed decision in the future.

“Conversion rate optimization must be a blend of science, math and creativity. Some people are so incredibly math-driven that they tie their own hands. For example, they may say, ‘If we don’t have 99% confidence, we can’t do anything.’ But in the real world, optimization decisions aren’t always perfectly clear. The truth is, you’re running a test to gather data to assess risk/opportunity to help make a more informed decision—not guarantee results. You may not always have the highest level of confidence, and external variables are always changing, but you’ll be better enabled to make decisions you previously couldn’t without the data.”
—Grant Tilus, Sr. Growth Product Manager, Cro Metrics

Of course, we would be remiss not to remind you that, without statistical confidence, you take on additional risk. Always complete a sample size calculation when evaluating test ideas for prioritization to ensure they’re worth your while.

Learn More About A/B Testing With Cro Metrics

When performing marketing experiments, the stakes are high. Without properly determining A/B sample size, you run the risk of spending time and cash on ineffective tests. These calculations should be a critical part of testing, just as testing should be an extension of your planning process.

We know A/B testing can seem complicated; we’re here to support you. To learn more about A/B testing and fostering a culture of experimentation within your organization, subscribe to our CRO newsletter.