A beginner's guide to A/B testing

An Introduction to A/B testing

If you're thinking of running A/B tests, you've likely scoured the first couple pages of Google results looking for best practices, how-tos, and ideas.

We wanted to give you a single place to address your A/B test concerns. Add this page to your A/B test research repertoire for questions that might cross your mind before, during, or after you run a test.

What is A/B testing?

Let's call your current web page A. You decide that modifying this page in a certain way might increase its conversions. Let's call this new, modified version B.

A/B testing means showing two versions of your web page (A and B) at random to an audience that shares common characteristics, then using the results to support or disprove your hypothesis.

Original/Version A: Control
Version B/n: Variant

If you decide to create more versions of your control (A), it becomes an A/B/n test.

The primary objective of A/B testing is to improve user experience, eventually increasing the chance of converting visitors to leads. By asking focused questions about changes to your website, you can collect data about the impact of those changes.

Difference between A/B, split URL, and multivariate tests

A/B test: Testing versions of a web page that contain small differences.

Split URL test: A version of an A/B test that involves a complete design overhaul of the original page while maintaining its core idea.

Multivariate test: An advanced A/B testing strategy in which you test multiple variables simultaneously. For example, you're running a multivariate test on a landing page if you decide to create two or more variations with updated headline text, different signup form length, and distinct screenshots.

Learn more about the difference between an A/B test and split URL test.

Can A/B testing penalize organic search ranking (SEO)?

The idea that A/B testing harms search engine rankings because your page could be classified as duplicate content is a myth. Google will not penalize your site for running A/B tests as long as you don't present different versions by user agent (browser details, operating system, and device). If you're still concerned, you can add a rel="canonical" link to your variation pages that points to the original page, which Google's guidance recommends over a "noindex" tag.

A/B test prep with quantitative and qualitative analysis

You must have a clearly defined hypothesis before you set up an A/B test. A crucial part of a winning hypothesis? Identifying the problem you want to solve. And to develop a clearly defined hypothesis, you must conduct a detailed qualitative and quantitative analysis.

Quantitative analysis

Track website metrics to zero in on pages that have relatively high traffic yet also a considerably high count of bounces/dropoffs. Continue to analyze visitor behavior on these pages with heatmaps, funnel analysis, and session recordings. Reports from each of these will reveal points of friction on a web page.

Qualitative analysis

Create a one-on-one connection with your visitors using onsite polls, in-app surveys, and usability testing. Collect feedback about user experience and isolate instances that prevent visitors from converting.

Combine data from both of these to identify a problem your visitors are facing that's costing you conversions.

For example: Your problem might be visitors are dropping off the product page because they aren't sure if the product fits their requirements. You speculate this is because neither the images nor the copy touch on the product specification clearly. So your hypothesis might be that adding more product details will result in increased conversions.

Once you've collected sufficient insights from your quantitative and qualitative analysis, you can move on to the next stage of setting up your A/B test: defining a hypothesis.

Defining a winning hypothesis for an A/B test

A hypothesis is an idea or theory that you test with the goal of either supporting or rejecting it. It's important that a hypothesis be based on a clearly defined problem and, more importantly, that it's firmly grounded in data and has a quantifiable result.

For example, you analyze your visitors' form-filling patterns and find that a large portion of your visitors abandon the form when they encounter the phone number field. Your hypothesis in this case could be: "Removing the phone number field from the form will increase successful form submissions."

You now test this idea by creating a variation without the phone number field in the form and then measuring the number of successful form submissions. If the number increases, the result supports your hypothesis. If it stays the same or decreases, you reject your hypothesis.

In general, every winning hypothesis has three elements: A problem backed by evidence, a change, and a measurable outcome.

Having a clearly defined hypothesis:

Helps develop reliable conclusions from the test, even if it's inconclusive
Makes it easier to map conversion lifts to specific tests
Will help you build robust hypotheses for subsequent tests for incremental increases in conversions

Check out this template with an example of setting up a solid testing hypothesis.

Creating variations and crafting changes in an A/B test

You might have come across case studies of a conversion rate increasing while testing different colors of a CTA, an alternate copy, or even a different hero image. Don't let these mislead you into believing that you can only change one core element at a time (and most certainly do not believe that one small tweak can lift your conversions).

You can tweak your variation in all kinds of ways, as long as it matches the hypothesis's intent.

For example:

Do

Hypothesis: "Making the CTA more prominent will draw visitors' attention and consequently get more clicks."

Change:

Increasing CTA size
Moving it to a fold that gets the most engagement
Updating the CTA text to a more action-based one
Giving it a new color that matches the page's palette while also standing out

Don't

Hypothesis: "Changing the CTA color will draw visitors' attention and consequently get more clicks."

Change:

Giving the CTA a shade that is in stark contrast to the page to make it pop

Deciding the number of variations to test

There is no cap on the number of variations you can add to your A/B test! But remember that with each variation you add to the test, you are increasing the chances of the test returning a false positive result. Even if you absolutely need to run tests with more than two variations, try to make each starkly different from the others. In general, however, you should run small tests with distinct variations to get reliable results.
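The link between the number of variations and false positives can be made concrete with a little arithmetic. The sketch below is an illustration, not part of any testing tool: it assumes each variation is compared against the control independently at the same significance level, and the function name is hypothetical.

```python
def family_wise_false_positive_rate(num_variations: int, alpha: float = 0.05) -> float:
    """Chance that at least one variation looks like a winner by luck alone.

    Assumes every variation is compared against the control independently
    at significance level `alpha` (a simplifying assumption).
    """
    if num_variations < 1:
        raise ValueError("need at least one variation")
    # Probability that no comparison is a false positive is (1 - alpha) ** k.
    return 1 - (1 - alpha) ** num_variations


for k in (1, 2, 4, 8):
    rate = family_wise_false_positive_rate(k)
    print(f"{k} variation(s): {rate:.1%} chance of at least one false positive")
```

Under these assumptions the risk grows with every variation you add, which is why fewer, clearly distinct variations tend to give more reliable results.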

Deciding which element to test

You can test CTAs, copy length, copy content, layout, and more. There is no set rule for what you should test on your website; anything you can quantify the effect of can be tested. The key here is that the changes in the variations should be based on a shared theme and work together to solve a common problem.

If you're looking for inspiration, here are 20 A/B test ideas to get you started.

Understanding statistical significance and its importance

Achieving statistical significance means that when your test results show one version getting more conversions, the data supports the idea that the increase is tied to the changes you made and is not simply due to chance.

For example, if you set the statistical significance at 95%, you accept a 5% or smaller chance of seeing an improvement this large purely by luck when the changes in your variations actually make no difference.
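One common way to check significance for conversion counts is a two-proportion z-test. The sketch below shows the idea using only Python's standard library. It is an assumption that your testing tool works this way (many use other methods), and the function names and visitor counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the conversion rate if A and B truly performed the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions) in both groups
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p_value: float, confidence: float = 0.95) -> bool:
    """True when the p-value is within the allowed chance of a fluke."""
    return p_value <= 1 - confidence


# Illustrative numbers only: 2,000 visitors per version.
p = two_proportion_p_value(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"p-value: {p:.4f}, significant at 95%: {is_significant(p)}")
```

A small p-value says that a gap this large would be unusual if both versions truly converted at the same rate. It does not by itself say how large the real lift is.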

You might see a clear winning variation while running your tests—if you have substantial traffic, you might see one variation performing better than the others in a matter of days. However, if the test hasn't reached significance, you shouldn't jump the gun and implement the changes of the winning variation. To carry forward the conversion lift from the test into your original website, you must monitor the statistical significance of the test closely.

Splitting traffic between variations in an A/B test

There isn't a magic number set for a visitor count per variation that makes the test successful. The right visitor count depends on the number of conversions each variation needs for the test to reach statistical significance. This count—the conversions you get per variation—relies heavily on the duration of the test and the change in each variation.

For example, say a SaaS business runs a test on the length of a signup form on its home page. With at least 500 visitors per day (~250 per variation), the test can reach significance in eight days. Another test on an inner-level page, comparing different copy, might not reach significance despite having 1,000 visitors per day (~500 per variation) for 20 days.

In the former case, despite having fewer visitors than the latter, one variation clearly outperformed the other in conversions per variation.

So, the number of visitors each test needs to be successful is a variable value. You have to watch it for statistical significance.
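To see why the required visitor count varies so much, here is a rough sample-size estimate based on the standard two-proportion formula. Treat it as a sketch under stated assumptions: the 80% statistical power default, the function name, and the example conversion rates are not from this guide, and real tools may calculate this differently.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(
    baseline_rate: float,
    expected_rate: float,
    confidence: float = 0.95,
    power: float = 0.80,  # assumption: a commonly used default
) -> int:
    """Rough number of visitors each variation needs to detect the given lift."""
    if baseline_rate == expected_rate:
        raise ValueError("rates must differ to estimate a sample size")
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# A large lift needs far fewer visitors than a small one.
print(visitors_per_variation(0.10, 0.15))  # big change, e.g. a shorter form
print(visitors_per_variation(0.10, 0.11))  # subtle change, e.g. a copy tweak
```

The smaller the difference between variations, the more visitors you need, which matches the signup form versus copy example above.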

Allotting a variation to a visitor on cookie value

So, does a visitor see the original in one session and the variation in the next? No, they don't. Once a visitor lands on your site, they are assigned a cookie. This cookie is used to track user information in tests. It helps determine whether a website visitor is part of a test and whether the visitor has accomplished the goals set for the test. It also tracks the variation the visitor has viewed and is used to serve the same variation consistently over multiple visits.

The number of cookies used to track your visitor during tests varies across tools, and each cookie has a lifetime (usually 13 months). During this period, if the same visitor views the control or variant multiple times, they will still be counted as a single visitor.
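The cookie behavior described above can be sketched in a few lines. This is a simplified model, not how any specific tool implements it: the cookie name is hypothetical, and a plain dictionary stands in for the visitor's browser cookies.

```python
import random

COOKIE_NAME = "ab_variation"  # hypothetical cookie name


def get_variation(cookies: dict, variations=("A", "B"), rng=random) -> str:
    """Return the visitor's variation, assigning one at random on the first visit."""
    stored = cookies.get(COOKIE_NAME)
    if stored in variations:
        return stored  # returning visitor: serve the same variation again
    choice = rng.choice(variations)
    cookies[COOKIE_NAME] = choice  # remember the assignment for later visits
    return choice


visitor_cookies = {}  # stands in for one visitor's browser cookies
first_visit = get_variation(visitor_cookies)
second_visit = get_variation(visitor_cookies)
print(first_visit, second_visit)  # always the same variation twice
```

Because the assignment is stored with the visitor, repeat visits never flip between the control and the variant.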

Running multiple tests on overlapping traffic

If you're running multiple tests simultaneously, be meticulous about the shared traffic between these tests. Unless the interaction is minimal between the tests, it's best not to run multiple tests with overlapping traffic. Such tests, if set up carelessly, will end up showing you false positive reports.

Even if you do decide to run such tests, make sure you split the traffic evenly between intertwined user experiences.

For example:
The usual course of a visitor journey on your website is: wishlist ---> cart

Say you're running an A/B test on the wishlist page (W1 vs W2) and another A/B test on the cart page (C1 vs C2). Remember to split the traffic from W2 50-50 between C1 and C2.
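One way to get an even split between overlapping tests is to assign each test independently. The sketch below does this by hashing a visitor ID together with the test name. The hashing approach, the test names, and the visitor IDs are all assumptions for illustration. Check how your own tool handles overlapping tests.

```python
import hashlib
from collections import Counter


def assign(visitor_id: str, test_name: str, variations: tuple) -> str:
    """Pick a variation deterministically from the visitor ID and test name."""
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]


# Simulate 10,000 visitors moving from the wishlist page to the cart page.
counts = Counter()
for i in range(10_000):
    visitor = f"visitor-{i}"
    wishlist = assign(visitor, "wishlist-test", ("W1", "W2"))
    cart = assign(visitor, "cart-test", ("C1", "C2"))
    counts[(wishlist, cart)] += 1

for pair, total in sorted(counts.items()):
    print(pair, total)
```

Because the test name is part of the hash, a visitor's wishlist variation tells you nothing about their cart variation, so each of the four combinations should receive roughly a quarter of the traffic.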

Deciding duration and frequency of A/B testing

Ideally, you should run an A/B test long enough to have a sample inclusive of visitors from various sources. Say you've run a new social media campaign to bring visitors to your site, and you see a lift in the conversions—do not pause the test, even if it reaches statistical significance. This sample is biased (selection bias), and is not representative of the average visitor persona mix that lands on your website.

Additionally, make sure you account for a change in the influx of visitors over weekdays and weekends, through various traffic sources, and via ongoing marketing activities. All of these come together to form a complete business cycle. For most businesses, this lasts 2-4 weeks.

Once you have the sample locked and the test reaches statistical significance (95% or more), you can stop the test and begin to derive meaningful insights from it.

In short, to stop a test:

Check if you've avoided selection bias
Make sure you've accounted for a complete business cycle
Verify if your test has reached the statistical significance you configured it for—ideally 95% or more
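The checklist above can be expressed as a small helper. This is only a sketch: it approximates "avoided selection bias" as "every expected traffic source appears in the sample", it assumes a 14-day minimum based on the lower end of the 2-4 week business cycle, and the function name and example values are hypothetical.

```python
from datetime import date


def unmet_stop_conditions(
    start: date,
    today: date,
    significance: float,
    sources_seen: set,
    sources_expected: set,
    min_cycle_days: int = 14,  # assumption: lower end of a 2-4 week cycle
    required_significance: float = 0.95,
) -> list:
    """Return the reasons a test should keep running (empty list = ok to stop)."""
    unmet = []
    missing = sources_expected - sources_seen
    if missing:
        unmet.append("sample is missing traffic sources: " + ", ".join(sorted(missing)))
    if (today - start).days < min_cycle_days:
        unmet.append("test has not covered a complete business cycle")
    if significance < required_significance:
        unmet.append("statistical significance is below the configured level")
    return unmet


problems = unmet_stop_conditions(
    start=date(2024, 3, 1),
    today=date(2024, 3, 9),
    significance=0.97,
    sources_seen={"social"},
    sources_expected={"social", "organic", "email"},
)
print(problems or "ok to stop")
```

In this made-up example the test has already reached significance, yet it should keep running because the sample comes from a single campaign and has not covered a full business cycle.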

Deciding testing frequency

You should never stop optimizing. You can run tests all month long, across multiple business cycles, or more—just ensure that all your tests are grounded in a solid hypothesis. With every test, you'll be able to understand your visitor a little better; you can use this knowledge across other marketing channels like email blasts, PPC ads, and content generation.

Gaining actionable insights from A/B tests

Each A/B test whets your marketing strategy, sharpening it to match your visitors' requirements and lifting your conversion rates over time. For instance, say you want to monitor the conversions from the top five geo-locations that consume your services: the United States, the UK, Canada, Australia, and India. You can run tests targeting audiences specifically from these locations and personalize their website experience based on the test results. You can even drill down into the A/B test reports to find how the audience sub-segments behave under testing conditions. Continually repeat this cycle to find which versions of your website resonate best with your specific target audience segments.

Finally, remember that not all tests will have a winning variation. However, an inconclusive result is still a learning opportunity for your team. You should get down to the nitty-gritty details about your visitors in such tests. Even in this case, you can segment reports to find the difference in conversion behavior across demographics. You can also analyze micro conversions such as time spent on page and overall engagement. Finally, don't discard the hypothesis—instead, try a different approach to test it.
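If your tool lets you export raw visit data, segmenting a report can be as simple as grouping conversions by segment and variation. The sketch below assumes a hypothetical export format of (segment, variation, converted) records. The sample rows are invented purely to show the shape of the data.

```python
from collections import defaultdict


def conversion_rates_by_segment(visits) -> dict:
    """Map each (segment, variation) pair to its conversion rate.

    `visits` is an iterable of (segment, variation, converted) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [conversions, visitors]
    for segment, variation, converted in visits:
        bucket = totals[(segment, variation)]
        bucket[0] += int(converted)
        bucket[1] += 1
    return {key: conversions / visitors for key, (conversions, visitors) in totals.items()}


# Invented sample rows, for illustration only.
sample_visits = [
    ("United States", "A", True),
    ("United States", "A", False),
    ("United States", "B", True),
    ("India", "A", False),
    ("India", "B", True),
    ("India", "B", False),
]

for key, rate in sorted(conversion_rates_by_segment(sample_visits).items()):
    print(key, f"{rate:.0%}")
```

With real data, keep in mind that small segments have few visitors, so differences between them are more likely to be noise.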

Learn a hack to make the best of any inconclusive A/B test.

BONUS:

Running A/B tests on major forms of visitor interaction

If you have the right tools, you can run A/B tests on any type of interaction you have with your visitors. Here are two such examples:

Emails: Test out two or more versions of the emails you send out. For instance, you can run a test between a plain text email and template-based one to check which type gets more clicks or responses.

PPC ads: Google Ads and several other PPC ad platforms have introduced settings to run A/B tests on the ads that you put out. This is a great way of optimizing your ad expense while sustaining the conversions that come from each campaign.

How to create an A/B test?

Learn how to create and launch an A/B test with Zoho PageSense.