A/B testing is a powerful tool for anyone looking to improve their marketing performance. Compared to other types of testing, these tests are often easier to implement and interpret, especially for beginners. They let you find out whether a specific change gets you closer to the results you want…or not. A/B testing can be used for most marketing materials or channels where you can measure the results: email, websites, direct mail, PPC ads, etc. In this article, we’ll focus on using it to improve your website.
With these tests, you compare two different versions of your website to determine which version will get you the most traction. Typically, your experiment’s success is measured by clicks, conversions, or engagement. With A/B testing software, like Zoho PageSense, Google Optimize, or Adobe Target, you can send part of your traffic to one or more modified versions of your site. The rest of the traffic goes to the original version. After you’ve run the test on enough visitors to reach statistical significance, you can see which version performed best. And if there’s a strong “winner,” you can push that version to your live site.
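To make the traffic split concrete, here is a minimal sketch of one common way to do it: hashing a visitor ID so that each visitor consistently sees the same version. This is an illustration only, not how any particular tool works. The function name assign_variant, the experiment name, and the visitor ID format are all hypothetical.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Return "variant" for roughly `variant_share` of visitors, else "original".

    Hypothetical helper: real A/B testing tools handle this step for you.
    """
    # Hash the experiment name together with the visitor ID so the same
    # visitor always lands in the same group for this experiment.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number between 0 (inclusive) and 1 (exclusive).
    bucket = int(digest[:8], 16) / 0x100000000
    return "variant" if bucket < variant_share else "original"


# Send about half of the traffic to the modified version.
print(assign_variant("visitor-123", "homepage-cta"))
```

Because the assignment depends only on the visitor ID and the experiment name, a returning visitor sees the same version on every visit, which keeps the two groups cleanly separated.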
Why use A/B testing?
Are you getting disappointing results from your website, and looking to make a change? Or maybe, you have some ideas to improve your site and you want to make sure you’re making the right decision? A/B testing can help.
You may discover that running even a few experiments on crucial spots of your website can create big gains from relatively small tweaks. Because of this, every business with an active website should be performing regular testing of some kind. Much as regular doctor’s appointments are key to maintaining good health, regular testing will help you diagnose and address any potential website shortcomings. And, as we already mentioned, A/B tests are one of the easiest experiments you can perform, making them a great place to start.
Having a regular testing schedule ensures that you don’t overemphasize the results of a single experiment. Comprehensive testing gives you a complete picture of your user base. You can learn which kinds of changes appeal to them, and which don’t. The peace of mind that comes with scientifically vetting your website decisions—rather than going on instinct alone—is priceless.
How do I perform an A/B test?
Step one: Prepare
Before setting up your first experiment, you need to plan how to structure and prioritize your experiments. This does two things:
- Makes sure you’re focusing on specific, actionable changes
- Gives you a next step to move to, if your first experiment doesn’t yield strong results
Start by creating a ranked list of the areas of your website you want to improve. Make sure to take into account both ease of implementation and the potential for measurable gains. This list can serve as the outline for your long-term testing plans, with changes being made as you continue to learn.
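One simple way to build that ranked list is to score each idea on potential gain and ease of implementation, then sort by the combined score. The sketch below assumes a 1-to-5 scale and a multiply-the-two scoring rule. Both are illustrative choices, and the example ideas and their scores are made up.

```python
from dataclasses import dataclass


@dataclass
class ExperimentIdea:
    name: str
    potential_gain: int  # 1 (small payoff) to 5 (large payoff), your own estimate
    ease: int            # 1 (hard to implement) to 5 (easy to implement)

    @property
    def score(self) -> int:
        # Assumed scoring rule: reward ideas that are both promising and easy.
        return self.potential_gain * self.ease


def rank_ideas(ideas: list[ExperimentIdea]) -> list[ExperimentIdea]:
    """Return the ideas ordered from highest to lowest score."""
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)


# Hypothetical backlog of test ideas.
backlog = [
    ExperimentIdea("Checkout background color", potential_gain=2, ease=5),
    ExperimentIdea("Homepage CTA placement", potential_gain=4, ease=5),
    ExperimentIdea("Pricing page layout", potential_gain=5, ease=3),
]

for idea in rank_ideas(backlog):
    print(idea.score, idea.name)
```

Re-score the list as results come in, so the plan reflects what you have learned about your visitors.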
While the potential benefits of A/B testing are exciting, it’s important to pace yourself to avoid disappointment. It’s good to establish a “set it and forget it” mindset with your testing. After all, unless you’ve got a massive user base, it’s going to take some time to get significant data, and you don’t want to burn yourself out.
You’ll also want to resist the temptation to run multiple tests at the same time. While simultaneous testing is technically possible, it comes with risks and complications that can pollute your data, unless approached very carefully. It’s better for beginners to avoid these potential complications by sticking to one test at a time.
Step two: Run the test
Once you’ve decided where you want to start, the only thing left to do is perform your first test.
Choose your variables
First, you’ll pick your independent variable–the element you’ll be modifying in your test. This could be the placement of a CTA on your homepage, or a new background color on your checkout screen. Then, you’ll determine your dependent variable, which is the metric you’ll use to measure the success or failure of your change. With the previous examples, you might choose the number of clicks on your CTA, or the number of completed purchases in your checkout process.
Decide how long it will run
Finally, you’ll need to decide the runtime of your experiment. The majority of testing time is spent simply waiting for your tool to collect enough data to identify a strong pattern, which makes it hard to know when to end it. For those of us who didn’t excel in statistics, navigating these concepts can feel like a real chore. Thankfully, most testing tools handle the heavy lifting and provide preset guidelines for determining experimental validity.
That said, there are a few key concepts to keep in mind to help you validate your data:
Statistical significance
Statistical significance measures how confident you can be that your results aren’t a fluke. Think about it this way:
If you flip a coin three times in a row, and every time it comes up tails, you might conclude that the coin will always land on tails. But with such a small sample size, this conclusion would have a low statistical significance. However, let’s say you flipped the coin another thousand times. After that, you’d be able to make more confident and credible predictions about the next thousand flips and the thousand after that.
All else being equal, the larger your data set, the easier it is to reach statistical significance. The greater the statistical significance, the more confident you can be that your data represents a real trend in your user base.
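The coin example can be checked with a little arithmetic. The sketch below computes how likely a fair coin is to produce a result at least as lopsided as the one observed (a two-sided exact binomial test). The function name and the flip counts in the second example are hypothetical.

```python
from math import comb


def two_sided_p_value(tails: int, flips: int) -> float:
    """Probability that a fair coin gives a result at least this lopsided."""
    expected = flips / 2
    observed_gap = abs(tails - expected)
    # Count every outcome that is at least as far from a 50/50 split.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips


# Three tails in three flips: a fair coin does this (or three heads) 25% of the time.
print(two_sided_p_value(3, 3))

# A hypothetical 600 tails in 1,003 flips would be extremely unlikely for a fair coin.
print(two_sided_p_value(600, 1003))
```

A small value means the result is hard to explain by luck alone. With only three flips, even a perfect streak leaves plenty of room for chance.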
How to set significance and validity
While it isn’t a universal rule, 95% statistical significance is a common benchmark for deciding that you have enough data. It strikes a good balance between validity and expediency. After all, the higher you want your statistical confidence to be, the more data points (in this case, website visitors) you need.
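For readers curious about what sits behind a percentage like this, here is a rough sketch using a standard two-proportion z-test. It is one common approach, not necessarily what your testing tool uses, and the visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_of_difference(conv_a: int, visitors_a: int,
                             conv_b: int, visitors_b: int) -> float:
    """Return 1 minus the two-sided p-value of a two-proportion z-test.

    Assumes both versions received visitors and at least one conversion occurred.
    """
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled conversion rate, assuming both versions really perform the same.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Hypothetical results: original converts 200 of 5,000, variant converts 250 of 5,000.
confidence = confidence_of_difference(200, 5000, 250, 5000)
print(f"{confidence:.1%}")
print("Clears the 95% bar" if confidence >= 0.95 else "Keep collecting data")
```

In this made-up example the difference clears a 95% bar but would fall short of a stricter 99% one.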
Depending on the kind of element you’re testing and how much traffic your site gets, you may want to raise or lower your experiment’s significance threshold. For example, let’s say you’re testing a site element that doesn’t affect your revenue directly, such as the header image or color. In that case, you may want to lower the threshold to 90%. This will allow you to pick a winner more quickly, especially if you don’t get a lot of traffic yet.
However, you may be testing a critical element that will impact your revenue directly, such as your pricing page or main CTA. In that case, you could consider raising the significance threshold to 99%. At that point, you’ll be able to move forward with a very high degree of confidence. There is a trade-off in testing time, though. Increasing from 95% to 99% can substantially increase the number of visitors you’ll need (often by around half), which lengthens the experiment accordingly.
It can take a long time for an experiment to reach statistical significance. The exceptions tend to be businesses that get a massive amount of traffic, or a change that has an immediate and measurable impact on conversions. If you change something and it increases the conversion rate by 200%, that will be identified much more quickly than a change that only increases conversions by 5%.
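To get a feel for how much the size of the effect matters, the sketch below uses a standard textbook approximation for the number of visitors each version needs. The 2% baseline conversion rate and the 80% statistical power are assumptions chosen for illustration, and your testing tool may calculate this differently.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_version(baseline_rate: float, relative_lift: float,
                         significance: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed per version to detect a given relative lift."""
    new_rate = baseline_rate * (1 + relative_lift)
    # z-scores for the chosen significance level (two-sided) and power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate) + new_rate * (1 - new_rate)
    return ceil((z_alpha + z_power) ** 2 * variance / (new_rate - baseline_rate) ** 2)


# Assumed baseline: 2% of visitors convert today.
print(visitors_per_version(0.02, 2.00))  # a 200% lift (2% -> 6%)
print(visitors_per_version(0.02, 0.05))  # a 5% lift (2% -> 2.1%)
print(visitors_per_version(0.02, 0.05, significance=0.99))  # same lift, stricter bar
```

Under these assumptions, the large lift needs only a few hundred visitors per version, while the small lift needs hundreds of thousands. Raising the significance level pushes the requirement higher still.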
However, there are certain advantages to a test having a long runtime. It’s often recommended to collect data for at least two sales cycles to avoid any anomalies, such as national holidays or weekend lulls that could skew your results.
Step three: Interpret results
Once your test has been completed, you have a decision to make. If there’s a very strong trend, that decision is easy. All you need to do is implement the winning version and move forward with your next test.
But what if you don’t have a strong result? What if both versions are only a few points apart, or even a virtual tie?
This could mean that there wasn’t a big enough difference between your test versions. A more radical change might result in a stronger response from your visitors. It could also mean that the variable you’re testing doesn’t make a big impact on customer decisions. In that case, you’ll need to decide if it’s worth the effort to continue tweaking the same experiment, or if you should move on to the next one.
This is why it’s important to have a long-term testing schedule already in place. Having that longer-term plan and schedule lets you move on to the next experiment without becoming hyper-fixated on one underwhelming outcome. You can then make a plan to revisit your experiment later, once you have more experience under your belt. Even if each test only results in small improvements, when you look back later, the overall impact will add up.