Should you run multiple A/B tests at a time?
With the increasing costs of website traffic and the decreasing attention span of users, the velocity of A/B testing is becoming a critical metric. That’s when you start to wonder if you can test more of those hypotheses you have written down.
Marketers, and especially business owners, are tempted to test something on every single page of their website and funnel. What they’re not taking into consideration, though, is that this can often cause more harm than good.
Ever wondered if running multiple A/B tests at the same time can lead to poor decision-making or the implementation of a variant that is less than ideal for your audience?
In this article, I will try to explain a bit about our process of testing and how we deal with testing multiple variants without increasing traffic costs or time needed for testing.
What happens when you’re running multiple A/B tests at a time?
The answer to this question is long and complicated but, in short, you want to make sure that your experiments are not testing conflicting elements and that any interference between them is minimal.
To better understand this, let’s consider the following example:
E1 – Experiment 1 – Testing 2 variations. E1 runs on the product page of an ecommerce website.
E2 – Experiment 2 – Testing 2 variations. E2 runs on the cart page of the same website.
Let’s say you want to test adding a new section on the product page to increase trust with your users and you also want to test adding testimonials on the cart page, for added social proof.
Your options would be:
- Launch the two tests at the same time, without considering any interference between them
- Run the tests sequentially: Once E1 is finalized, launch E2.
- Run the tests simultaneously but on different segments of users (using testing lanes)
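To make the exposure problem behind these options concrete, here is a minimal Python sketch of the two experiments from the example. The variation names and the lane split are illustrative, not taken from any specific testing tool:

```python
from itertools import product

# Hypothetical illustration: two concurrent experiments, two variations each.
e1_variations = ["A", "B"]  # E1: product page (control vs. new trust section)
e2_variations = ["A", "B"]  # E2: cart page (control vs. testimonials)

# Running both tests on the full audience exposes users to every combination.
all_combinations = list(product(e1_variations, e2_variations))

# With testing lanes (option 3), each lane only sees one experiment, so a
# cross-experiment combination such as ("B", "B") is never shown to anyone.
lane_1 = [(v, None) for v in e1_variations]  # lane 1 users only enter E1
lane_2 = [(None, v) for v in e2_variations]  # lane 2 users only enter E2
lane_combinations = lane_1 + lane_2
```

With both tests on the full audience you get all four combinations, including BB; with lanes, the BB combination simply never occurs.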
The safest choice for a reliable result would be option number 2: run one test to completion, then launch the next.
However, this would definitely not be beneficial to your testing velocity and would incur considerable testing costs. Basically, you would probably be able to run somewhere between 6 and 12 tests a year, maximum, on most small to medium websites.
Option number three doesn’t hold up well either: with testing lanes, some combinations of the two experiments receive users while others receive none, which is far from ideal.
As you can see in the image above, no users will ever be exposed to combination BB of the two experiments.
If you went with option number 1, you would be completely disregarding the interference between the tests, exposing yourself to releasing an unvalidated user experience to your live environment.
The main problem with running concurrent A/B tests is that you increase the risk of a Type I error, or false positive. This makes it harder to gain actionable insights that you can rely on.
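A quick way to see why: if each test is run at a 5% significance level, the chance of at least one false positive grows with every concurrent test. A back-of-the-envelope sketch, assuming the tests are independent:

```python
# Probability of at least one false positive (Type I error) across
# n independent tests, each run at significance level alpha.
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** n_tests

for n in (1, 2, 5, 10):
    rate = familywise_error_rate(n)
    print(f"{n} concurrent tests -> {rate:.1%} chance of a false positive")
```

With just five concurrent tests at alpha = 0.05, the familywise error rate already exceeds 22%, which is why polluted results become so likely.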
In our experience, most of the businesses that request testing multiple items at a time are smaller businesses, which are also the ones most impacted by high day-to-day variance.
There isn’t a one-size-fits-all formula
There doesn’t seem to be a full agreement on this subject yet so you would still need to ask yourself what is more important to you and what will help you reach your business objectives.
It could be that you value accuracy more, or that you value testing velocity more. There is the old question of whether you’re in the business of science or in the business of making money, though I can’t really say I agree with that framing.
You need to ask yourself what the real possibility of interference is and how it would affect the results. The success of your A/B testing program will be determined by the number of tests you run and the percentage of winning tests and their impact on your bottom line. Now, if you end up drastically cutting down the number of tests you run in order to avoid data being polluted, you risk not testing enough and then your CRO program will most likely fail.
What we do with some of our clients with lower traffic volumes is that we define different conversion tracks and funnels and make sure we prevent the overlap of tests in that way.
Going back to our example above, we would first monitor the impact of the product page change on the “add to cart” conversion, and then see how many of the people who reach the cart actually initiate checkout on each variation. The funnel is a sequence of steps, and we think that by improving each step along the way, we maximize our clients’ chances of getting more conversions – we prioritize the main conversion rate at each step of the funnel and then analyze how all the changes combined impacted the overall conversion rate.
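That per-step analysis can be sketched as follows; the funnel counts below are made up purely for illustration:

```python
# Hypothetical funnel counts per variation; each step's conversion rate is
# measured against the step before it, then an overall rate is computed.
funnel = {
    "A": {"product_views": 10000, "add_to_cart": 1200, "checkout_started": 600},
    "B": {"product_views": 10000, "add_to_cart": 1450, "checkout_started": 700},
}

def step_rates(steps: dict) -> dict:
    names = list(steps)
    rates = {f"{prev}->{curr}": steps[curr] / steps[prev]
             for prev, curr in zip(names, names[1:])}
    rates["overall"] = steps[names[-1]] / steps[names[0]]
    return rates

for variation, steps in funnel.items():
    print(variation, {k: f"{v:.1%}" for k, v in step_rates(steps).items()})
```

Looking at each step separately shows where a variation wins or loses, while the overall rate tells you whether the combined changes actually moved the bottom line.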
It might not be the most orthodox way of doing it but our clients are able to be one step ahead of their competitors this way. In the end, their competition is not stopping so why should they?
So, for the TL;DR, here is my take on running multiple tests at a time:
- Try to test separate conversion paths – unless you’re really testing conflicting hypotheses and fear a massive pollution effect, just do it and keep an eye on the results with advanced segmenting in Google Analytics.
- Run tests on different segments of traffic – just create mutually exclusive tests or define your tested audience in such a way that there is minimal overlap. This is easier to do on websites with higher traffic volumes.
- Put your tests together and run them as a multivariate test – if you have the traffic for it and if you’re testing toward the same conversion (purchase, lead submission, etc.).
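The mutually exclusive approach from the second point can be sketched with deterministic hashing. This is a generic technique, not any specific testing tool’s API; the experiment names come from the example earlier in the article:

```python
import hashlib

EXPERIMENTS = ["E1_product_page", "E2_cart_page"]

def assign_experiment(user_id: str) -> str:
    # Hash the user id so each user lands in exactly one experiment,
    # and always the same one on repeat visits.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return EXPERIMENTS[int(digest, 16) % len(EXPERIMENTS)]

def assign_variation(user_id: str, experiment: str) -> str:
    # Salt with the experiment name so the A/B split within each
    # experiment is independent of the experiment assignment itself.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Because assignment is deterministic, a returning user always sees the same experiment and variation, and no user ever participates in both tests at once.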