A Definitive Guide to Converting Failed A/B Tests Into Wins

As a marketer, there aren’t many things sweeter than running successful A/B tests and improving your website conversion rate.

It’s sweet because getting a winning A/B test is hard work.

To carry out a successful A/B test, marketers need to follow a robust process: develop data-driven hypotheses, create appropriate website variations, and test them on a targeted audience. And even with such a structured process, marketers tend to win just one out of three A/B tests.

What’s more worrying is that the overall percentage of winning A/B tests is only 14% (one out of seven). That’s largely because most marketers still don’t follow a documented process for A/B testing (and CRO as a whole). For instance, only 13% of eCommerce businesses base their testing on extensive historical data.


But here’s some good news: your failed A/B tests can still be of value.

By analyzing the A/B tests that didn’t win, you can highlight flaws in your approach, improve the tests, and even identify hidden winners.

This post talks about the key things you can do after encountering an unsuccessful test.

For convenience’s sake, we’ve split unsuccessful tests into two categories: inconclusive tests and tests with negative results.

When A/B Tests Give Inconclusive Results

An inconclusive result is when an A/B test is unable to declare a winner between variations. Here’s what you need to do with such a test:

Finding Hidden Winners

Even when your A/B test hasn’t found a winner among the variations, you may still uncover wins by slicing and dicing your test audience.

What if the A/B test produced results for specific segments of your traffic (segmented on the basis of traffic source, device type, etc.)?

This scenario is similar to Simpson’s paradox. Let’s understand it with a simple example.

A 1973 study of gender bias in UC Berkeley admissions showed that men had a higher chance of being admitted than women.

Simpson's paradox example

However, the department-specific data showed that women had a higher admission rate in most departments. In reality, a large number of women had applied to departments with low admission rates (in contrast to a small number of men).

Simpson's paradox example 2

We can see how multiple micro-trends skewed the overall study result.
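The reversal above can be reproduced with a few lines of code. This is a minimal sketch, assuming a hypothetical two-department setup; the department names, applicant counts, and rates are invented for illustration and are not the real 1973 Berkeley figures.

```python
# Simpson's paradox with hypothetical admission numbers: women lead in
# every department, yet men lead in the aggregate, because most women
# applied to the department that admits few applicants.

applications = {
    # dept: {group: (admitted, applied)}  -- all numbers invented
    "Dept A (high admission rate)": {"men": (480, 800), "women": (70, 100)},
    "Dept B (low admission rate)":  {"men": (20, 200),  "women": (120, 800)},
}

def rate(admitted, applied):
    return admitted / applied

# Per-department rates: women lead in BOTH departments.
for dept, groups in applications.items():
    for group, (adm, app) in groups.items():
        print(f"{dept} - {group}: {rate(adm, app):.1%}")

# Aggregate rates: men lead overall.
totals = {"men": [0, 0], "women": [0, 0]}
for groups in applications.values():
    for group, (adm, app) in groups.items():
        totals[group][0] += adm
        totals[group][1] += app

for group, (adm, app) in totals.items():
    print(f"Overall - {group}: {rate(adm, app):.1%}")
```

The same arithmetic applies to A/B test traffic: a variation can win inside two segments and still lose (or tie) overall if the segments are unevenly sized.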

Likewise, an A/B test may work for some traffic segments and not for others, leading to an inconclusive overall result.

You can reveal hidden winners (traffic segments where an A/B test delivered results) with post-result segmentation.

For instance, you can find out whether your website conversion rate improved specifically for new visitors or returning ones; for paid or organic traffic; or for desktop or mobile traffic.

The analysis can help you identify segments that have the most potential. For example, your inconclusive A/B test might have increased conversions for “returning visitors.” You can run a new (or the same old) test targeting only the returning visitors.

Post Result Segmentation for an A/B Test

That said, it’s essential to check the number of visitors in each segment. The conversion rate and other data points for a segment can be trusted only if that segment’s traffic is large enough.
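A sketch of what post-result segmentation with a sample-size guard could look like. The segment names, visitor counts, and the `MIN_VISITORS` threshold below are hypothetical; the comparison uses the standard two-proportion z-test, not any specific testing tool’s implementation.

```python
# Per-segment comparison of control vs. variant conversion rates,
# skipping segments too small to trust.
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z-statistic comparing two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

MIN_VISITORS = 1000  # below this, treat segment results as anecdotal

segments = {
    # segment: (control_conversions, control_n, variant_conversions, variant_n)
    "returning visitors": (220, 4000, 300, 4000),
    "mobile traffic":     (12, 300, 15, 310),
}

for name, (ca, na, cb, nb) in segments.items():
    if na + nb < MIN_VISITORS:
        print(f"{name}: too little traffic to trust ({na + nb} visitors)")
        continue
    z = two_proportion_z(ca, na, cb, nb)
    verdict = "significant" if abs(z) > 1.96 else "inconclusive"
    print(f"{name}: z = {z:.2f} ({verdict} at ~95%)")
```

The guard matters: with only a few hundred visitors in a segment, an apparent lift is more likely noise than a hidden winner.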

Tracking the Right Metric(s)

The effectiveness of an A/B test’s result depends largely on the metric you’re tracking.

A lot of times, A/B tests aim at improving only a website’s micro conversions. Mostly, that’s because the test is carried out at an early stage of the conversion funnel or on less-critical web pages. Such tests don’t track changes in the website’s macro conversions and fail to notice any rise in the bottom line (sales/revenue).

When your A/B test is inconclusive, you need to check if you’re optimizing for the correct metric. If multiple metrics are involved, you need to analyze all of them individually.

Suppose you run an eCommerce store. You create a variation of your product description page that mentions “free shipping,” with the objective of increasing add-to-cart actions (a micro conversion). You A/B test the variation against the control page, which gives no information on shipping. To your surprise, the test can’t declare a clear winner. Now, you need to check whether the variation boosted your revenue (macro conversion) or not. If it did, the reason may be simple: the “free shipping” variation might have sent only users with high purchase intent to the checkout page, keeping add-to-cart numbers flat while increasing the number of purchases.
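Numerically, that scenario could look like the sketch below. All figures are invented for illustration, assuming the “free shipping” store example above.

```python
# Why tracking only a micro conversion can hide a macro win:
# the add-to-cart rate is essentially flat, but revenue per visitor
# rises because the visitors who do convert buy more often.
# All numbers are hypothetical.

control = {"visitors": 10_000, "add_to_cart": 900, "orders": 180, "revenue": 9_000.0}
variant = {"visitors": 10_000, "add_to_cart": 905, "orders": 240, "revenue": 12_000.0}

def metrics(d):
    return {
        "add_to_cart_rate": d["add_to_cart"] / d["visitors"],   # micro conversion
        "revenue_per_visitor": d["revenue"] / d["visitors"],    # macro conversion
    }

for name, d in (("control", control), ("variant", variant)):
    m = metrics(d)
    print(f"{name}: add-to-cart {m['add_to_cart_rate']:.1%}, "
          f"revenue/visitor ${m['revenue_per_visitor']:.2f}")
```

Judged on the micro metric alone, the test looks inconclusive; judged on revenue per visitor, the variation is a clear winner.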

If you realize you weren’t tracking the most relevant metric, edit the test with new goals. With the new metrics in place, you can run the test for a while longer and look for improvements.

It’s advisable to keep your eyes on both micro and macro conversions.

Micro and macro conversions

Analyzing Visitors’ Behavior

Using on-site analysis tools, you can uncover a lot of insights which plain data just can’t offer. With the help of heatmaps/scrollmaps and visitor recordings, you can observe the behavior of your users (A/B test participants) and find probable causes that led to an inconclusive test.

Heatmaps can tell you if the element you’re testing is going unnoticed by most users. For instance, if you’re testing a variation of a CTA button that sits far below the fold, heatmaps/scrollmaps can show how many users actually reach the CTA button. An A/B test might be inconclusive if only a handful of users ever get there.

Here’s how a scroll map looks:

Scroll Map - VWO Pricing Page

In the same case, visitor recordings can show you how users interact with the content and elements above the CTA. With high engagement above the CTA, users might have already made up their minds about their next action (a conversion or an exit). Hence, any changes to the CTA would not affect users and would result in an unsuccessful A/B test.

Apart from giving insights on specific pages, visitor recordings can help you understand user behavior across your entire website (or, conversion funnel). You can learn how critical the page on which you’re testing is in your conversion funnel. Consider a travel website where users can find holiday destinations using a search box and a drop-down navigation bar. An A/B test on the navigation bar will only work if users are actually engaging with it. Visitor recordings can reveal if users are finding the bar friendly and engaging. If the bar itself is too complex, all variations of it can fail to influence users.

Double Checking Your Hypothesis

Whenever an A/B test fails to provide a result, fingers invariably point at the hypothesis behind it.

With an inconclusive A/B test, the first thing to check is the credibility of the test hypothesis.

Start by reviewing the basis of your hypothesis. Ideally, all your test hypotheses should be backed by either website data analysis or user feedback. If that’s not the case, you need to backtrack and validate your hypothesis with one of the two methods.

When your hypothesis is, in fact, supported by website data or feedback, assess whether your variation closely reflects it. You can also use on-site analysis tools to find ways to improve your variations.

Funnel data analysis
Sample website data that can be used to create hypotheses (Source)

Here’s an example: suppose you have a form on your website, and data analysis tells you that a majority of users drop off at the form. You hypothesize that reducing friction on the form will increase submissions, so you cut down the number of form fields and run an A/B test. Now, if the test remains inconclusive, you need to check whether you actually removed the friction-inducing form fields. Form analysis can help you find exactly which fields cause the majority of drop-offs.
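The form-analysis step above can be sketched as a simple drop-off calculation. The field names and interaction counts are hypothetical; the point is that the field with the largest drop-off, not a randomly chosen one, should be removed or fixed.

```python
# Find the form field where the largest share of users abandons.
# Counts are the (invented) number of visitors who interacted with
# each field, in the order the fields appear on the form.

field_interactions = [
    ("email",        5000),
    ("full name",    4600),
    ("phone number", 3100),  # big drop when this field appears
    ("company size", 2900),
    ("submit",       2700),
]

# Drop-off attributed to each field: the share of users who completed
# the previous field but never interacted with this one.
drop_offs = []
for (field, n), (next_field, n_next) in zip(field_interactions, field_interactions[1:]):
    drop_offs.append((next_field, (n - n_next) / n))

worst_field, worst_rate = max(drop_offs, key=lambda x: x[1])
print(f"Largest drop-off at '{worst_field}': {worst_rate:.1%}")
```

If that hypothetical form were A/B tested with, say, the “company size” field removed while the phone-number field stayed, the test could easily stay inconclusive despite a sound hypothesis.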

Reviewing the Variations

One of the biggest reasons A/B tests remain inconclusive is that the difference between test variations is minuscule.

Now, I know there are numerous case studies boasting double- or triple-digit improvements in conversion rate from just “changing the button color.” But what we don’t see are all the tests that failed to achieve the same feat. There are probably tens or hundreds of such failed tests for every single winning one.

For instance, Groove (a help desk software company) ran six different A/B tests with trivial changes. All of them proved inconclusive. Have a look:

CTA button color change A/B test

CTA Text change A/B test

Keeping this in mind, you need to go through your test variations and see if they really have noticeable changes.

If you’re testing minor elements, start being more radical. Radical or bold A/B tests are usually backed by strong hypotheses and tend to deliver results more often.

(Interestingly, testing radical changes is also advisable when you have a low traffic website.)
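There’s a simple statistical reason behind that advice, which a rough sample-size calculation can illustrate. This sketch uses the standard two-proportion sample-size approximation (95% confidence, 80% power); the baseline rate and lift values are hypothetical.

```python
# Why low-traffic sites should test radical (large-effect) changes:
# the visitors required per variation shrinks dramatically as the
# expected relative lift grows.
import math

Z_ALPHA, Z_BETA = 1.96, 0.84  # ~95% confidence, ~80% power

def visitors_per_variation(baseline_rate, relative_lift):
    """Approximate visitors needed per variation to detect the lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

for lift in (0.05, 0.20, 0.50):  # 5%, 20%, 50% relative lift
    n = visitors_per_variation(0.03, lift)  # 3% baseline conversion rate
    print(f"detect a {lift:.0%} lift: ~{n:,} visitors per variation")
```

At a 3% baseline, detecting a small 5% lift takes on the order of a hundred times more traffic than detecting a 50% lift, which is why timid tests on low-traffic sites tend to end inconclusive.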

Deriving Further Learnings from the Tests

So you’ve finished a thorough analysis of your inconclusive A/B test using the above-mentioned points. You now know what went wrong and where you need to improve. But, there’s more.

You also get to know which elements (possibly) don’t influence users toward conversion.

If your inconclusive test had no hidden winners, you tracked the correct metrics, your hypothesis was spot on, and your variations were different enough, you can safely assume that the tested element simply didn’t matter to your users. You now know the element is not high on your criticality list.

This will help you create a priority list of elements for your future A/B testing.

When A/B Tests Give Negative Results

A negative result for an A/B test means that the control beat the variation. Even with a failed test, you can gain insights and conduct future tests effectively.

Finding What Went Wrong

There are many reasons why an A/B test can return a negative result; a wrong hypothesis and a poorly executed variation are among them.

A negative result should make you question the test hypothesis. Did you follow a data-driven approach to arrive at it, or did you blindly follow a “best practice”?

Unbounce highlights a few cases where A/B tests performed against “common expectations.”

Example: ”Privacy assurance with form” best practice failed

These tests again emphasize the importance of a data-driven process behind A/B testing and CRO. A negative A/B test result can be a wake-up call to adopt one.

Knowing Your Users’ Preference

Negative A/B test results let you understand your users’ preferences better. Specifically, you get to know your users’ dislikes (in the form of the changes you made to the losing variation).

Since you know what your users don’t like with your website, you can build on hypotheses about what they might like. In other words, you can use your negative test results to create better tests in the future.

Let’s revisit the Unbounce example from the point above. The A/B test was run on a form whose variation flaunted a privacy assurance: “100% privacy – we will never spam you.” The variation couldn’t beat the control; it reduced conversions by 17.80%. Analysis of the result suggested that users didn’t like the mention of the word “spam.” Knowing what users hated, the next test was run with a different variation. The form still carried a privacy assurance, but this time it read, “We guarantee 100% privacy. Your information will not be shared.” (No mention of the dreaded “spam” word.) This time the result flipped: the variation increased signups by 19.47%.

Learning from a failed A/B test used for a win

What’s Your Take?

How often do you encounter failed A/B tests? We’d love to know your thoughts on how to tackle them. Post them in the comments section below.


The post A Definitive Guide to Converting Failed A/B Tests Into Wins appeared first on VWO Blog.