10 mistakes in A/B testing of mobile apps

1. Improper user segmentation

An important prerequisite for successful A/B testing is proper user segmentation based on user characteristics. No one group of users should be noticeably over-represented in one variant compared with the others. For instance, the ratio of men to women in the control sample must be approximately equal to the corresponding ratio in the test variant. If a substantial group predominates in one variant, the results of the test might be completely invalid. For instance, a test variant might show a higher purchase conversion simply because it contained more users with newer iPhones.

What to do?

You must always check the proportions of users in your variants. The ratio of men, new users, iPhone users, etc., must be approximately the same for the control variant and the one you’re testing. Present your test variants to audiences of similar compositions.
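One possible way to check the proportions is a chi-square test on the number of users inside and outside a segment in each variant. The sketch below uses only the Python standard library; the function name `segment_balance_p_value` and the user counts are illustrative assumptions, not part of any particular tool.

```python
import math

def segment_balance_p_value(control_in, control_out, test_in, test_out):
    """Chi-square test (1 degree of freedom) that a segment has the same
    share in both variants. Arguments are user counts inside/outside the
    segment for the control and test variants (all columns must be non-empty)."""
    table = [[control_in, control_out], [test_in, test_out]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability reduces to erfc.
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: iPhone users vs. everyone else in each variant.
print(segment_balance_p_value(480, 520, 500, 500))  # similar composition
print(segment_balance_p_value(300, 700, 500, 500))  # clearly skewed composition
```

A very small p-value suggests that the segment is distributed unevenly between the variants and that the split should be investigated before trusting the test results.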

2. Running tests without defining and detailing the problem/hypothesis

Running experiments without a proper understanding of the business task often leads to a large number of variants to test. If you run a test with an excessively broad range of alternative designs, say 10 different app variants, the probability of getting at least one false positive grows rapidly with the number of comparisons. This is known as the multiple-testing problem. It is therefore an important part of test preparation to spell out the problem you would like to solve by experimenting.

What to do?

  • Focus on the problem you would like to solve. This way, you will avoid having too many variants to test.
  • If it is essential to test several variants at once, use statistical corrections for multiple comparisons, such as the Bonferroni or Holm correction.
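To make the multiple-testing problem concrete, here is a minimal sketch that computes the chance of at least one false positive across several comparisons and applies the Holm correction to a list of p-values. It assumes the comparisons are independent; the function names and the p-values are made up for illustration.

```python
def family_wise_error_rate(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

def holm_correction(p_values, alpha=0.05):
    """Return a list of booleans: True where the variant still counts as
    significant after the Holm step-down correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return significant

# 10 variants means 9 comparisons against the control.
print(round(family_wise_error_rate(0.05, 9), 2))  # about 0.37

# Hypothetical p-values for four test variants compared with the control.
print(holm_correction([0.001, 0.04, 0.03, 0.2]))
```

Without a correction, three of the four hypothetical variants would look significant at the 5% level; after the correction only the first one does.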

3. Ignoring external factors

User behavior often changes depending on external factors. A classic example here is the effect of seasonality:

  • Food delivery and taxi services get more orders in bad weather;
  • Dating apps are used more actively on Fridays and Saturdays;
  • Online flower shops usually see sharp spikes in orders around holidays.

What to do?

Always account for your product’s external factors. For instance, you should not stop an experiment in the middle of the week since the overall results might change during the weekend.

4. Neglecting statistical data

After splitting your user base into several groups, it’s not enough to just calculate the metrics (e.g., conversion or average revenue) and choose the variant with the highest value. Doing so ignores the randomness in the data. The metric values observed during the experiment will most likely differ from those you see afterwards, so you need to account for this variability in advance to draw the correct conclusion from your A/B test.

What to do?

When deciding on the winning option, use statistical methods that take randomness into account, such as significance tests or confidence intervals.
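For conversion, one common choice is a two-proportion z-test. The following is a minimal standard-library sketch rather than a full analysis pipeline; the function name and the conversion counts are assumptions made for the example.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.
    Returns the z statistic and the p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 100 of 1000 users converted in A, 130 of 1000 in B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

With these made-up numbers the p-value comes out just under 0.04, so the difference would be called significant at the 5% level. A gap of 100 versus 105 conversions on the same traffic would not.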

5. Testing hypotheses on a small amount of traffic

Even if you’ve accounted for the seasonality factor, the number of users accumulated during the testing period might still be insufficient to detect any statistically significant effect of the product change. The number of users needed to ensure that your conclusions are correct depends on the statistical test itself, the size of the effect you want to detect, and the error probabilities you are willing to accept (false positives and false negatives).

What to do?

Calculate the required number of users in advance so that you are able to detect even the smallest effect you care about.
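As an illustration, the required sample size for a conversion test can be estimated with the standard normal-approximation formula for two proportions. This is a sketch under stated assumptions (two-sided test, equal group sizes, absolute lift); the function name and the default values for `alpha` and `power` are choices made for the example.

```python
import math
from statistics import NormalDist

def users_per_variant(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each variant to detect an absolute lift
    in conversion with a two-sided z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # false-positive threshold
    z_beta = NormalDist().inv_cdf(power)           # false-negative threshold
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Hypothetical case: 10% baseline conversion, want to detect a lift to 12%.
print(users_per_variant(0.10, 0.02))
```

For this hypothetical case the estimate is roughly 3,800 users per variant. Halving the lift you want to detect roughly quadruples the required sample, which is why small effects need a lot of traffic.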

6. Premature test termination

In a classic A/B test, statistical significance can appear and disappear repeatedly as more users enter the experiment. Stopping the experiment as soon as the result becomes statistically significant is known as the peeking mistake, and it inflates the rate of false positives.
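The effect of peeking can be demonstrated with a small simulation of A/A tests, where both groups have the same true conversion rate, so every significant result is a false positive. The sketch below is illustrative only; the function names, the number of runs, and the peeking schedule are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled z-test for two groups of n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return False  # no variation yet, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def simulate_aa_tests(runs=300, users=1000, peek_every=100, rate=0.1, seed=1):
    """Return (share of runs significant at ANY peek,
               share of runs significant at the planned end)."""
    rng = random.Random(seed)
    any_peek_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        seen_significant = False
        for n in range(1, users + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0 and is_significant(conv_a, conv_b, n):
                seen_significant = True
        any_peek_hits += seen_significant
        final_hits += is_significant(conv_a, conv_b, users)
    return any_peek_hits / runs, final_hits / runs

peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives at the planned end only: {fixed_rate:.1%}")
```

Because the final check is one of the peeks, the peeking rate can never be lower than the fixed-horizon rate, and in practice it is usually several times higher.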

What to do?

Decide on your test termination time in advance. It might depend on the minimum number of users needed, which can be found using special-purpose calculators, and on the seasonality factor.

7. Insubstantial difference between the variants being tested

Let’s assume you’re running an A/B test for a mobile screen with a button, where the button’s color differs only slightly between the two variants being tested. In this case, the difference in user behavior will be negligible and very hard to detect.

What to do?

Only test the options that can clearly change user behavior and affect the metrics. The difference between the variants being tested must be visually noticeable and apparent.

8. Using conversion as the only metric of the experiment

Choosing the variant with the best conversion is not always optimal when you’re trying to maximize profits. Often, a much better solution is to attract an audience that pays more. This makes it necessary to run tests not only for conversion but for financial metrics as well. Keep in mind, however, that A/B tests for conversion and those for financial metrics differ methodologically: conversion is a yes/no outcome, while revenue per user is a continuous and usually heavily skewed value. To identify statistically significant results in each case, you need to use the appropriate statistical test.

What to do?

Apart from conversion, also run experiments to optimize for financial metrics.
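Revenue per user usually contains many zeros and a few large payments, so a test designed for conversion rates does not apply directly. One commonly used option is a bootstrap confidence interval for the difference in average revenue. The sketch below is a minimal standard-library illustration with made-up data; the function name and parameters are assumptions.

```python
import random

def bootstrap_mean_diff_ci(control, test, resamples=2000, confidence=0.95, seed=1):
    """Percentile bootstrap confidence interval for
    mean(test) - mean(control), e.g. average revenue per user."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(resamples):
        # Resample each group with replacement, keeping the group sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    tail = (1 - confidence) / 2
    lower = diffs[int(tail * resamples)]
    upper = diffs[int((1 - tail) * resamples) - 1]
    return lower, upper

# Hypothetical revenue per user: most users pay nothing.
control = [0] * 90 + [1] * 10
test = [0] * 50 + [10] * 50
low, high = bootstrap_mean_diff_ci(control, test)
print(f"95% CI for the revenue difference: [{low:.2f}, {high:.2f}]")
```

If the interval lies entirely above zero, the test variant earns more per user than the control with the chosen level of confidence.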

9. Using classic experiments only

In classic A/B tests, users are distributed equally between the groups. Unless you’re a tech giant with a huge amount of data, every such test has an undesirable side effect: throughout the experiment, you are forced to keep showing options that are not economically beneficial. However, there are algorithms that change the traffic distribution between the variants as the experiment progresses. One of them is Thompson sampling, which applies Bayesian statistics to the multi-armed bandit problem. The algorithm recalculates the win probability of every variant at every step of the experiment and directs more traffic to the variants that are currently more likely to win.

What to do?

When possible, use Bayesian statistics to determine the winner. It should be noted, however, that every method has its limitations, so using them blindly might not give the desired results.
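The idea behind Thompson sampling can be sketched in a few lines for conversion data, using a Beta distribution as the belief about each variant’s conversion rate. This is a simplified simulation, not a production implementation; the function name, the uniform Beta(1, 1) prior, and the true conversion rates are assumptions made for the example.

```python
import random

def thompson_sampling(true_rates, rounds=3000, seed=1):
    """Simulate Thompson sampling and return how many users each variant got."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible conversion rate for each variant from its
        # Beta(1 + successes, 1 + failures) posterior.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        # Show the next user the variant with the highest sampled rate.
        arm = samples.index(max(samples))
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical true conversion rates of 5% and 20%.
print(thompson_sampling([0.05, 0.20]))
```

Picking the variant with the highest sampled rate means each variant receives traffic in proportion to its current probability of being the best, so the weaker variant is shown less and less often as evidence accumulates.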

10. Using testing tools unsuitable for the test requirements

There are many universal platforms and solutions for A/B testing web and mobile products. A lot of analytical platforms are starting to add A/B-testing modules to their solutions. However, those do not actually solve all the problems encountered by developers, product managers, and marketers of mobile apps. A tool that fits those needs should offer:

  • A free pricing plan for a quick start;
  • Classic A/B testing with optimization for any metric using the standard frequentist methodology (the traffic is distributed equally between the variants, and the conclusions are based on confidence intervals);
  • Accelerated experiments with optimization for conversion using Bayesian statistics and multi-armed bandits (the traffic is distributed proportionally to the probabilities of a certain variant being superior at every step);
  • Accelerated experiments with optimization for financial metrics using Bayesian statistics (the traffic is distributed equally, and the best variant is determined based on the probability of its superiority).

What to do?

Use the leading testing tools. Those will not only let you run simple tests with “manual” traffic distribution but also accelerate your experiments using automatic algorithms. You will also be able to further optimize the result for your financial metrics.

Proba

proba.ai is a tool for A/B testing in mobile apps. Carry out experiments faster, and at a better price — using the mobile app product hypothesis testing tool.