10 mistakes in A/B testing of mobile apps

Dec 6, 2021

What affects your product’s monetization? The design, the special offers, the game mechanics, the pricing plans, and so on. None of these has a one-size-fits-all formula that would guarantee ever-growing profits, yet the decisions still have to be made. This is where A/B testing helps: it is a tool for quantitatively measuring the effect that a specific change has on your product.

Then again, A/B testing does not automatically lead to a positive economic effect. In this article, we look at the main problems that arise during A/B testing and the main ways to deal with them.

1. Improper user segmentation

An important prerequisite for successful A/B testing is proper user segmentation based on user characteristics. No single group of users should significantly outweigh any other group in any of the variants you’re testing. For instance, the ratio of men to women in the control sample must be approximately equal to the corresponding ratio in the test variant. If any substantial group predominates in one variant, the results of the test may be completely invalid. For example, a test variant might show a higher purchase conversion simply because it happened to contain more users with newer iPhone models.

The characteristics we use to divide users into groups are determined individually for every product depending on the target audience.

What to do?

You must always check the proportions of users in your variants. The ratio of men, new users, iPhone users, etc., must be approximately the same for the control variant and the one you’re testing. Present your test variants to audiences of similar compositions.
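One way to check the proportions is a chi-square test on the share of a segment in each variant. The sketch below is a minimal illustration using only the Python standard library; the function name and the user counts are hypothetical, and it assumes every cell of the 2x2 table is non-empty.

```python
import math


def chi2_2x2(a_in, a_total, b_in, b_total):
    """Pearson chi-square test (1 degree of freedom) that a segment has the
    same share in variants A and B. Returns (statistic, p_value)."""
    table = [[a_in, a_total - a_in], [b_in, b_total - b_in]]
    n = a_total + b_total
    row_totals = [a_total, b_total]
    col_totals = [a_in + b_in, n - a_in - b_in]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: iPhone users among all users of each variant
stat, p = chi2_2x2(a_in=4100, a_total=10000, b_in=4350, b_total=10000)
print(f"chi2={stat:.2f}, p={p:.4f}")
```

A small p-value here suggests that the segment is not evenly represented, so a difference in the target metric could be caused by audience composition rather than by the change itself.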

2. Running tests without defining and detailing the problem/hypothesis

Running experiments without a proper understanding of the business task often leads to a large number of variants to test. If you test an excessively broad range of alternative designs, say 10 different app variants, the probability of getting at least one false positive rises quickly with every additional comparison. This is a good illustration of the multiple-testing problem. An important stage of test preparation is therefore to spell out the problem you would like to solve by experimenting.

What to do?

Focus on the problem you would like to solve. This way, you will avoid having too many variants to test.
If it is extremely important to include several variants to test, use statistical corrections.
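To make the multiple-testing problem concrete, the sketch below computes the chance of at least one false positive across several comparisons and applies the Holm-Bonferroni correction, which is one common statistical correction (the article does not prescribe a specific one). The formula for the error rate assumes independent comparisons, and the p-values are made up for illustration.

```python
def family_wise_error_rate(alpha, comparisons):
    """Probability of at least one false positive, assuming the
    comparisons are independent and each uses significance level alpha."""
    return 1 - (1 - alpha) ** comparisons


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.
    Returns a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are not rejected either
    return reject


# 10 variants means 9 comparisons against the control
print(round(family_wise_error_rate(0.05, 9), 3))

# Hypothetical p-values from four test variants compared with the control
print(holm_bonferroni([0.001, 0.02, 0.04, 0.3]))
```

Without a correction, three of the four hypothetical p-values would pass the usual 0.05 threshold; after the correction only the strongest one does.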

3. Ignoring external factors

User behavior often changes depending on external factors. A classic example here is the effect of seasonality:

Food delivery and taxi services, etc. have more orders in bad weather;
Dating apps are used more actively on Fridays and Saturdays;
Flower e-commerce can usually be seen blowing up during holidays.

Thus, when comparing samples, it makes sense to at least select them from the same seasonal range. It is also preferable to use the ranges where your core audience is present.

What to do?

Always account for your product’s external factors. For instance, you should not stop an experiment in the middle of the week since the overall results might change during the weekend.

4. Neglecting statistical data

After splitting your user base into several groups, it’s not enough to just calculate the metrics (e.g., the conversion or average revenue per user) and choose the variant with the highest value. Doing so ignores the randomness in the data: the metric values will most likely change after the experiment, so we need to account for this in advance to draw the correct conclusion from the A/B test.

What to do?

When deciding on the winning option, use the statistical methods that allow you to take randomness into consideration.
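As an illustration of taking randomness into consideration, here is a minimal sketch of a two-sided z-test for the difference between two conversion rates. It is one standard option among several, the function name and the counts are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.
    Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 5.0% conversion in control vs. 5.6% in the test variant
z, p = two_proportion_z_test(conv_a=500, n_a=10000, conv_b=560, n_b=10000)
print(f"z={z:.2f}, p={p:.3f}")
```

In this made-up case the test variant looks better by 0.6 percentage points, yet the p-value stays above 0.05, so the difference could still be due to chance.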

5. Testing hypotheses on a small amount of traffic

Even if you’ve accounted for the seasonality factor, the number of users accumulated during the testing period might still be insufficient to detect a statistically significant effect of the product change. The number of users needed to ensure that your conclusions are correct depends on the statistical test itself, the size of the effect you want to detect, and the error probabilities you are willing to accept (the significance level and the power of the test).

What to do?

Calculate the required number of users in advance so that you are able to detect even the smallest effect you care about.
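The calculation can be sketched with the textbook sample-size formula for comparing two proportions with a two-sided z-test. The function name, the baseline conversion, and the minimum detectable effect below are assumptions chosen for illustration; other statistical tests need other formulas.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.8):
    """Approximate users needed in each variant to detect an absolute
    lift of `mde` over the `baseline` conversion rate."""
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # statistical power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde ** 2)


# Hypothetical: 5% baseline conversion, want to detect a lift to 6%
print(sample_size_per_variant(baseline=0.05, mde=0.01))
# Halving the detectable effect roughly quadruples the required sample
print(sample_size_per_variant(baseline=0.05, mde=0.005))
```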

6. Premature test termination

In a classic A/B test, statistically significant effects of the change may appear and disappear as more users join the experiment. If we stop the experiment as soon as we see a statistically significant result, we make the mistake known as peeking.

The analogy here could be a round-robin chess tournament with 50 players, i.e. every one of the players needs to face every other one. If we end the tournament after just 5 rounds to decide on the winner, our judgment would be far from objective since this supposed winner could, by chance, only be getting low-rating opponents at the start.
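The effect of peeking can be seen in a small simulation of A/A tests, where both variants are identical and every "significant" result is a false positive. This is a sketch under stated assumptions: the conversion rate, the number of users, and the checking schedule are arbitrary, and the significance check is a simple two-proportion z-test.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided z-test for two variants with n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_aa_tests(runs=400, users=1000, look_every=100, rate=0.1, seed=42):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        for n in range(1, users + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peeking: check significance after every `look_every` users
            if n % look_every == 0 and is_significant(conv_a, conv_b, n):
                flagged_at_any_look = True
        peeking_hits += flagged_at_any_look
        fixed_hits += is_significant(conv_a, conv_b, users)
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"with peeking: {peeking_rate:.2f}, single final check: {fixed_rate:.2f}")
```

With a single check at the end, the false positive rate should stay near the chosen 5% level, while stopping at the first significant look inflates it noticeably.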

What to do?

Decide on your test termination time in advance. It depends on the minimum number of users needed (which can be found using special-purpose calculators) and on the seasonality factor.

7. Insubstantial difference between the variants being tested

Let’s assume you’re running an A/B test for a mobile screen with a button, where the button’s color varies only insignificantly between the two variants being tested. In this case, the difference in user behavior will be negligible.

What to do?

Only test the options that can clearly change user behavior and affect the metrics. The difference between the variants being tested must be visually noticeable and apparent.

8. Using conversion as the only metric of the experiment

Choosing the variant with the best conversion is not always optimal when you’re trying to maximize profits. Often, a much better solution is to attract an audience that pays more. This makes it necessary to run tests not only for conversion but for financial metrics as well. Keep in mind, however, that A/B tests for conversion and A/B tests for financial metrics differ methodologically: to identify statistically significant results, each requires its own statistical criteria.

What to do?

Apart from conversion, also run experiments to optimize for financial metrics.
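The sketch below shows how conversion and a financial metric can point in opposite directions, and estimates a confidence interval for the difference in average revenue per user (ARPU) with a percentile bootstrap. The bootstrap is one possible choice for skewed revenue data, not necessarily the criterion the authors use, and the prices and user counts are invented.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_arpu_diff(revenue_a, revenue_b, resamples=400, seed=0):
    """95% percentile-bootstrap interval for ARPU(B) - ARPU(A)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(resamples):
        sample_a = rng.choices(revenue_a, k=len(revenue_a))
        sample_b = rng.choices(revenue_b, k=len(revenue_b))
        diffs.append(mean(sample_b) - mean(sample_a))
    diffs.sort()
    lower = diffs[int(0.025 * resamples)]
    upper = diffs[int(0.975 * resamples) - 1]
    return lower, upper


# Hypothetical revenue per user: A sells a 9.99 plan, B sells a 4.99 plan
revenue_a = [9.99] * 200 + [0.0] * 3800   # 5% conversion
revenue_b = [4.99] * 280 + [0.0] * 3720   # 7% conversion

conv_a = sum(r > 0 for r in revenue_a) / len(revenue_a)
conv_b = sum(r > 0 for r in revenue_b) / len(revenue_b)
lower, upper = bootstrap_arpu_diff(revenue_a, revenue_b)
print(f"conversion: A={conv_a:.2%}, B={conv_b:.2%}")
print(f"ARPU: A={mean(revenue_a):.3f}, B={mean(revenue_b):.3f}")
print(f"95% interval for ARPU difference (B - A): [{lower:.3f}, {upper:.3f}]")
```

In this made-up example variant B wins on conversion but loses on revenue per user, which is exactly the situation a conversion-only test would miss.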

9. Using classic experiments only

In classic A/B tests, users are distributed equally between the groups. Unless you’re a tech giant with a huge amount of data, this has an undesirable side effect: throughout the experiment, you are forced to keep showing options that are not economically beneficial. However, there are algorithms that change the traffic distribution between the variants during the experiment. One of them is Thompson sampling, which applies Bayesian statistics to the multi-armed bandit problem. At every step of the experiment, the algorithm recalculates each variant’s probability of being the best and directs more traffic to the variants where that probability is currently highest.

What to do?

When possible, use Bayesian statistics to determine the winner. It should be noted, however, that every method has its limitations, so using them blindly might not give the desired results.
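The traffic reallocation described in this section can be sketched with Thompson sampling for conversion, using a Beta posterior for each variant. This is a minimal simulation, not a production implementation: the true conversion rates are hypothetical and would be unknown in a real experiment, and the uniform Beta(1, 1) prior is an assumption.

```python
import random


def thompson_sampling(true_rates, steps=10000, seed=1):
    """Simulate Beta-Bernoulli Thompson sampling.
    Returns the number of users sent to each variant."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(steps):
        # Draw a plausible conversion rate for each variant from its posterior
        samples = [
            rng.betavariate(s + 1, f + 1)
            for s, f in zip(successes, failures)
        ]
        # Show the next user the variant with the highest draw
        arm = samples.index(max(samples))
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


# Hypothetical true conversion rates of three variants
pulls = thompson_sampling([0.05, 0.07, 0.12])
print(pulls)
```

Because each variant is chosen as often as its posterior draw comes out on top, traffic ends up distributed roughly in proportion to each variant’s probability of being the best, and the weaker variants receive fewer and fewer users over time.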

10. Using testing tools unsuitable for the test requirements

There are many universal platforms and solutions for A/B testing web and mobile products, and a lot of analytics platforms are starting to add A/B testing modules to their solutions. However, these do not solve all the problems encountered by developers, product managers, and marketers of mobile apps.

If you want to focus on developing your product, make it profitable, or increase the current metrics, consider trying out the Proba service. Its features include:

A free pricing plan for a quick start;
Classic A/B testing with optimization for any metric using the standard frequentist methodology (the traffic is distributed equally between the variants, and the conclusions are based on confidence intervals);
Accelerated experiments with optimization for conversion using Bayesian statistics and multi-armed bandits (at every step, the traffic is distributed in proportion to each variant’s probability of being superior);
Accelerated experiments with optimization for financial metrics using Bayesian statistics (the traffic is distributed equally, and the best variant is determined based on the probability of its superiority).

The service allows you to publish the results in a single click to apply the interface changes in favor of the most successful variant.

What to do?

Use the leading testing tools. Those will not only let you run simple tests with “manual” traffic distribution but also accelerate your experiments using automatic algorithms. You will also be able to further optimize the result for your financial metrics.

We have listed, based on our own experience, the main mistakes that occur during A/B testing. Share your own cases and mistakes in the comments, and we’ll work those out together!

Written by Proba
