A/B testing versus Multi-armed bandits

6 min read · Apr 4, 2022

This escalated quickly into a robbery of A/B testing

In my previous article, I listed several drawbacks of A/B testing: fixed sample sizes, slow learning, and the limited insight you get out of a test (unless a data analyst spends a tremendous amount of time manually analyzing the outcome).

General background (no worries, we dive into the details later)

Essentially, A/B testing is an exploration technique: we try out variants to find the one that best fits the population and maximizes a chosen metric or set of metrics. How do we visualize this?

Didn’t we have information long before the cutoff time that we could have used to adjust the allocation across the 3 variants? Well, we did. Even if the results are not significant after a day, this information can be used to correct our estimate (in Bayesian statistics, the belief we hold before seeing new data is called the prior, and each day's results update it). Here is what it should look like:

Multi-Armed Bandit

This is a Multi-Armed Bandit. Wait… Why does it have to be the winner that takes all? Serving different variants to different groups of users is what a Contextual Bandit does, but we will keep this for a future article.

Contextual bandits

So we have introduced a more general framework where Exploration and Exploitation can be carried out concurrently. So far, though, these are high-level concepts without much depth. Let’s dig into the details now.

Multi-Armed Bandit

General idea: Rather than having a fixed traffic split across the variants, the traffic is adjusted depending on past observations.

Problems solved: Cheaper to run (the opportunity cost is lower), no fixed sample size to commit to upfront, and better suited to time-sensitive features, campaigns, or push notifications.

Issue: Requires a single metric to optimize for.
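To make the general idea concrete, here is a minimal sketch of adaptive traffic allocation using Thompson sampling, one common bandit strategy. It is an illustration under assumptions and not necessarily one of the methods this article goes on to use: the function names, the three conversion rates, and the visitor count are all hypothetical, and the single metric being optimized is assumed to be a yes/no conversion. A fixed equal split is included for comparison, to show where the lower opportunity cost comes from.

```python
import random


def thompson_sampling(true_rates, n_visitors, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (hypothetical helper).

    true_rates are the real conversion rates, unknown to the algorithm
    and used here only to simulate visitor behaviour.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # conversions observed per variant
    failures = [0] * k   # non-conversions observed per variant
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per variant from its posterior.
        # Beta(1, 1) is a uniform prior: no opinion before any data arrives.
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        arm = draws.index(max(draws))  # serve the variant with the best draw
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def fixed_split(true_rates, n_visitors, seed=0):
    """Simulate a classic A/B/n test with an equal, fixed traffic split."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for t in range(n_visitors):
        arm = t % k  # round-robin allocation, never adjusted
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    rates = [0.04, 0.05, 0.10]  # assumed rates for 3 variants
    for name, run in [("bandit", thompson_sampling), ("fixed", fixed_split)]:
        s, f = run(rates, 10_000)
        traffic = [a + b for a, b in zip(s, f)]
        print(name, "traffic per variant:", traffic, "conversions:", sum(s))
```

In this sketch the bandit shifts traffic toward the variant that looks best as evidence accumulates, while still occasionally trying the others because their posteriors remain uncertain. The fixed split keeps sending two thirds of visitors to the weaker variants until the test ends.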

How do we run this? Although many experimentation methods are available, for the sake of simplicity, let's use 2 of…

