A/B testing versus Multi-armed bandits

Adrien
6 min read · Apr 4, 2022

This escalated quickly into a robbery of A/B testing

In my previous article, I listed several drawbacks of A/B testing: fixed sample sizes, slow learning, and the limited insight you get out of a test (unless a data analyst spends a tremendous amount of time analyzing the outcome manually).

General background (no worries, we dive into the details later)

Essentially, A/B testing is an exploration technique: we explore which variant best fits the population in order to maximize a chosen metric or set of metrics. How do we visualize this?

Didn’t we have information long before the cutoff time that we could have used to adjust the allocation across the 3 variants? We did. Even if the results are not significant after a day, this information can still be used to correct our estimate (in Bayesian statistics, the estimate we hold before seeing new data is the prior, and each day's corrected estimate becomes the prior for the next day). Here is what it should look like:
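One way to picture this daily correction in code is a Beta-Bernoulli update. This is a minimal sketch under stated assumptions, not the method behind the article's figures: the metric is assumed to be a binary conversion, each variant starts from a uniform Beta(1, 1) prior, and the class name `BetaBelief` and the day-one counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a variant's conversion rate, stored as Beta(alpha, beta)."""
    alpha: float = 1.0  # uniform prior: one pseudo-conversion
    beta: float = 1.0   # uniform prior: one pseudo-non-conversion

    def update(self, conversions: int, visitors: int) -> "BetaBelief":
        # Posterior after new data; it serves as the prior for the next day.
        return BetaBelief(self.alpha + conversions,
                          self.beta + (visitors - conversions))

    @property
    def mean(self) -> float:
        # Expected conversion rate under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical day-one data: (conversions, visitors) per variant.
day_one = {"A": (12, 200), "B": (18, 200), "C": (9, 200)}

beliefs = {name: BetaBelief().update(conv, n)
           for name, (conv, n) in day_one.items()}

for name, belief in beliefs.items():
    print(f"{name}: estimated rate {belief.mean:.3f}")
```

Even when no difference is significant yet, the three beliefs are no longer identical, and that is the information a fixed allocation leaves unused.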

Multi-Armed Bandit

This is a Multi-Armed Bandit. Wait… why does the winner have to take all? Relaxing that assumption, so that the best variant can depend on who the user is, gives a Contextual Bandit, but we will keep this for a future article.
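To make the multi-armed bandit idea concrete, here is a small simulation using Thompson sampling, one common bandit strategy. The article does not say which strategy its figures use, so treat this as an illustrative sketch: the function name `thompson_sampling`, the binary reward, and the true conversion rates are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_rates, n_visitors, seed=0):
    """Simulate Thompson sampling over variants with binary rewards.

    true_rates are hypothetical conversion rates, unknown to the algorithm.
    Returns how many visitors each variant received.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k

    for _ in range(n_visitors):
        # Draw one plausible conversion rate per variant from its posterior
        # Beta(1 + successes, 1 + failures), then show the variant whose
        # draw is highest.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        pulls[arm] += 1

        # Simulated visitor response.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.20], n_visitors=5000)
print(pulls)
```

Unlike a fixed three-way split, the allocation shifts toward the better variant as evidence accumulates, while the weaker variants still receive occasional traffic so the estimates can keep being corrected.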


Adrien

Strategy/Data/Leadership head of DS at OCBC ~~ exTwitter ~~ ex-gojek