This escalated quickly into a robbery of A/B testing
In my previous article, I outlined several drawbacks of A/B testing: the fixed sample size, the slow learning, and the limited insights you get out of it (unless a data analyst spends a tremendous amount of time manually analyzing the outcome).
General background (no worries, we dive into the details later)
Essentially, A/B testing is an exploration technique: we explore which variant is the best fit for the population in order to maximize a chosen metric or set of metrics. How do we visualize this?
Didn’t we have information long before the cutoff time to start adjusting the allocation across the 3 variants? Well, we do: even if the results are not significant after a day, this information can be used to update our estimate (in Bayesian statistics, our initial belief is the prior, and each new batch of data refines it). Here is what it should look like:
This is a Multi-Armed Bandit. Wait… why does the winner have to take all? If the best variant depends on who the user is, that becomes a Contextual Bandit, but we will keep this for a future article.
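To make the updating idea from above concrete, here is a minimal sketch of a Beta-Binomial update for a binary metric (e.g., click / no click). The variant names and the day-1 numbers are hypothetical, purely for illustration of how even one day of non-significant data shifts our belief about each variant:

```python
from dataclasses import dataclass


@dataclass
class VariantBelief:
    alpha: float = 1.0  # prior "successes" (uniform Beta(1, 1) prior)
    beta: float = 1.0   # prior "failures"

    def update(self, successes: int, failures: int) -> None:
        # Bayesian update: a Beta prior combined with Binomial data
        # gives a Beta posterior with shifted parameters.
        self.alpha += successes
        self.beta += failures

    @property
    def mean(self) -> float:
        # Current point estimate of the conversion rate.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical day-1 results for 3 variants: (successes, failures).
beliefs = {"A": VariantBelief(), "B": VariantBelief(), "C": VariantBelief()}
day_1 = {"A": (12, 388), "B": (18, 382), "C": (9, 391)}

for name, (succ, fail) in day_1.items():
    beliefs[name].update(succ, fail)
    print(name, round(beliefs[name].mean, 4))
```

Nothing here is "significant" yet, but the posteriors already lean toward B, and that lean is exactly what an adaptive allocation can exploit.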
So we have introduced a more general framework in which exploration and exploitation are carried out concurrently. So far, though, these are high-level concepts without much depth. Let’s dig into it now.
Multi-Armed Bandit
General idea: Rather than using a fixed traffic split across the variants, the traffic allocation is adjusted based on past observations (see the sketch after this list).
Problems solved: cheaper to run (the opportunity cost is lower because losing variants receive less traffic), no fixed sample size to commit to upfront, and better suited to time-sensitive features, campaigns, or push notifications.
Issue: it requires a single metric to optimize for.
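As a concrete illustration of "traffic adjusted based on past observations", here is a minimal epsilon-greedy sketch. This is not the article’s own method (those follow below); the variant names, the simulated conversion rates, and the `epsilon` value are all hypothetical:

```python
import random

# Epsilon-greedy allocation for a single binary metric:
# with probability epsilon we explore (pick a random variant),
# otherwise we exploit the variant with the best observed rate so far.

random.seed(42)

variants = ["A", "B", "C"]
true_rates = {"A": 0.03, "B": 0.045, "C": 0.025}  # unknown in practice
stats = {v: {"shown": 0, "successes": 0} for v in variants}
epsilon = 0.1


def observed_rate(v: str) -> float:
    s = stats[v]
    return s["successes"] / s["shown"] if s["shown"] else 0.0


for user in range(10_000):
    if random.random() < epsilon:
        choice = random.choice(variants)           # explore
    else:
        choice = max(variants, key=observed_rate)  # exploit
    stats[choice]["shown"] += 1
    # Simulated feedback; in production this is the user's actual response.
    if random.random() < true_rates[choice]:
        stats[choice]["successes"] += 1

for v in variants:
    print(v, stats[v]["shown"], round(observed_rate(v), 4))
```

Note how the "shown" counts drift toward the better-performing variant over time, instead of staying locked at a fixed split until a cutoff date.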
How do we run this? Although many experimentation methods are available, for the sake of simplicity, let's use 2 of…