Long live A/B testing
A/B testing is the backbone of tech companies. They try, fail, and quickly roll out what works while dropping what does not. They trust the results as if they came straight from the C-level and had to be obeyed. I am the data scientist telling you that maybe you should put a little less faith in the “sample size calculator” and the “paired t-test result”.
The more you do something, the more confident you get doing it. The more you automate your A/B testing, the more you feel like “this works like a charm and I get a 1% uplift, this pays for my salary, right?”
So what goes wrong in practice? How is it that you shipped 100 feature changes, each with a 1% measured uplift, yet your final metric moved by only 3% after a year of hard work? Let’s have a look at why your successful 2-week A/B test went wrong.
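A quick back-of-the-envelope check shows how large that gap is. If those 100 measured uplifts were all real and independent, they would compound to far more than 3% (the 1% and 3% figures below are just the illustrative numbers from above):

```python
# Back-of-the-envelope check, using the illustrative numbers above.
per_feature_uplift = 0.01   # 1% measured uplift per shipped change
n_features = 100            # number of changes shipped over the year

# If every uplift were real and independent, the effects would compound.
compounded = (1 + per_feature_uplift) ** n_features - 1
print(f"Compounded uplift if every test were real: {compounded:.0%}")  # ~170%

observed = 0.03             # what the final metric actually moved
print(f"Observed uplift after one year:            {observed:.0%}")    # 3%
```

That gap between ~170% and 3% is the puzzle the rest of this post looks at.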
Novelty effect
Customer-facing features have a natural novelty effect; other features may instead trigger a competitor-reaction effect. Novelty generates clicks, and then it fades away. Typically, a feature with a novelty effect produces a graph like the following.
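To make the decay concrete, here is a toy sketch of such a curve and what it implies for a short test window; every number in it (initial bump, long-run effect, half-life) is invented for illustration:

```python
import numpy as np

# Toy novelty-decay model: the daily uplift starts high and decays
# exponentially toward a small long-run effect. All numbers are made up.
days = np.arange(365)
initial_uplift = 0.05     # 5% bump while the feature is still new
long_run_uplift = 0.005   # 0.5% once the novelty has worn off
half_life_days = 10       # how fast the novelty fades

daily_uplift = long_run_uplift + (initial_uplift - long_run_uplift) * 0.5 ** (days / half_life_days)

print(f"Average uplift seen in a 2-week test: {daily_uplift[:14].mean():.1%}")  # ~3.5%
print(f"Average uplift over the full year:    {daily_uplift.mean():.1%}")       # ~0.7%
```

The two-week window mostly captures the novelty spike, so the test reports an uplift several times larger than what the feature delivers in the long run.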
Competitors’ reactions take a bit more time, but they will eventually make your feature “the norm” and the impact will diminish. In some systems only the final outcome matters … In this case the declining…