A/B testing for the real world: Round 1

Why A/B test in the real world?

A/B testing is not just for the internet. You can apply the same methodology and principles to all parts of your business. You can and should think even further and take a page from science. Applying the scientific method to make inferences about every part of your business will serve you well. Why?

It gives you a framework for quickly proving or disproving hypotheses. That lets you find out what works and what doesn't much faster than just trying all options. For the engineers out there, think O(n) vs O(log n) (linear scan vs binary search).

The scientific method allows you to get to a solution in a fraction of the time it would take to try ALL possible permutations.
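To make the O(n) vs O(log n) point concrete, here's a toy sketch (the 1024 campaign variants are a made-up example, not anyone's real numbers): trying every option costs one trial per option, while a hypothesis that eliminates half the candidates each time gets you there in a handful of experiments.

```python
import math

def trials_needed(n_options: int) -> tuple[int, int]:
    """Trials to find the winner: brute force vs halving the space each time."""
    brute_force = n_options                               # one trial per option: O(n)
    hypothesis_driven = math.ceil(math.log2(n_options))   # halve the space: O(log n)
    return brute_force, hypothesis_driven

# With 1024 candidate campaign variants:
brute, smart = trials_needed(1024)
print(brute, smart)  # 1024 vs 10 experiments
```

Each well-posed hypothesis that rules out half of your remaining options is worth hundreds of blind trials.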

What are you measuring?

The first step is to define the metric(s) that you want to track. If at the end of the day you care about moving sales, make sure you are tracking sales. There is a tendency among marketers to hide behind secondary metrics such as awareness, Facebook likes and what their friends think.

I said this before: such secondary metrics are highly correlated with short-term, tangible metrics. The only scenario where secondary metrics move but primary metrics don't is when your acquisition funnel is broken. You can infer whether that is the case from digital metrics - you don't need a 3-month brand study to come to that conclusion. Waiting on one will only make you slow and unable to respond to the real world.

How are you measuring success?

This is perhaps the crux of the problem. Most organizations need a paradigm shift in how they measure success. Usually managers (read: people with an MBA background) do not have a solid statistical, scientific grounding. Of course, that is not always the case (I know a few MBAs with a very technical background are reading this blog), but it holds on average.

This makes it your responsibility to explain that Analytics is not black magic. There has been a lot written on the topic of Organizational Change. Google it.

Now to what should have been the title of this post: anecdotes are not FUCKING evidence. Pardon my Latin. If your friends/employees are liking and engaging with a campaign, that does not equal campaign success (see comic above).

OK, so if that is not success, what is? To begin with, you need data to compare AGAINST.

The need for Benchmarking 

Benchmarking is basically comparing against a standard. You need data points to compare the performance of your metric of interest against. Those could include industry standards (we know e-commerce ads have click-through rates of x%) or comparable companies. Usually it is hard to find perfectly comparable companies, so ideally you'd want to compare against your own data as the most reliable source.

Establishing a baseline

Establishing a baseline from your own data should be the first step towards implementing the scientific method. It is a much better yardstick to compare against, and it can be complementary to industry intelligence. The more data points you have, the more accurately you can draw conclusions.

You can look at your trend and extrapolate what it would have looked like in the future, then compare that against the actual effect on your metric of interest. The difference between the extrapolation and the actual value of the metric is the total effect your intervention had. BUT be careful: extrapolating from the last point, or from too little data, can lead to erroneous inferences. Looking at too little data means you can miss long-term, big-picture trends - in calculus terms, you risk mistaking a local maximum/minimum for the overall trend.

One alternative would be to fit a fancy Bayesian probability distribution on as much data as possible (more about this in a future post). However, even with such an approach, taking too much data could lead to an overly inflated/deflated baseline. To visualize this, think about a situation where your product is in growth mode and then reaches steady state. This approach will take the growth-mode stage into account and EXPECT future growth at a similar pace, which is often unrealistic.

So a baseline is NICE but not ideal. It ignores local inflection points and can make erroneous predictions. Also, it does not control for other stuff happening in the real world. Economists will tell you this is not ceteris paribus (pardon my Latin), which means "everything else being constant" (a phrase that some of the people I work with dread me saying - probably because I say it so much). So how do we solve this problem?

Establishing a counterfactual: A/B Testing for the real world

Let's take it up a few notches. You're in the big leagues now.

Imagine if you could have a baseline AND control for everything else: seasonality, competitors' behaviour, updates to your product, yada yada yada. Achieving ceteris paribus is an economist's wet dream.

Well, you can do that! At least to some extent. Having a test and a control group - for example, taking 2 similar geographies and trying something in one but not in the other - would achieve most of that! This is basically A/B testing for the real world! Well, it would be a pseudo-control group (so some natural imperfection will occur), but you can mitigate that by having MANY control groups. Then you can build a "composite" counterfactual taking samples from all of them, based on historical trends, with Bayesian statistics. Think about it like minimizing risk through a diversified portfolio of stocks and bonds. Similar concept, only this time you're minimizing the risk of being wrong.
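A minimal sketch of the composite-counterfactual idea, without the Bayesian machinery: scale each pseudo-control geography to the test geography's pre-period level, average the scaled controls into one composite series, and read the effect as actual minus composite. All the geography names and sales numbers here are made up.

```python
from statistics import mean

# Made-up monthly sales; the test geo gets the campaign after month 6
test_pre, test_post = [50, 52, 51, 55, 54, 56], [63, 65, 64, 66]
controls_pre = [[40, 41, 40, 44, 43, 45],   # control geo A, pre-period
                [60, 63, 61, 66, 65, 67]]   # control geo B, pre-period
controls_post = [[45, 46, 45, 47],          # control geo A, post-period
                 [68, 69, 68, 70]]          # control geo B, post-period

# Scale each control to the test geo's pre-period level
scales = [mean(test_pre) / mean(c) for c in controls_pre]

# Composite counterfactual: average of the scaled controls at each month
composite = [
    mean(scales[i] * controls_post[i][t] for i in range(len(controls_post)))
    for t in range(len(test_post))
]

# Estimated campaign effect = actual minus the composite counterfactual
effect = sum(a - c for a, c in zip(test_post, composite))
print(round(effect, 1))
```

Averaging over many controls is the "diversified portfolio" from above: any one geography can drift for its own local reasons, but a basket of them is a much steadier stand-in for the world where you did nothing.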

This is controlling for exogenous variables at its best: scientific rigor with business applicability. It is applicable to basically anything that can be measured: soft or hard metrics, sales, active users, number of sessions, views, sentiment, buzz, word of mouth. And you can actually infer causality, not just correlation, since we are controlling for everything else (hopefully). A control group will also mitigate the issue of local extremes! YAY!

The nice people @Google have a beautiful paper on this and a great R package (CausalImpact). Check it out here.

Go out and do it 

Ok, I hope you feel inspired to take your business analytics to the next level!

A few points to bear in mind as you go out in the real world:

- Rigour is necessary, and this is a collective effort, as I said before.
- Remember to ask a lot of questions and don't assume anything.
- Ask the right questions: will the metrics you're measuring be actionable? Will they make a business impact?
- Find a good control for your experiment design.
- If you don't have a good control, establish a baseline with the right amount of data.
- Most times you need to make judgement calls and use your intuition.
- Cut the bullshit.

More in round 2

In round 2 we will focus on interpretation and making the right inferences from your data. We'll also cover the common pitfalls and caveats as well as what to do when there is not enough data and how to make sure your data is of adequate quality. See you next time!

My name is Dan, and this was the Dumi-truth.