Most marketers and web developers are familiar with A/B testing: testing two different versions of an object or a web page to see which one performs better. In theory, this is an excellent practice and should be performed to get the best-performing website or campaign possible. However, it rarely comes without a cost, especially if you are working with a web optimization firm. The question is not whether A/B testing can be effective, but whether it is fiscally responsible and ultimately necessary.
Sample A outperformed Sample B by 20%! Is my data actionable?
There are two ways to determine whether your test results are statistically valid: a one-tailed test or a two-tailed test. Peter Borgen, a contributor at SumAll, eloquently describes the difference between the two tests on his blog:
"The short answer is that with a two-tailed test, you are testing for the possibility of an effect in two directions, both the positive and the negative. One-tailed tests, meanwhile, allow for the possibility of an effect in only one direction, while not accounting for an impact in the opposite direction."
A one-tailed test requires a smaller sample size, is more convenient, and will likely yield a result that appears conclusive. It will tell you that Sample A is performing better than Sample B, but it will not tell you if Sample A is performing worse. A two-tailed test requires a larger sample size, may take longer, and may be more expensive to conduct. However, it will produce results that are truly actionable.
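To make the distinction concrete, here is a minimal sketch of a two-proportion z-test in Python. The variant names and conversion counts are hypothetical, purely for illustration. Notice that the one-tailed p-value is exactly half the two-tailed one, which is why a one-tailed test reaches "significance" with fewer visitors:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: conversions out of total visitors for each variant.
conv_a, n_a = 240, 1000  # Sample A: 24% conversion rate
conv_b, n_b = 200, 1000  # Sample B: 20% conversion rate

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

p_one_tailed = norm.sf(z)           # H1: A is better than B (one direction only)
p_two_tailed = 2 * norm.sf(abs(z))  # H1: A and B differ (either direction)

print(f"z = {z:.3f}")
print(f"one-tailed p = {p_one_tailed:.4f}")  # ~0.015
print(f"two-tailed p = {p_two_tailed:.4f}")  # ~0.031, exactly double
```

With these made-up numbers, both versions happen to clear a 0.05 threshold, but it is easy to construct results where only the one-tailed p does, which is how a one-tailed test can make a difference look conclusive before the data actually supports it.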
More on One-Tailed Tests and Two-Tailed Tests:
One-tailed tests are not always bad; it is just important to understand their downside. In fact, there are many times when it makes sense to use a one-tailed test to validate your data. Personally, if you are paying for website optimization or it is a major decision, I would need to have two-tailed validation. Chris Stucchio has a great summary of when it's "ok" to use one-tailed testing here.
If you're paying for a service, you deserve actionable results.
Again, the challenge of A/B testing is to get a result from the test that is statistically significant. I have never been told what form of testing was used to evaluate my data (nor have I asked). If you move forward with an optimization firm, ask them:
- What type of evaluation are you using: a one-tailed or a two-tailed test?
- What is the sample size? (The sketch after this list shows how to estimate the sample size a test actually needs.)
- What is your confidence interval?
- Are the results statistically significant enough that I can make a decision with them?
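As a sanity check on the sample-size answer, here is a rough sketch of the standard formula for sizing a two-tailed test on conversion rates. The baseline rate, lift, and thresholds below are illustrative assumptions, not recommendations:

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p_base, lift, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-tailed test on conversion rates.

    p_base: baseline conversion rate (e.g. 0.05 for 5%)
    lift:   relative improvement you want to detect (e.g. 0.20 for +20%)
    alpha:  significance threshold (0.05 corresponds to 95% confidence)
    power:  probability of detecting the lift if it is real
    """
    p_var = p_base * (1 + lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # two-tailed critical value
    z_power = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_var - p_base) ** 2)

# A 5% baseline rate and the headline 20% lift need thousands of visitors:
print(sample_size_per_variant(0.05, 0.20))  # roughly 8,000 per variant
```

If the firm's sample size is far below what a calculation like this suggests, that is exactly the kind of result that looks conclusive without being actionable.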
Ultimately, I want to know that A/B testing will in fact increase my user acquisition, not just tell me what I want to hear. I work for a smaller firm with limited resources, and I do not want to waste them on bogus results. If something seems too good to be true, it probably is. Spend the time researching and finding a firm that will deliver the results you need, because at the end of the day, if you show your boss a presentation promising a 20% increase in user acquisition, and months later the results are not there, the only loser is going to be you.
Having no data is better than having bad data. Doing testing in house and using your experience, logic, and free online tools can help you improve your performance without spending a dime. Run tests frequently and for long periods of time, and run the calculations to see if your data is significant.
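If you'd rather script that check than rely on an online calculator, a few lines of Python will do it. Here is a sketch using SciPy's chi-squared contingency test on made-up in-house numbers:

```python
from scipy.stats import chi2_contingency

# Hypothetical in-house results: [converted, did not convert] per variant.
table = [[240, 760],   # Sample A
         [200, 800]]   # Sample B

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is significant at the 95% confidence level.")
else:
    print("Keep the test running; the difference could still be noise.")
```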
If you are new to A/B testing and how it works, these resources are a great place to start:
To learn more about A/B testing statistics, one-tailed tests, and two-tailed tests, use the following links:
One tailed vs two tailed A/B tests - your decision procedure is the deciding factor.
What you really need to know about mathematics of A/B split testing
Here is a tool for testing statistical significance from your own tests: