# Size does matter when it comes to testing

You may be making the wrong marketing decisions if you're not using statistical analysis to read your test results. Statistical analysis is complex, but statistical models available online make it easier.

Statistical analysis is important because if a test's sample size is not big enough, the results may not represent the real-world results you get when taking products or creative to market.

The first thing to look for when testing is a difference in response. Some models show both positive and negative differences. This article uses positive differences only.

For example, an A/B split Web page test shown to 4,000 consumers (2,000 for each page) that generates a 2 percent response for page A and a 2.4 percent response for page B would give the appearance that page B is 20 percent better than page A.
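
The arithmetic behind that apparent lift can be checked in a few lines. This is a minimal sketch; the variable names are illustrative and not taken from any particular testing tool.

```python
# Apparent lift from the A/B example: 2,000 views per page.
views_per_page = 2000
rate_a = 0.020  # page A response rate (2 percent)
rate_b = 0.024  # page B response rate (2.4 percent)

# Convert the rates back into response counts.
responses_a = round(views_per_page * rate_a)
responses_b = round(views_per_page * rate_b)

# Relative lift of page B over page A, computed from the counts.
lift = (responses_b - responses_a) / responses_a

print(f"Page A responses: {responses_a}")
print(f"Page B responses: {responses_b}")
print(f"Apparent lift: {lift:.0%}")
```

Seen this way, the whole 20 percent lift rests on just 8 additional responses.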

However, the 20 percent lift is an estimate based on a single experiment with a small sample size and relatively low response rates. The real-world rollout may not actually produce a 20 percent lift. In fact, there could be a decrease in response. An analogy is flipping a quarter 5 times. It may land on heads 4 out of the 5 times even though there is a 50/50 chance it will land on heads or tails. You can never be 100 percent certain of results because you are always working from an estimate, unless the test is run on 100 percent of the population.
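
The coin analogy can be made concrete with an exact binomial calculation. The helper below, `prob_at_least`, is a hypothetical name chosen for this sketch, and it assumes a fair coin unless told otherwise.

```python
from math import comb


def prob_at_least(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact binomial probability of seeing at least `heads` heads in `flips` flips."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Chance that a fair coin shows 4 or 5 heads in 5 flips.
print(f"{prob_at_least(4, 5):.4f}")  # prints 0.1875
```

So a perfectly fair coin looks lopsided in nearly one out of five such mini-experiments, which is the same trap a small test sample sets.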

For a given lift, there is an inverse relationship between the response rate and the required sample size. The higher the response rate, the lower the required sample size. The lower the response rate, the higher the sample size required to get the same level of confidence, or statistical validity, in a test.
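
A rough way to see this relationship is to hold the lift and the confidence level fixed and vary only the response rate. The sketch below assumes a one-sided normal approximation for the difference between two proportions, which is only one of several possible models; `views_needed` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def views_needed(rate_a: float, lift: float = 0.20, confidence: float = 0.90) -> int:
    """Views per page under an assumed one-sided normal-approximation test."""
    rate_b = rate_a * (1 + lift)
    z = NormalDist().inv_cdf(confidence)
    # Sum of the two binomial variances for a single view of each page.
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil(z**2 * variance / (rate_b - rate_a) ** 2)


# Same 20 percent lift and 90 percent confidence, different response rates.
for rate in (0.01, 0.02, 0.05, 0.10):
    print(f"{rate:.0%} response rate -> {views_needed(rate):,} views per page")
```

Under these assumptions, the required views per page shrink steadily as the response rate climbs.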

When testing most new concepts, a company could start with a fairly small sample size and a 90 percent confidence level. In that case, the test and response rates above would require 5,000 views per page to reach the conclusion that page B is better than page A. A company could claim this level of confidence because, based on statistical analysis, if it repeated the test, page B would have a higher response rate than page A 90 percent of the time. However, such a company would still be taking some risk in rolling out page B, because there is a 10 percent chance that the result would not be replicated if tested again or rolled out. Another way to look at this: 9 out of 10 times there should be a lift in response with page B, and 1 out of 10 times there may not be. This is the same kind of confidence level that sits behind the margin of error quoted in political polls and other surveys.
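
To compare the original 2,000-view test with a 5,000-view test, the confidence level can be estimated directly. This sketch assumes a one-sided normal approximation with unpooled variances, which may differ from the model behind any given online calculator; `confidence_b_beats_a` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def confidence_b_beats_a(rate_a: float, rate_b: float, views_per_page: int) -> float:
    """Approximate one-sided confidence that page B's true rate exceeds page A's."""
    # Standard error of the difference between the two observed rates.
    se = sqrt((rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) / views_per_page)
    z = (rate_b - rate_a) / se
    return NormalDist().cdf(z)


for views in (2000, 5000):
    conf = confidence_b_beats_a(0.020, 0.024, views)
    print(f"{views:,} views per page -> {conf:.1%} confidence")
```

Under these assumptions, 2,000 views per page give roughly 81 percent confidence, while 5,000 views per page clear the 90 percent bar at roughly 91 percent.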

It is also important to understand that just because the statistical model shows a 90 percent confidence level with a 20 percent response lift, it does not mean there will always be a 20 percent lift in response for page B. There may be only a 5 percent lift, or a 30 percent lift, but there should be a lift 90 percent of the time.
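
One way to picture how wide that spread can be is to compute a plausible range for the lift itself. The sketch below assumes a normal approximation and, as a simplification, treats page A's 2 percent rate as fixed when converting the difference into a relative lift; `lift_range` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def lift_range(rate_a: float, rate_b: float, views_per_page: int,
               confidence: float = 0.90) -> tuple[float, float]:
    """Range of relative lift, leaving (1 - confidence) in each tail."""
    z = NormalDist().inv_cdf(confidence)
    se = sqrt((rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) / views_per_page)
    diff = rate_b - rate_a
    # Simplification: divide by the observed page A rate, treated as fixed.
    return (diff - z * se) / rate_a, (diff + z * se) / rate_a


low, high = lift_range(0.020, 0.024, 5000)
print(f"Plausible lift: {low:.1%} to {high:.1%}")
```

With 5,000 views per page, this rough range runs from about 1 percent to about 39 percent: a lift is likely, but its size is far from pinned down.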

The second thing to look for when testing is the real response lift. To validate the amount of the lift in response, run tests on an even larger sample size. In the example above, using a 2 percent response for page A and 2.4 percent for page B, being sure the real response lift is at least 10 percent (a page B response rate of no less than 2.2 percent) would require a sample size of 16,000 page views for each page. At that size, the 2.4 percent response rate of page B would not drop below 2.2 percent 90 percent of the time. Or, put another way, 1 out of 10 times the lift may be less than 10 percent.
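
The jump in sample size can be sketched by asking how many views are needed before the lower end of the estimate stays above a minimum lift. This assumes a one-sided normal approximation with unpooled variances, so the result will not match every statistical model exactly; `views_for_minimum_lift` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def views_for_minimum_lift(rate_a: float, rate_b: float, min_lift: float,
                           confidence: float = 0.90) -> int:
    """Views per page so the lift stays above `min_lift` at the given confidence."""
    z = NormalDist().inv_cdf(confidence)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    # Room between the observed difference and the minimum acceptable difference.
    margin = (rate_b - rate_a) - rate_a * min_lift
    if margin <= 0:
        raise ValueError("observed lift must exceed the minimum lift")
    return ceil(z**2 * variance / margin**2)


print(f"{views_for_minimum_lift(0.020, 0.024, 0.10):,} views per page")
```

Under these assumptions the answer comes out near 17,700 views per page, the same order of magnitude as the 16,000 quoted above. The exact figure depends on the model used.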

There are other statistical models and approaches, and many other facets to statistics, just as there are other factors that go into making business decisions: knowledge, experience and instinct.

Even with all the statistical models, some things may not replicate or may be inexplicable. To paraphrase Aristotle, 'It is very probable that many improbable things will happen.' But there are good statistical tools to help verify sample sizes and test results to help make the right decision, most of the time.