Have you ever wondered why so many of your rollouts are disappointing? Part of the answer lies with statistical sampling theory, and the other part does not. This article will focus on statistical sampling theory, and leave to a future article a discussion of the nonstatistical reasons.
Universe versus test panel performance. When testing it is important to understand the difference between the performance of an entire rollout universe and that of an “nth'd” test panel. The key is to remember that test panel performance is generally similar to, but almost never identical to, that of the rollout universe. The following example will provide clarification:
Assume that a promotion to a universe of 200,000 yields a 0.8 percent response. This, of course, is the true responsiveness of all the customers to the promotion.
Now, assume that a random sample (nth) of 10,000 is contacted and responds at 0.72 percent. This is the performance of the test panel and not the true performance of the 200,000 universe. And what direct marketers are interested in is not this test panel responsiveness, but that of the full universe.
Finally, assume that four more nth'd test panels of 10,000 are contacted, with response rates of 0.82 percent, 0.86 percent, 0.79 percent and 0.78 percent, respectively. These diverse results illustrate why, from the performance of test panels, you can never know exactly what is the true performance of the full universe.
Though the universe performance is never exactly knowable from testing, you can have a degree of confidence that it will lie within a certain range of the panel results. And, it is statistics that quantifies this confidence and range. In essence, if the universe performance is the bull's-eye, then the test panel results are the shotgun pellets. A few will be way off, and one or two might hit the mark. Most, however, will be somewhere in between.
The normal distribution, or bell curve. Now, let's assume that you draw many hundreds of random samples of 10,000 from the universe of 200,000, track their corresponding response rates and then graph their frequency. Statistical sampling theory says that the graph will look something like the figure on this page:
This is what is known as a normal distribution, or bell curve. The exact quantification of the distances between the various test panel response rates and the one true universe response rate would require a discussion of statistical sampling theory and formulas that is beyond the scope of this article. For our purposes, all you need to know is that many of the test panel response rates will be not too far from the true universe rate of 0.8 percent, and some will be moderately far off. In other words, most of the test panel response rates will cluster with reasonable proximity around the true universe rate.
A few of the observations, however, will be relatively far away. These are what statisticians refer to as “outliers.” The larger the distance from the true universe response rate, the fewer the number of outliers. In other words, among the pool of outliers, there will be a select few that deviate particularly far from the norm.
This is the same effect that is seen in athletics. To make the starting football lineup at any large high school where the sport is popular, a player is likely to be somewhat of an outlier in athletic ability; that is, a reasonable distance to the right of the midpoint of the normal distribution for male athleticism. To make the starting lineup at a big-time football university, a player has to be even more of an outlier; that is, even farther to the right of the midpoint. And, to make the NFL, a player has to be an extreme outlier; that is, way to the right of the midpoint. The difference is that in direct marketing testing you try to avoid the outliers.
In a normal distribution, the shape of the right side of the curve will, by definition, mirror that of the left. Leaving the football arena and returning to the hundreds of test panels drawn from a universe with one true response rate of 0.8 percent, there will be as many “high response” outliers to the right of 0.8 percent as there will be “low response” ones to the left. Again, it is just like football where, for every starter, there is a klutz who keeps tripping over his own feet.
True winner lists. Assume that a hypothetical direct marketer promotes 50 new lists a year. For each of these lists, focus first on the one true universe response rate rather than on the variable test rates.
For most direct marketers, there are fewer winner lists than there are loser lists; that is, those whose one true universe response rate is above some acceptable cutoff. This makes intuitive sense because direct marketing is a mature industry, and postal rates have been increasing faster than inflation. Mailboxes are glutted, as thousands of catalogers, continuities, magazines, credit card issuers and store-traffic-hungry retailers compete for the attention of consumers.
With fewer winners than losers, it is generally the case that the true aggregate universe response rate across all rental lists will be below that which is required to generate an acceptable acquisition cost per new customer. In other words, if a typical direct marketer were to roll out all of its test rental lists, the overall response rate would be below the acceptable acquisition cost. In fact, it is likely that it would be well below this hurdle.
So assume that the hypothetical direct marketer's true overall universe response rate across all 50 test lists is 0.8 percent, and that 1 percent is required to generate an acceptable acquisition cost per new customer. Some of the lists will have universe response rates that are close to the average of 0.8 percent, others will be not too far off, and a handful will be a good distance from 0.8 percent. However, by definition, they will hover around 0.8 percent rather than the breakeven of 1 percent, which means that relatively few will be greater than the 1 percent required for rollout.
False negatives. Now, switch focus from the one true universe response rate to the variable response rates that are inherent in test panels. Recall from statistical sampling theory that for each of the lists, the test panel response rate will almost never be identical to the true universe rate. One-half of the time the test panel will be greater than the true universe rate. Likewise, one-half of the time it will be less than the universe rate.
Therefore, some of the small number of true winner lists will appear from the test panel response rates to be below the 1 percent cutoff, or losers. These are statistical mirages that are referred to as false negatives. The problem is if they are not rolled out or retested, they will never be identified.
False positives. Likewise, some of the large number of true loser lists will appear to be above the cutoff, or winners. These comprise another form of statistical mirage known as false positives. Only upon rollout will their response rates reflect the ugliness that is their reality. Mix them in with the relatively small number of lists that are both true winners and whose test panel response rates are above 1 percent, and it is apparent why so many rollouts are disappointing.
Minimizing the false negatives and false positives. An important way to minimize the number of false negatives and false positives is to increase the size of the test panels. The higher the test quantity, the narrower the distribution of test panel results around a true universe response rate.
With a narrower distribution, fewer of the true winner lists will display test results below the 1 percent cutoff to create a false negative. Likewise, fewer of the true loser lists will have test results that exceed the cutoff, thereby creating a false positive. Taken together, the effect will be far fewer rollouts that are disappointing.
Of course, this leads to the question of how to determine optimal test panel sizes. This topic was covered in two recent articles, “How Big Should My Test Be?” (DM News, Oct. 1, 2001) and “Identifying Millions in Lost Revenue” (DM News, Jan. 7, 2002).