Poor Hygiene Corrupts Analysis
Consider, for example, these three pairs of New York City records:
o Ben Rosen and Ben Orsen at 1407 Madison Ave.
o R. Happle and Robert Appel at 130 E. End Ave.
o Ms. K. Mahoney and Katherine Maloney at 829 Park Ave.
The individuals represented by these records reside in multiple-family dwelling units. Nevertheless, the records contain no apartment number information. Does each of these record pairs represent the same individual? And how do hygiene issues affect response analysis and data mining? Both questions will be explored in this article.
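To make the first question concrete, the surnames in the three pairs above can be scored for similarity. The following is only a minimal sketch using Python's standard `difflib` module. The function name `surname_similarity` is hypothetical, and a character-level ratio is just one of many possible matching techniques, not the method any particular hygiene product uses.

```python
from difflib import SequenceMatcher


def surname_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two surnames, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Surnames taken from the three record pairs above.
pairs = [("Rosen", "Orsen"), ("Happle", "Appel"), ("Mahoney", "Maloney")]

for left, right in pairs:
    score = surname_similarity(left, right)
    print(f"{left} / {right}: {score:.2f}")
```

A score like this only ranks candidate duplicates. Where to set the cutoff, and whether a high score really means one person, remains a judgment call.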
Can you spot the duplicate in the following two pairs of records:
It appears obvious that pair #1 contains the duplicate record and that pair #2 represents two different individuals at the same address. Surprisingly, however, pair #2 contains the duplicate.
Pair #1 consists of two people, a father and his son, with their respective suffixes (Jr. and III) deleted. Sharing a name was a constant source of confusion for me while growing up, as is the case with any son who is named after his father.
In pair #2, the first record represents a married woman's professional name, composed of her given name and maiden surname. As with many women, Beth did not change her name professionally when she married. She retained the name with which all of her co-workers and associates were familiar. The second record contains Beth's nickname and married surname. In her personal life, she opted for her surname to correspond with her husband's.
These two pairs of records illustrate that there is no way to be 100 percent correct when it comes to defining duplicates. Therefore, it is best to incorporate specialized hygiene technology into operational systems and improve order-entry procedures to minimize the occurrence of potential duplicate situations. In this way, back-end cleanup is employed only as a last resort.
Ramifications of poor hygiene. Consider what happens whenever two legitimate duplicates are not consolidated. When doing matchback response analysis, each unconsolidated duplicate represents one fewer attributed order. The higher the number of unattributed orders, the more difficult it is to accurately quantify the performance of lists and list segments.
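The effect on a matchback can be illustrated with a small sketch. It assumes a deliberately crude match key (lowercased letters and digits of name plus address). The list names "List A" and "List B", the record layout, and the functions `match_key` and `matchback` are all hypothetical.

```python
def match_key(name: str, address: str) -> str:
    """Build a crude match key: lowercase, keep only letters, digits and '|'."""
    raw = f"{name}|{address}".lower()
    return "".join(ch for ch in raw if ch.isalnum() or ch == "|")


# Promoted records, each tagged with the (hypothetical) list it came from.
promoted = [
    {"name": "Ben Rosen", "address": "1407 Madison Ave.", "list": "List A"},
    {"name": "Katherine Maloney", "address": "829 Park Ave.", "list": "List B"},
]

# Incoming orders as keyed by order entry.
orders = [
    {"name": "BEN ROSEN", "address": "1407 Madison Ave"},
    {"name": "Ms. K. Mahoney", "address": "829 Park Ave."},
]


def matchback(promoted, orders):
    """Attribute each order to a promoted list when the match keys agree."""
    by_key = {match_key(p["name"], p["address"]): p["list"] for p in promoted}
    attributed = {}
    unattributed = 0
    for order in orders:
        source = by_key.get(match_key(order["name"], order["address"]))
        if source is None:
            unattributed += 1
        else:
            attributed[source] = attributed.get(source, 0) + 1
    return attributed, unattributed


attributed, unattributed = matchback(promoted, orders)
print(attributed, unattributed)
```

Under this key, the "Ms. K. Mahoney" order fails to match "Katherine Maloney", so List B loses credit for an order it may well have generated.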
For customers, an actual multiple buyer will appear to be two separate, and less desirable, single buyers, which will reduce the effectiveness of any statistics-based predictive model. After all, "pseudo" single buyers will purchase more frequently in the future than expected, just as "pseudo" multiple buyers will purchase less frequently. Likewise, in lifetime value analysis, one relatively valuable customer will appear to be two less valuable individuals. And the opposite effect will occur whenever an inappropriate record consolidation takes place.
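A small sketch shows how an unconsolidated duplicate distorts buyer counts and customer value. The customer IDs, order amounts, and the `merge_map` of duplicate-to-survivor IDs are invented for illustration, and the sketch simply assumes the two records belong to one person.

```python
from collections import defaultdict

# (customer_id, order_amount); all values are illustrative.
orders = [
    ("C001", 120.00),  # keyed as "Robert Appel"
    ("C002", 80.00),   # keyed as "R. Happle", assumed here to be the same person
    ("C003", 60.00),   # an unrelated customer
]

# Hypothetical output of a duplicate-identification step: duplicate -> survivor.
merge_map = {"C002": "C001"}


def summarize(orders, merge_map=None):
    """Total order count and revenue per customer, after optional merging."""
    merge_map = merge_map or {}
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for customer_id, amount in orders:
        survivor = merge_map.get(customer_id, customer_id)
        totals[survivor]["orders"] += 1
        totals[survivor]["revenue"] += amount
    return dict(totals)


before = summarize(orders)             # three apparent single buyers
after = summarize(orders, merge_map)   # one multiple buyer, one single buyer
print(before)
print(after)
```

An incorrect entry in `merge_map` would produce the opposite distortion: two single buyers reported as one inflated multiple buyer.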
Even with records that are likely to be duplicates, sloppy order-entry procedures cause problems. Consider the earlier "Ben Rosen" and "Ben Orsen" record pair. Which is the correct surname for this individual? People appreciate being referred to correctly. When they are not, the logical result is a lower likelihood of placing future orders.
Twenty years ago, most direct marketing orders for many companies arrived via mail. In this long-ago world, the majority of orders could be directly attributed to a promoted name and address. This is because most people filled out the order form that accompanied the direct mail piece. These order forms, in turn, contained the proper name and address of the prospect or customer. Of the orders that were not directly attributable, most were the result of "pass-along mail," where a friend or relative was inspired by the direct marketing piece to place an order.
Today, almost all orders are handled by inbound call centers and e-commerce sites. Unfortunately, when it comes to name and address entry, many call centers lack rigorous standards. Glaring misspellings are common, as are omissions of address elements. Likewise, the capture of valid key codes often is not much better. And e-commerce orders present their own name and address quality challenges. Frequently, for example, no more than 25 percent contain a key code.
This attenuated linkage between those who are promoted and those who respond makes it difficult to analyze direct marketing campaigns. Barriers to analysis, in turn, increase the probability that incorrect rollout decisions will be made and that predictive models and lifetime value analysis will not reach their potential.
Maximizing the quality of response data. Fortunately, techniques exist to maximize the quality of response data. Unique IDs can be applied to promoted records. When applied to prospects, these often are referred to as "finder numbers." Call center reps can request this unique ID, which acts as a "hard" link between the order and the promoted record. Likewise, the ID can be requested during e-commerce sessions.
Ideally, when the ID is input by the call center rep into the operational system, the name and address of the customer or prospect will appear on the screen. The same steps can occur during e-commerce sessions. At this time, the customer can be queried about any changes, including whether the order is a "pass-along." If so, then a "hard link" has been established between the response vehicle and the "indirectly promoted" new customer, which helps in subsequent response analysis and data mining.
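The finder-number flow described above might look roughly like the following sketch. The record layout, the finder-number format "F100234", and the names `PromotedRecord`, `OrderLink`, and `link_order` are assumptions made for illustration. The pass-along flag is taken from the customer's answer rather than inferred from the data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PromotedRecord:
    finder_number: str
    name: str
    address: str


@dataclass
class OrderLink:
    finder_number: Optional[str]   # None when no promoted record was found
    promoted_name: Optional[str]   # name shown on screen for confirmation
    buyer_name: str
    pass_along: bool               # as stated by the customer
    hard_link: bool                # True when the order is tied to a promotion


def link_order(
    finder_number: str,
    buyer_name: str,
    pass_along: bool,
    promoted_index: Dict[str, PromotedRecord],
) -> OrderLink:
    """Look up the promoted record by finder number and tie the order to it."""
    key = finder_number.strip().upper()
    promoted = promoted_index.get(key)
    if promoted is None:
        # No hard link: the order must fall back to back-end matching.
        return OrderLink(None, None, buyer_name, pass_along, False)
    return OrderLink(key, promoted.name, buyer_name, pass_along, True)


# Hypothetical promoted file indexed by finder number.
index = {"F100234": PromotedRecord("F100234", "Ben Rosen", "1407 Madison Ave.")}

direct = link_order(" f100234 ", "Ben Rosen", False, index)
friend = link_order("F100234", "A. Friend", True, index)
print(direct)
print(friend)
```

Both orders carry the same finder number, so both are credited to the same promotion, while the pass-along flag keeps the new customer distinct from the promoted one.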
For direct marketers whose size is sufficiently large to justify the investment, real-time hygiene technologies also can be integrated into their operational infrastructure. First, customer and prospect contact lists are loaded into the operational system. If incoming orders have different address data, technology can be used to "screen on the fly" for address elements that do not meet U.S. Postal Service standards.
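As a rough illustration of screening "on the fly," the sketch below flags a few obviously incomplete or malformed address elements at entry time. It is a simplified stand-in, not an implementation of actual U.S. Postal Service standards. The field names, the `screen_address` function, and the placeholder ZIP code are assumptions.

```python
import re

# Simplified format checks only; real address hygiene software does far more.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")   # 5-digit ZIP or ZIP+4
STATE_RE = re.compile(r"^[A-Z]{2}$")       # two-letter state abbreviation


def screen_address(addr: dict) -> list:
    """Return a list of problems found in a keyed address (empty if none)."""
    problems = []
    for field in ("street", "city", "state", "zip"):
        if not (addr.get(field) or "").strip():
            problems.append(f"missing {field}")
    state = (addr.get("state") or "").strip()
    if state and not STATE_RE.match(state.upper()):
        problems.append("state is not a two-letter abbreviation")
    zip_code = (addr.get("zip") or "").strip()
    if zip_code and not ZIP_RE.match(zip_code):
        problems.append("ZIP is not 5 digits or ZIP+4")
    return problems


# The ZIP below is a placeholder, not a verified ZIP for this address.
good = {"street": "1407 Madison Ave.", "city": "New York", "state": "NY", "zip": "10001"}
bad = {"street": "829 Park Ave.", "city": "", "state": "New York", "zip": "1002"}

print(screen_address(good))
print(screen_address(bad))
```

A check like this can prompt the rep or the web form to correct the entry while the customer is still present, rather than leaving the problem for back-end cleanup.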
Many small to midsized direct marketers cannot justify major investments in upfront hygiene technology. However, name and address input quality can be enhanced by incorporating simple but cost-effective order-entry procedures. For example, rigorous name and address input standards can be established for call center reps, and the performance of individual reps tracked by the back-end matching of orders against prospect and customer lists. Reps who consistently achieve high standards become eligible for performance bonuses.
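Tracking rep performance can be as simple as computing, per rep, the share of orders that matched a prospect or customer record in back-end processing. The sketch below assumes each order already carries a rep ID and a matched flag. The names `rep_match_rates`, `rep_a`, and `rep_b` are hypothetical.

```python
from collections import defaultdict


def rep_match_rates(orders):
    """Return {rep_id: share of that rep's orders that matched a list record}.

    `orders` is an iterable of (rep_id, matched) pairs, where `matched` is
    True when back-end matching tied the order to a promoted record.
    """
    counts = defaultdict(lambda: [0, 0])  # rep_id -> [matched, total]
    for rep_id, matched in orders:
        counts[rep_id][1] += 1
        if matched:
            counts[rep_id][0] += 1
    return {rep: matched / total for rep, (matched, total) in counts.items()}


# Illustrative data only.
sample = [
    ("rep_a", True), ("rep_a", True),
    ("rep_b", True), ("rep_b", False),
]

rates = rep_match_rates(sample)
print(rates)
```

Because some unmatched orders are legitimate pass-alongs rather than keying errors, a rate like this is best read as a relative comparison among reps handling similar calls.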