Using Reference-Based Matching
Merge/purge systems, designed primarily to remove duplicates from mailing lists, are built for speed and efficiency. Customer matching systems, designed to link records from multiple in-house systems, use more extensive processing to achieve higher accuracy. With a bit of tuning, customer matching systems from vendors like Harte-Hanks Trillium, Innovative Systems and DataMentors probably find 90 percent or more of the matches that could be identified by direct comparison.
But records that refer to the same individual may not be directly comparable because of name changes, address changes or multiple residences. Conventional matching systems sometimes match on other elements, such as telephone numbers or Social Security numbers, but these are not always available or reliable. The problem is particularly severe on files, such as prospect lists and product registration records, that customers have no reason to update.
Acxiom AbiliTec (Acxiom, 888/329-9466, www.acxiom.com) and Experian Truvue (Experian, 714/385-7000, www.experian.com) both offer matching systems that attempt to solve these problems. Instead of comparing records directly against each other, they compare records against a reference database that contains entries for pretty much every individual in the United States. (They have similar databases for the United Kingdom and several other nations.)
This approach frees the user from choosing between tight rules that miss valid matches and loose rules that accept false ones. Instead, users can rely on the reference database to have already determined whether two loosely similar records really refer to the same individual or whether two highly similar records are really different. The reference database can also link dissimilar records if it has determined they are the same person.
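The tight-versus-loose tradeoff can be illustrated with a minimal sketch. It assumes a simple string-similarity score from Python's standard difflib module as a stand-in for a conventional matching rule; real matching systems use far more elaborate logic, and the names and thresholds here are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(name_a, name_b):
    """Return a 0..1 similarity score for two names (case-insensitive)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

def is_match(name_a, name_b, threshold):
    """Apply a single cutoff, as a simple direct-comparison rule would."""
    return similarity(name_a, name_b) >= threshold

# Hypothetical thresholds, chosen only for illustration.
TIGHT = 0.90
LOOSE = 0.80

# An abbreviated name that may well be the same person.
same_person = ("J Smith", "John Smith")
# Two very similar names that may belong to different people.
different_people = ("Jon Smith", "Joan Smith")

for label, (a, b) in (("abbreviated", same_person),
                      ("near-identical", different_people)):
    print(label, round(similarity(a, b), 2),
          "tight:", is_match(a, b, TIGHT),
          "loose:", is_match(a, b, LOOSE))
```

In this sketch the abbreviated pair scores lower than the near-identical pair, so no single cutoff accepts the first while rejecting the second. That is the dilemma a reference database is meant to resolve.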
Naturally, this works only if the reference database itself is accurate. The advantage of reference-based systems is that their developers can afford to invest in more elaborate processing and larger universes of source records than any firm building a database for internal use. So, even though the reference databases are not perfect, they are more reliable than just about any alternative.
The actual advantage of reference-based methods over conventional techniques depends heavily on the nature of the input files. Older, dirtier data are more likely to benefit than newer, cleaner information. Both Acxiom and Experian report that they typically find twice as many duplicates as conventional systems. In one test run by Raab Associates, the difference was much smaller: a reference-based system found just 10 percent to 20 percent more matches than conventional software. But the lists used in this particular test were exceptionally clean and current, which may partly explain the results.
An important side benefit of the reference-based approach is that each individual is assigned a permanent personal ID number. This ID is stored on each of that person's records in the reference database and is returned when a match is found against an input record. Once all input records are coded with the individual IDs, any records with the same ID are considered to match. This makes it simple to identify duplicates within the input file and avoids providing clients with any information about the contents of the reference database itself. It also means that new records can be matched against a previous input file by appending the IDs to the new records, then looking for old records with matching IDs. This is vastly more efficient than the conventional approach of comparing all old and new records against one another. The same method allows real-time matching of single records as they are captured during order entry and similar tasks.
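Once records carry IDs, both tasks described above reduce to simple lookups. The following is a minimal sketch, assuming each record is a dictionary with hypothetical fields `record_key` (the user's own key) and `person_id` (the ID returned by the matching service); neither field name comes from the vendors.

```python
from collections import defaultdict

def find_duplicates(records):
    """Group record keys by person ID; keep only IDs seen more than once."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["person_id"]].append(rec["record_key"])
    return {pid: keys for pid, keys in groups.items() if len(keys) > 1}

def match_new_to_old(old_records, new_records):
    """Map each new record key to the old record keys sharing its person ID."""
    index = defaultdict(list)
    for rec in old_records:
        index[rec["person_id"]].append(rec["record_key"])
    return {
        rec["record_key"]: index[rec["person_id"]]
        for rec in new_records
        if rec["person_id"] in index
    }

old_file = [
    {"record_key": "A1", "person_id": "P100"},
    {"record_key": "A2", "person_id": "P200"},
    {"record_key": "A3", "person_id": "P100"},
]
new_file = [
    {"record_key": "B1", "person_id": "P200"},
    {"record_key": "B2", "person_id": "P300"},
]

print(find_duplicates(old_file))
print(match_new_to_old(old_file, new_file))
```

Building the index takes one pass over the old file and each new record needs a single lookup, which is why this scales so much better than comparing every old record with every new one.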
Both the Acxiom and Experian products work in the ways just described. Both also assign temporary IDs to records that do not match their reference databases. These IDs are based on the contents of the records, so the same ID will be assigned again if the same or a similar record is presented later. This allows some deduplication and makes it easy to replace the temporary ID with a permanent ID if a match is later found on the reference database. Both vendors do all match processing themselves. Users either send files for batch updates or set up online connections to submit single records and receive results in real time.
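Neither vendor's method for deriving temporary IDs is described here, so the following is only an illustrative sketch of the general idea: normalize the record's contents, then hash them, so the same or a trivially different record yields the same ID. The normalization rules, the `TMP-` prefix and the function names are all assumptions.

```python
import hashlib
import re

def normalize(text):
    """Uppercase, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", "", text.upper())
    return " ".join(text.split())

def temporary_id(name, address):
    """Derive a repeatable content-based ID for an unmatched record."""
    key = normalize(name) + "|" + normalize(address)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "TMP-" + digest[:12]

def promote(record_ids, temp_to_permanent):
    """Swap in permanent IDs for temporary ones once a match is found."""
    return [temp_to_permanent.get(rid, rid) for rid in record_ids]

first = temporary_id("John Smith", "12 Main St.")
again = temporary_id("john  smith", "12 MAIN ST")
print(first == again)
```

Because the ID depends only on the normalized contents, two submissions of the same unmatched record can be linked to each other, and a later lookup table from temporary to permanent IDs is enough to upgrade them.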
Of course, the two products are not identical. One difference is that AbiliTec assigns permanent IDs to physical addresses as well as individuals; Truvue currently does not, although it plans to add this feature in a few months. The address ID makes it easier to identify individuals living in the same location, which is done primarily to define households. In practice, both systems are limited to very simple household definitions, such as a common address or a common address and same last name. Nor does either system assign permanent household IDs. This means that custom programming is needed to track changes in household membership over time. Conventional customer matching systems, by contrast, do provide these more advanced householding capabilities.
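The two simple household definitions mentioned above can be sketched as a grouping on address ID, optionally combined with last name. The field names `address_id`, `person_id` and `last_name` are hypothetical, and this is not either vendor's implementation.

```python
from collections import defaultdict

def households(records, require_same_last_name=False):
    """Group person IDs into households.

    Default rule: everyone at the same address ID is one household.
    Stricter rule: same address ID and same last name.
    """
    groups = defaultdict(list)
    for rec in records:
        if require_same_last_name:
            key = (rec["address_id"], rec["last_name"].upper())
        else:
            key = (rec["address_id"],)
        groups[key].append(rec["person_id"])
    return list(groups.values())

people = [
    {"person_id": "P1", "address_id": "A1", "last_name": "Smith"},
    {"person_id": "P2", "address_id": "A1", "last_name": "Smith"},
    {"person_id": "P3", "address_id": "A1", "last_name": "Jones"},
    {"person_id": "P4", "address_id": "A2", "last_name": "Lee"},
]

print(households(people))
print(households(people, require_same_last_name=True))
```

Note that the groups produced here carry no identity of their own. Without a permanent household ID, recognizing that a household has gained or lost a member between two runs requires the kind of custom programming the text describes.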
Another difference is that AbiliTec relies only on name and address for matching, while Truvue can also use telephone and Social Security number. Despite the problems with these additional elements, Experian is confident that its processes, honed through years of experience assembling credit bureau files, allow it to use such data effectively. Truvue also gives users an option to use matches that would be rejected by its standard rules, such as matching "J Smith" with "John Smith." A final difference is that Truvue returns parsed and standardized versions of input records along with the IDs, while AbiliTec returns IDs only. Acxiom does provide parsing and standardization through other services.
AbiliTec and Truvue were both introduced in 1999, although they have been rolled out fairly slowly. AbiliTec has about 40 external clients, plus many internal users at Acxiom, while Truvue has about 16 external clients. Cost is determined primarily by the number of records processed and whether real-time updates are required. Fees are somewhat higher than those for conventional matching services.