Search Software Unique on Matching
Matching software presents a good example of why the best may not be good enough. Matching systems all do roughly the same things, but with differences that can be important in particular situations. Users must look carefully to find the system that best fits their needs.
SSA-NAME3 (Search Software America, 203/698-2399, www.searchsoftware.com) is a set of programs that builds keys from name and address data, defines search criteria to identify similar records and identifies matches among record pairs. These are the same broad functions as other matching systems, but SSA-NAME3 approaches them in its own way.
Key-building is the most distinctive of these functions. All matching systems use a key to find records that are potential matches. Usually the keys are defined simply, often by combining extracts from elements such as family name, street number, city and postal code. Systems commonly generate multiple keys with different sequences of elements, so one error does not prevent two related records from being compared.
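To make the conventional approach concrete, here is a minimal sketch of multi-key generation from field extracts. The field names, extract lengths and key layouts are assumptions chosen for illustration, not the design of any particular product.

```python
def build_keys(record):
    """Return several candidate-selection keys, each from a different mix of fields."""
    # Hypothetical extracts: 4 letters of family name, 3 of city, 5 of postal code.
    name = record["family_name"].upper()[:4]
    street_no = record["street_number"]
    city = record["city"].upper()[:3]
    postal = record["postal_code"].replace(" ", "")[:5]
    return {
        name + postal,             # unaffected by errors in street number or city
        name + street_no + city,   # unaffected by an error in postal code
        street_no + postal,        # unaffected by a misspelled family name
    }


a = {"family_name": "Johnson", "street_number": "12",
     "city": "Boston", "postal_code": "02116"}
b = {"family_name": "Jonson", "street_number": "12",
     "city": "Boston", "postal_code": "02116"}

# The misspelled family name breaks two keys, but the third still pairs the records.
shared = build_keys(a) & build_keys(b)
print(shared)
```

Two records are compared only if they share at least one key, so each extra key layout is a hedge against an error in a different field.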
Keys in SSA-NAME3 are more interesting. Instead of extracts from data elements, each key represents an entire name or address - up to 256 characters of input compressed into eight characters or five binary bytes. But the system does not simply copy the original entry. Rather, it determines whether the entry contains one of the common names that account for a large proportion of records in any given population. Keys for common names capture a great deal of detail, while keys for other names are less precise. This approach reduces the number of records returned when a common name is submitted, while casting a wider net for matches against an uncommon name. The system also performs some cleaning, standardization and phoneticization on the entries - again with most of the effort concentrated on common words. It usually builds multiple keys for each entry, based on resequencing the words within the entry.
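The idea of frequency-sensitive keys can be sketched roughly as follows. This is not SSA-NAME3's algorithm, which the vendor does not describe in this detail. The common-name table, the crude phonetic reduction and the key format are all assumptions invented to illustrate the principle: detailed keys for common names, coarse keys for uncommon ones, and one key per word ordering.

```python
# Hypothetical frequency table; a real system would derive it from the population.
COMMON_NAMES = {"SMITH", "JOHNSON", "BROWN"}


def coarse_code(word):
    """Crude phonetic-style reduction: first letter plus later consonants, max 4 chars."""
    consonants = "".join(c for c in word[1:] if c not in "AEIOUYHW")
    return (word[0] + consonants)[:4]


def build_name_keys(name):
    """Build one key per word of the name, so word order in the input does not matter."""
    words = [w for w in name.upper().replace(",", " ").split() if w.isalpha()]
    keys = set()
    for i, lead in enumerate(words):
        others = words[:i] + words[i + 1:]
        if lead in COMMON_NAMES:
            # Common name: keep the full word plus initials of the other words,
            # so a search on "Smith" does not return every Smith on file.
            keys.add(lead + ":" + "".join(sorted(o[0] for o in others)))
        else:
            # Uncommon name: a looser key that tolerates spelling variation.
            keys.add(coarse_code(lead))
    return keys


print(build_name_keys("John Smith"))
print(build_name_keys("Kowalczik, Anna"))
```

In this sketch "John Smith" and "Smith, Jon" produce the same keys, while "Ana Kowalczyk" and "Kowalczik, Anna" share only the coarse family-name key, which is still enough to bring the two records together for comparison.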
The search function of SSA-NAME3 accepts an input record and returns ranges of key values that define potential matches. Because SSA-NAME3 only defines search ranges, the user must write a program that actually finds the matching keys in a key database. In fact, the user also must create the key database itself. Users who do not want to build their own system can buy another product from Search Software America called Identity Systems that performs both of these functions. Identity Systems also produces files of matched records.
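The user-written part of the search, looking up key ranges in a key database, might look something like the sketch below. The sorted in-memory list stands in for whatever indexed file or database table an installation would really use, and the key values and the `find_candidates` helper are hypothetical.

```python
import bisect

# Hypothetical key database: sorted (key, record_id) pairs maintained by the user.
key_index = sorted([
    ("BRWN", 4), ("JN", 1), ("JN", 7), ("KLCZ", 3),
    ("SMITH:A", 5), ("SMITH:J", 1), ("SMITH:J", 7), ("SMITH:R", 9),
])


def find_candidates(index, ranges):
    """Return IDs of records whose key falls inside any inclusive (low, high) range."""
    found = set()
    for low, high in ranges:
        # (low,) sorts before every (low, record_id), so this is the first key >= low.
        start = bisect.bisect_left(index, (low,))
        for key, record_id in index[start:]:
            if key > high:
                break
            found.add(record_id)
    return found


# Ranges of the kind a search function might return for one input record.
narrow = find_candidates(key_index, [("SMITH:J", "SMITH:J"), ("JN", "JN")])
wide = find_candidates(key_index, [("SMITH:A", "SMITH:J")])
print(narrow, wide)
```

The width of each range controls the trade-off between the number of candidates retrieved and the chance of missing a true match.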
The match function takes a pair of records, compares the actual contents (not the keys) and returns a match score. SSA-NAME3's matching technique is similar to other vendors', though it gains advantages by using rules rather than pattern tables (which take more space) or fixed element weights (which do not adjust to different situations). Identity Systems can convert SSA-NAME3 output into a file of matched records, but neither will assign a common ID to all records for the same individual or household. Another Search Software America product, Clustering Engine, does provide this capability.
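The difference between a file of matched pairs and a common ID is worth illustrating: if record 1 matches record 2 and record 2 matches record 3, all three should share one ID even though 1 and 3 were never matched directly. The sketch below uses a standard union-find structure to derive such IDs. It is a generic technique shown under that assumption, not a description of how Clustering Engine works, and the function name is hypothetical.

```python
def assign_cluster_ids(record_ids, matched_pairs):
    """Map each record ID to the smallest record ID in its group of linked matches."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as root so the common ID is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {r: find(r) for r in record_ids}


# Records 1 and 3 were never matched directly but end up in the same group.
ids = assign_cluster_ids([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(ids)
```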
So what does this all mean, or, more to the point, which users will find that SSA-NAME3 meets their requirements? One implication of the system's approach is that it needs relatively few data standardization rules, since these must only handle the most common cases. This makes SSA-NAME3 easier to adapt to new populations than products that rely on extensive reference tables. But it also means the system is highly dependent on the rules being accurately tuned for the particular population being analyzed. In the past, such rules were custom built for each installation, but the vendor has more recently developed standard rule sets for more than 50 nations. These can include non-Roman character sets and have special extensions for Chinese, Korean and Japanese. So one application where the system is likely to be relevant is processing names from many countries.
A second implication is that the system is highly scalable. Tightly compressed keys and differential treatment of common versus uncommon names allow thorough searches without unacceptably high resource consumption. While performance depends heavily on the surrounding application, SSA-NAME3 has been clocked at more than 10 million matches an hour on a large multiprocessor server. The technology is also suited for real-time applications where a single record must be matched quickly against a large, existing database.
A third implication is that the system requires a technically sophisticated user. SSA-NAME3 provides only the key-building, search and matching functions, accessed through an application programming interface (API), so the user must build the entire surrounding system to submit the records, manage the keys and process the results. Identity Systems and Clustering Engine can save much of this work, but still require considerable integration. Of course, buyers with atypical applications may find that any system requires extensive customization. In their case, SSA-NAME3's approach is no disadvantage.
Finally, there are some things the system does not do. It cannot produce standardized, parsed or address-validated output; maintain permanent IDs for individuals or households; match inputs against a vendor-supplied reference database; or consolidate or pick the best value from conflicting inputs. Users who need these functions could buy supplemental software, but many will prefer to get them in a single package.
In short, SSA-NAME3 software is not for everyone. But users with special needs will be glad they took the trouble to find it.
Search Software America products run on Unix, Windows NT and OS/390 servers. Prices are based on the server type and start at $42,000 for SSA-NAME3 and $84,000 for Identity Systems. The original version of SSA-NAME3 was introduced in 1986, and the system now has 500 installations.