A Rose by Many Other Names

Recently, innocent travelers have turned up on terrorist no-fly lists, revealing a major problem with current name-matching and name-searching technology: It's a 19th-century, one-size-fits-all technology that does not account for the various ways cultures treat names.

Anthony Rose, Tony Rose, Tony Rolls, Abdul Rose, Au-Yeung Mei Ro Rose: Any of these could be the actual name behind a database entry stored as "A. Rose." Given that every database contains faulty data, many other names could also lie behind that entry, and most of them may never be found when searching for "A. Rose."

This is a big problem, not only for security-related applications that guard against terrorism, but also for direct marketing operations that must vet potential customers, retain their best customers and account properly for financial transactions. Moreover, direct marketers want to use the names of their good customers correctly, so as not to appear insensitive to the ways names differ across cultures.

Why do we have these intractable problems with personal names? The first reason is that there is no dictionary or authority to check. Names must be accepted as presented. However, the proper entry of personal name data is contingent on an extraordinary amount of education about names.

It is at the critical moment of data capture that we have the last chance to enter such names properly, yet it is also at this vital juncture that we paradoxically opt to minimize our attention.

The first attempt to mitigate the disparity across name renderings was patented in 1918. It is called Soundex and originally was designed to help with the analysis of the 1890 census data. Despite many studies illustrating the failings of this simple key-based approach to retrieving similar-looking names, it remains one of the most common techniques in use today. (Soundex comes standard with most database products.)

The continuing interest in Soundex and its many derivative forms of key-based technology results from two brutal facts: Names are complex, and computer programmers are under great pressure to do something to address the following issues:

· Spelling variations. Soundex focused on trying to neutralize certain spelling variations across separate name elements. For example, by using Soundex keys, Anderson and Andersen suddenly matched. But many other names that are clearly unrelated also were retrieved (Amaturk has the same Soundex code, A536, as Anderson).

· White space. The blank space is a huge problem for computer systems: Degomez and De Gomez do not match in most computer search systems. Yet Abd El Rahman and Abdurrahman are the same name and should match exactly, as should Guanlu and Guan Lu. Knowing when and how to make these matches is hard without recognizing the cultural origin of a name.

· Syntax and name models. Even if neutralizing spelling variations within data fields can be effectively accomplished in your database and the applications that use it, an even more pernicious problem looms. Our standard model for names (first, middle, last) is not universal, and causes us tremendous problems with data entry, retrieval and data sharing. The most effective and only truly universal solution is to enter names into databases that have only two fields for names: given name and surname.

· Cultural issues with personal name data. How a culture changes names according to social customs - such as marriage and religious ceremonies - is another complicating factor that attends the entry of personal name data. These complex issues never occur in isolation. They compound each other, especially when names are transported across systems with different definitions. Yet, even within isolated systems, rampant problems exist that often have stayed hidden for decades.
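
The first two issues above can be made concrete with a short Python sketch. It assumes the standard American Soundex rules and a deliberately naive key that simply removes blank spaces; the function names soundex and spaceless_key are illustrative and do not come from any product discussed here.

```python
# Letter-to-digit table for standard American Soundex.
_SOUNDEX_CODES = {}
for _digit, _letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex key for a single name element."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


def spaceless_key(name: str) -> str:
    """Naive comparison key: drop all white space and ignore case."""
    return "".join(name.split()).casefold()


# Anderson, Andersen and the unrelated Amaturk all share one key.
for candidate in ("Anderson", "Andersen", "Amaturk"):
    print(candidate, soundex(candidate))

print(spaceless_key("De Gomez") == spaceless_key("Degomez"))
print(spaceless_key("Abd El Rahman") == spaceless_key("Abdurrahman"))
```

Removing spaces reconciles De Gomez with Degomez and Guan Lu with Guanlu, but it still leaves Abd El Rahman and Abdurrahman apart. That gap is the reason culture-specific knowledge is needed on top of simple keys.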

New knowledge-based technologies, known as name-recognition software, are emerging that encapsulate the way names work around the world.

For example, there are tools to identify the cultural classification and gender of personal names. Other tools take the form of character-oriented, name-searching engines that provide ranked search results based on linguistic and cultural variation patterns. There also are tools to rank search results based on similarities of pronunciations, not just similarities of spellings.
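
To show what ranked results look like, as opposed to the all-or-nothing matches a key produces, here is a minimal Python sketch. It is not the linguistic or cultural scoring such tools apply: as a stand-in it uses the generic spelling-similarity ratio from Python's standard difflib module, and the function name rank_candidates is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def rank_candidates(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to query."""
    def score(candidate: str) -> float:
        # Generic character-level similarity; a real engine would apply
        # culture-specific variation and pronunciation rules instead.
        return SequenceMatcher(None, query.casefold(), candidate.casefold()).ratio()

    scored = [(name, score(name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, similarity in rank_candidates("Anderson", ["Amaturk", "Andersen", "Anderson"]):
    print(f"{name}: {similarity:.2f}")
```

Unlike a Soundex key, which treats Amaturk and Andersen as equally good matches for Anderson, a score lets the searcher see Andersen near the top and Amaturk near the bottom.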

Name-hygiene tools enhance the consistency of name-data retrieval by applying culture-specific rules to names and identifying basic structural elements in a name. Variation tools generate a set of possible alternative Romanized spellings for names. Equivalence tools generate a table of how names appear across multiple cultures. There also are tools to help database users unlock the meanings of names and their spelling variations when transcribed to the Roman alphabet.

All these tools help direct marketers better understand their prospects and customers as well as better meet the needs of females, males or specific cultural groups within their customer base.

Though technology can help a lot, practicing data ecology is a first step for everyone. In its simplest form, data ecology is nothing more than an awareness of the environment in which data live and a commitment to preserve their original value for future users. Because accurate data are the lifeblood of every direct marketer, here are three fundamental tips for practicing data ecology:

· Data stewardship requires that every direct marketer who handles personal name data be aware of the fragility of this precious information. Care is most critical at data entry, which is often the last chance to validate the data with the actual owner of the name. At every subsequent stage of processing and access, everyone is "downstream" of this vital operation.

· Every effort must be made to understand, calibrate and validate the exchange or distribution of personal name data. It is incumbent on those who deliver, as well as those who receive, such data to ensure the integrity of the process.

· Direct marketers should never change original data. Instead, administrators should provide a method for adding "See Also" entries - associated records for personal name data that appear to be faulty. Doing so will keep the data ecosystem in balance.
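
The "See Also" idea, combined with the two-field name model described earlier, might be sketched in Python as follows. This is only an illustration under stated assumptions: the class names NameRecord and NameStore, the integer record ids and the in-memory storage are all hypothetical, not a description of any particular database product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)  # frozen: a captured name can never be overwritten
class NameRecord:
    record_id: int
    given_name: str
    surname: str


@dataclass
class NameStore:
    records: Dict[int, NameRecord] = field(default_factory=dict)
    see_also: Dict[int, Set[int]] = field(default_factory=dict)

    def add(self, record: NameRecord) -> None:
        """Store a record exactly as captured; ids are write-once."""
        if record.record_id in self.records:
            raise ValueError("record ids are write-once")
        self.records[record.record_id] = record

    def link(self, a: int, b: int) -> None:
        """Cross-reference two records in both directions."""
        if a not in self.records or b not in self.records:
            raise KeyError("both records must exist before linking")
        self.see_also.setdefault(a, set()).add(b)
        self.see_also.setdefault(b, set()).add(a)

    def related(self, record_id: int) -> List[NameRecord]:
        """Return the 'See Also' records for one record, ordered by id."""
        ids = sorted(self.see_also.get(record_id, set()))
        return [self.records[i] for i in ids]


store = NameStore()
store.add(NameRecord(1, "A.", "Rose"))       # entry as originally captured
store.add(NameRecord(2, "Anthony", "Rose"))  # suspected fuller form
store.link(1, 2)
print(store.related(1))
```

The suspect entry "A. Rose" stays exactly as it was entered, and the suspected fuller form is attached as a cross-reference rather than written over it.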

If each direct marketer in an organization does his part, the data environment can improve dramatically. The byproducts of such an environment are marked improvements in customer interactions and security, as well as stronger long-term customer relationships.

