List Performance Depends On Data Quality

List performance is a function of selectivity, deliverability and data quality.

Selectivity refers to the criteria used to make a particular selection. Deliverability means the mailing piece actually reached the mailbox where it was intended to go. A number of factors can influence the deliverability of a name and address.

Data quality has many definitions, depending on the specifics of a mailing or program. Sometimes response rates are the only measurement available. Mailers don't always have a clear understanding of acceptable data quality in list compilation and selection pre-processing.

Here are some data quality drivers:

Accuracy. If household income is used in a mailing selection, how likely is it that the number is correct? How close is close enough?

Data coverage. How many records in the master mail file carry known values for key data elements? Are there misleading default values that are used when the real values are unknown?

Stability. How often does this information change? With each list update? Does the stability of the data depend on how it was received?

Multiple sources. Studies have shown that information confirmed by more than one source is more likely to be correct. How many different data sources have confirmed or contributed to the data carried on the mailing list?

Renewability. Some data is static: once a birth date is applied to a person's record, it's not likely to change. But the majority of data elements change over time with changes in occupation, marriage, the birth of children or other life events. Has the information used to select records for a mailing been updated recently?

Use constraints. Data may be available, but are there limitations on its use? Some types of data, such as state-sourced vehicle ownership data, carry restrictions. And does the seller have legal authority to sell the data?

Data quality equals accuracy times coverage.

Coverage can be easily measured for key data elements, usually right in the list compilation process. Relevant coverage can be defined as the number of records with known values for a data element divided by the total number of records on the list. Accuracy can be measured through periodic telephone or mail surveys of samples from a list. Depending on the specific goals of a particular mailing or process, other definitions of data quality may apply.
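As a worked illustration of the two measures above, here is a minimal Python sketch. The field name `income`, the unknown markers and the 80 percent accuracy figure are assumptions made up for the example; in practice the accuracy rate would come from a survey of a sample of the list.

```python
def coverage(records, field, unknown_values=(None, "", "U")):
    """Records with a known value for `field`, divided by all records."""
    if not records:
        return 0.0
    known = sum(1 for r in records if r.get(field) not in unknown_values)
    return known / len(records)


def data_quality(accuracy, coverage_rate):
    """Data quality equals accuracy times coverage."""
    return accuracy * coverage_rate


# Hypothetical four-record list; "U" is an assumed default for "unknown".
sample_list = [
    {"name": "A", "income": 52000},
    {"name": "B", "income": None},
    {"name": "C", "income": 71000},
    {"name": "D", "income": "U"},
]

income_coverage = coverage(sample_list, "income")  # 2 known out of 4
quality = data_quality(0.8, income_coverage)       # assumed 80% survey accuracy
```

Treating default codes such as "U" as unknown keeps misleading defaults from inflating the coverage figure.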

The major culprits of poor list data quality can be collapsed into two main categories: problems related to data collection practices and problems in data compilation business rules.

One key to building a high-quality mailing list is to make sure the data that goes into the list is reliable. The old adage “garbage in, garbage out” certainly applies here. An unreliable data source can damage the quality of an entire list.

The freshness of data also plays an important role in the quality equation. Insist that data suppliers provide a date that indicates how recently their information has been verified, and use that date to identify the freshness of the information.
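One way to put a supplier's verification date to work is to sort records into freshness bands. The sketch below is only an illustration: the band names and the 180-day and 365-day cutoffs are assumptions, not industry standards, and should be set to suit the data element.

```python
from datetime import date


def days_since_verified(verified_on, as_of):
    """Number of days between the verification date and the as-of date."""
    return (as_of - verified_on).days


def freshness_band(verified_on, as_of, fresh_days=180, stale_days=365):
    """Classify a record by how recently its source verified it."""
    if verified_on is None:
        return "undated"  # the supplier provided no verification date
    age = days_since_verified(verified_on, as_of)
    if age <= fresh_days:
        return "fresh"
    if age <= stale_days:
        return "aging"
    return "stale"
```

For example, `freshness_band(date(2024, 1, 1), date(2024, 3, 1))` falls in the "fresh" band under these assumed cutoffs.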

Don't throw everything into the list, just in case it might be useful some day. Information that doesn't get used costs money, slows processing and just takes up space. Use judgment in selecting and including data that is consistent with the purposes of the list.

If good data collection practices have been used, then equally good business rules can help maintain data quality. Sometimes, different data sources can supply the same type of information, and sometimes the sources do not agree. Consistently applied strategies can help ensure list data quality.

Every data source file should go through name and address data cleansing and standardization processing prior to any attempt at merging with the master list or with other new data sources. Nicknames can be converted to full names for merging while still maintained on the file for selection purposes, and addresses should be formatted into consistent forms.
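The idea of converting nicknames for merging while keeping the original for selection can be sketched as follows. The nickname table and the field names are hypothetical, and real address standardization involves far more than the upper-casing and whitespace cleanup shown here.

```python
# Illustrative nickname table; a production table would be much larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def standardize_record(record):
    """Add a merge-ready first name while keeping the name as supplied."""
    first = record["first_name"].strip()
    out = dict(record)
    out["first_name_original"] = first  # kept for selection purposes
    out["first_name_match"] = NICKNAMES.get(first.lower(), first.lower()).upper()
    # Collapse repeated spaces and upper-case the address for consistency.
    out["address"] = " ".join(record["address"].upper().split())
    return out
```

Keeping both versions of the name means the merge can match "Bob" to "Robert" without losing the form the individual actually uses.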

Where appropriate, movers should be updated with their new addresses. Keep in mind that things like home value are not candidates for moving to the new address, but that most individual level data will go with the household or individual in a move.

Just as the name and address data is standardized, the new data elements should be verified. Invalid data values should be eliminated. Too many invalid values should raise questions about the value of the entire data source. Reformat the data elements into a consistent format — birth dates might be carried as year, then month, then day, or whatever works in your list, but be consistent across all sources going into the list. Unstandardized data elements are difficult to work with and generally add little value to a list.
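The birth date example might look like the sketch below, which reformats dates to year, month, day and reports the share of values that cannot be used. The three input formats are assumptions about what suppliers might send; blank values are counted as unusable here.

```python
from datetime import datetime

# Assumed input formats; adjust to what each data source actually supplies.
KNOWN_FORMATS = ("%Y%m%d", "%m/%d/%Y", "%Y-%m-%d")


def standardize_birth_date(raw):
    """Return the date as YYYYMMDD, or None if it is not a real date."""
    if not raw:
        return None
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # try the next assumed format
    return None


def invalid_rate(values):
    """Share of supplied values that could not be standardized."""
    if not values:
        return 0.0
    invalid = sum(1 for v in values if standardize_birth_date(v) is None)
    return invalid / len(values)
```

A high `invalid_rate` for a source is the kind of signal that should raise questions about the source as a whole.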

Compare new data to existing data, looking for similar patterns. Measure coverage and residual records. Coverage shows how well the new source lines up with your existing file, and residuals show how many new records you might be able to add to your list by including the new source. A new source of data should fill holes as well as confirm existing data.
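Coverage and residuals can be measured with simple set arithmetic once both files carry a common match key. In this sketch the match keys are hypothetical, and coverage is taken to mean the share of the existing file that the new source matches, which is one reasonable reading of the term.

```python
def compare_sources(master_keys, new_keys):
    """Measure how a new source overlaps with the master list."""
    master = set(master_keys)
    new = set(new_keys)
    matched = master & new   # records the new source can confirm
    residual = new - master  # records that could be added to the list
    return {
        "coverage": len(matched) / len(master) if master else 0.0,
        "residual_count": len(residual),
        "residual_keys": sorted(residual),
    }


# Hypothetical match keys built from standardized name and address.
result = compare_sources(["a", "b", "c", "d"], ["c", "d", "e"])
```

Here the new source confirms half of the existing file and contributes one record the list does not yet have.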

Verify the need for new types of data with data users — the people who sell lists know what they can use, often better than those in the data acquisition, evaluation and compilation areas. Include them in the assessment of the new data's potential value.

Use graphs to show the differences between data sources. For example, build a graph showing the age ranges covered by several sources of age data. Some may be stronger than others in one or another age group, and some may have a wider distribution across several age groups. A graph will clearly bring out this information and help the decision-making process.
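Before reaching for a charting tool, the age comparison can be roughed out as a text chart. The age bands and source names below are assumptions chosen for illustration; ages outside the bands are ignored.

```python
from collections import Counter

# Assumed age bands; use whatever bands your selections rely on.
AGE_BANDS = [(18, 34), (35, 49), (50, 64), (65, 120)]


def band_label(age):
    """Return the label of the band containing `age`, or None."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def band_shares(ages):
    """Share of in-band ages that falls into each age band."""
    labels = [band_label(a) for a in ages]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {
        f"{low}-{high}": (counts[f"{low}-{high}"] / total if total else 0.0)
        for low, high in AGE_BANDS
    }


def text_chart(sources, width=40):
    """Render one row of '#' marks per age band for each source."""
    lines = []
    for name, ages in sources.items():
        lines.append(name)
        for band, share in band_shares(ages).items():
            lines.append(f"  {band:>6} {'#' * round(share * width)} {share:.0%}")
    return "\n".join(lines)
```

Calling `print(text_chart({"Source A": ages_a, "Source B": ages_b}))` with two hypothetical lists of ages puts the distributions side by side.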

Measure the changes in volume from each data source that goes into the list. Compare the distribution of values received for key variables. Significant changes from one update to the next can indicate problems, and quick identification can help save costs.
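A simple update-to-update check can be sketched as follows. The 5-point threshold for flagging a shift is an assumption, not a rule; the right tolerance depends on the variable and the size of the file.

```python
from collections import Counter


def shares(values):
    """Share of records carrying each value of a variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def flag_shifts(previous, current, threshold=0.05):
    """Values whose share moved by more than `threshold` between updates."""
    prev, curr = shares(previous), shares(current)
    flagged = {}
    for value in sorted(set(prev) | set(curr)):
        change = curr.get(value, 0.0) - prev.get(value, 0.0)
        if abs(change) > threshold:
            flagged[value] = round(change, 4)
    return flagged


def volume_change(previous_count, current_count):
    """Relative change in record volume from one update to the next."""
    if previous_count == 0:
        return None  # no baseline to compare against
    return (current_count - previous_count) / previous_count
```

A flagged value is a prompt to ask the source what changed, not proof that something is wrong.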

Sometimes differences make sense: a source may have supplied additional records because it now collects data in a wider geographic area. Don't assume; investigate.

Track trends in volume, too. If a source is continually dropping in volume, you may need to locate an alternative source for that data. Knowing sooner gives you an advantage.

Rather than waiting until someone points out problems with your list, perform periodic list data quality audits. Select a random sample from your list and survey the households on key elements. Use a large enough sample to identify differences in accuracy at the data source level. Compare the survey results to the data you have on the list to identify which data sources are more accurate. Use this knowledge to refine business rules to prioritize better sources over less accurate ones.
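Scoring an audit by data source might look like the sketch below. The row layout of (source, value on the list, value from the survey) is an assumption, and the sketch ignores sampling error, so small differences between sources should not be over-read.

```python
from collections import defaultdict


def accuracy_by_source(audit_rows):
    """Share of surveyed records where the list value matched the survey.

    audit_rows: iterable of (source, list_value, survey_value) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for source, list_value, survey_value in audit_rows:
        totals[source] += 1
        if list_value == survey_value:
            correct[source] += 1
    return {source: correct[source] / totals[source] for source in totals}


def rank_sources(audit_rows):
    """Sources ordered from most to least accurate (ties by name)."""
    accuracy = accuracy_by_source(audit_rows)
    return sorted(accuracy, key=lambda s: (-accuracy[s], s))
```

The ranking can then feed the business rule that decides which source wins when two sources disagree.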

While several factors influence whether the mail gets delivered, the number of active contributing sources for a household is certainly a key one. As the number of active contributing sources confirming a name and address increases, deliverability improves, although the incremental improvement from each additional source is relatively small. The key message is that records with multiple confirming sources are more likely to be deliverable.

Rick Dwornick is manager of data evaluation for the Polk Company, Denver.
