The ABC's of Data Purchasing
Large data purchases are often treated as capital expenditures and receive the same scrutiny as other investments. Yet few rules exist for evaluating competing product offerings, and fewer still describe how those offerings will affect your bottom line.
As a result, data purchasing has become a cloudy area where leaps of faith are more common than rational purchasing decisions. The work that should be performed to find the best product for your needs is often overlooked or unknown. For many companies the question is even more basic: Where do I begin?
The first step should be gaining a better understanding of what data is and where it originates. Consumer data is compiled from a variety of sources, including surveys, phone books and warranty cards. Much of this information is self-reported, resulting in an inherent level of inaccuracy at inception. Consumers often misunderstand survey questions, fill in the wrong blank by accident, or even check boxes to make a happy face on the form.
There is no way to circumvent these inaccuracies or to correct them systematically, and there is no such thing as perfect data. However, much compiled data is quite accurate and will fully meet your company's needs. You simply need to know what those needs are and test whether the data is of sufficient quality for your intended uses.
Measuring Data Quality
Data quality is often described in terms of overall match rate, elemental match rates and accuracy. While important, these should not be the only factors considered when making a purchase. Other important parameters include ease of data interpretation, consistent representation, data delivery and customer service, and turnaround time. A big data warehouse full of great information is of little benefit if you don't know what you have, how to update it or what it really means, or if your provider's help desk is always busy.
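The article does not define match rates precisely, so the following is only a minimal sketch under stated assumptions: the overall match rate is taken to be the share of test records the provider returned at all, and an elemental match rate is taken to be the share of test records for which a given element came back filled in. The function name and the record layout are hypothetical.

```python
def match_rates(test_ids, returned):
    """Compute overall and per-element match rates for one provider.

    test_ids: list of record ids sent to the provider.
    returned: dict mapping record id -> dict of appended elements
              (assumed layout; None or "" means the element was not filled).
    """
    total = len(test_ids)
    matched = [record_id for record_id in test_ids if record_id in returned]
    overall = len(matched) / total if total else 0.0

    filled = {}
    for record_id in matched:
        for element, value in returned[record_id].items():
            if value not in (None, ""):
                filled[element] = filled.get(element, 0) + 1

    # Elemental rates are expressed against the whole test file, not just matches.
    elemental = {element: count / total for element, count in filled.items()}
    return overall, elemental
```

Neither rate says anything about accuracy; a filled-in element can still be wrong, which is why the verification step described later matters.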
Capital data purchases will often require competitive bids and proposals that should be reviewed closely. All legitimate data providers should offer to send your test file through their system and help you analyze the results before you make a purchase.
To ensure your purchase meets your needs, it is important to design a proper test. The goal is to understand the data, not simply to find the best match rate.
Four rules should be applied to create a fair, unbiased and statistically sound sample of test records, both to ensure that you get data that meets your needs and to help you pick a company you trust and can comfortably do business with in the future:
o Rule No. 1: Randomly choose all records to be tested from your own internal data set. Do not pick the first 1,000 records in your file or every 10th one. Use a random number generator or table to ensure a truly random sample.
o Rule No. 2: Ensure that each record has an equal chance of being selected. Do not create a test file in which most of the people live in New Jersey if you are measuring nationwide data, or a nationwide test file if you are measuring New Jersey data. This is one of the most common failings of data tests.
o Rule No. 3: Choose a test file large enough to absorb chance variation in the random selection of records, yet not so large that it becomes a problem to test. Below is a rule-of-thumb table of test file sizes for statistical purposes.
Number of Records in Database    Test File Size
0 - 100,000                      2,000
100,000 - 250,000                10,000
250,000 - 1 million              50,000
1 million or more                100,000
o Rule No. 4: Do not run an accuracy test for each element with an exact match as your criterion unless this is absolutely necessary. The acceptable level of error should vary by element and by use. Choose an accuracy range that fits each.
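Rules 1 through 3 can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed procedure: the function names are hypothetical, the records are assumed to fit in a list, and because the ranges in the rule-of-thumb table share their endpoints, a boundary value is assigned here to the larger tier.

```python
import random
from collections import Counter

# Rule-of-thumb tiers from the table above: (database size below this, test file size).
# Sending a boundary value such as exactly 100,000 to the larger tier is an
# assumption, chosen to agree with the "1 million or more" row.
SIZE_TIERS = [(100_000, 2_000), (250_000, 10_000), (1_000_000, 50_000)]
LARGEST_TEST_FILE = 100_000


def rule_of_thumb_size(database_size):
    """Rule 3: look up the test file size for a database of this size."""
    for upper_bound, size in SIZE_TIERS:
        if database_size < upper_bound:
            # Never ask for more records than the database holds.
            return min(size, database_size)
    return LARGEST_TEST_FILE


def draw_test_file(records, seed=None):
    """Rules 1 and 2: simple random sample without replacement, so every
    record has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(records, rule_of_thumb_size(len(records)))


def share_by(records, key):
    """Fraction of records per group (for example, per state)."""
    counts = Counter(key(record) for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}
```

Comparing `share_by` on the full file with `share_by` on the drawn test file, keyed on state, is one quick way to spot the New Jersey problem described in Rule 2.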
No data is 100 percent accurate, so design your tests to allow for some range of error. Measure the data the way it is intended to be used. For example, take the element "Individual Age" and consider the results below for a 1,000-record test file:
Provider    Number of Direct Hits    Number of Hits +/- 2 Years
One         500                      575
Two         300                      650
Three       250                      800
If you counted only exact matches, you would probably choose Provider One. If you were comfortable being off by plus or minus two years, Provider Three would provide the best data, and choosing Provider One would be the wrong purchase decision. Assess your own data needs before you begin shopping.
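The effect of the tolerance on the ranking can be shown with a short sketch. The hit counts below are the ones from the 1,000-record example above; the function names, and the idea of scoring against a dictionary of verified ages, are assumptions made for illustration.

```python
def hits_within(verified_ages, provider_ages, tolerance=0):
    """Count records whose provider-supplied age is within +/- tolerance
    years of the verified age. Records the provider did not fill are misses."""
    hits = 0
    for record_id, verified in verified_ages.items():
        reported = provider_ages.get(record_id)
        if reported is not None and abs(reported - verified) <= tolerance:
            hits += 1
    return hits


# Hit counts from the 1,000-record "Individual Age" example above.
EXAMPLE_HITS = {
    "One": {"exact": 500, "within_2_years": 575},
    "Two": {"exact": 300, "within_2_years": 650},
    "Three": {"exact": 250, "within_2_years": 800},
}


def best_provider(hit_counts, criterion):
    """Return the provider with the most hits under the chosen criterion."""
    return max(hit_counts, key=lambda provider: hit_counts[provider][criterion])
```

With these figures, `best_provider(EXAMPLE_HITS, "exact")` returns `"One"`, while `best_provider(EXAMPLE_HITS, "within_2_years")` returns `"Three"`.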
Evaluating the Results
After following these rules, consider having an outside company run the test files through the bidding data providers and, with your help, check the accuracy and analysis, especially if you do not have in-house analysts. Measure turnaround time and customer service, as well as the quality of the documentation provided with the returned data and your contact's overall knowledge of the product.
When the results arrive, send the enhanced files to a phone survey company for verification of the results (accuracy testing). One thousand completed surveys should be sufficient for any test. Count on 15 to 20 attempted surveys for every completed survey at a cost of about a dollar per survey attempt. This is the only true test of data accuracy.
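The survey figures above imply a budget that is easy to work out. The sketch below simply multiplies them through. It also adds a standard normal-approximation margin of error at 95 percent confidence, which is not from the article and assumes a simple random sample. The function names are hypothetical.

```python
import math


def survey_budget(completed=1_000, attempts_per_completed=(15, 20), cost_per_attempt=1.00):
    """Low and high cost estimates for one verification survey, using the
    article's figures: 15 to 20 attempts per completed survey at about
    a dollar per attempt."""
    low, high = attempts_per_completed
    return completed * low * cost_per_attempt, completed * high * cost_per_attempt


def margin_of_error(completed, proportion=0.5, z=1.96):
    """Normal-approximation margin of error for a measured accuracy rate.
    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(proportion * (1 - proportion) / completed)
```

Under these assumptions, 1,000 completed surveys cost roughly $15,000 to $20,000, and an accuracy rate measured on them carries a margin of error of about plus or minus 3 percentage points.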
Whether you already have a data provider or have never purchased data before, the procedures above are valid. If you are currently buying data and have not extensively tested it, now is a good time to do so. You might find that data from another company is more appropriate for your needs. If you are new to data purchasing, start off on the right foot. Finally, if you are currently purchasing data, you might wish to reassess your data needs every few years and rerun these tests, as both the data and your needs may have changed considerably.
Where do you go from here? Contact data providers who have data that meets your needs and ask them to run a test file. Then contact a survey company to test the accuracy as it pertains to your needs. Analyze the returned files and remember that raw data is only one part of the overall purchasing process. If you follow these steps you are well on your way toward making a great data purchasing decision.
Shawn Harvey is product quality manager, InfoBase Enhancement, Acxiom Corp., Conway, AR.