It's summertime: tune-up that database

Share this content:
Data Quality's Dirty Little Secret
Data Quality's Dirty Little Secret

With the summer doldrums officially upon us, this is a great time to give that marketing database a thorough cleaning and tune-up, so it's in top-notch condition by the time the busy holiday season is upon us. Much like the car we drive to work every day, our marketing database is largely taken for granted unless it starts doing something weird. But just as your car needs periodic oil changes, filters, spark plugs, and tire pressure adjustments; databases run better when given a little TLC. Here are some things you can do to rev up your database for the busy fall and holiday seasons.

Take out the trash

All databases contain garbage data, such as bad email addresses, invalid phone numbers, dates, or dollars that are wildly out of range. Here are a few tricks that can make finding bad data easier:

  • Run frequency counts on fields that are supposed to be unique, like email addresses or phone numbers. To avoid being inundated with output, you can instruct the database to only return cases where the data is not unique. Often you'll see phony addresses such as “” or phone numbers such as “999-999-9999” assigned to several different consumers.
  • Ask the database to display the maximum and minimum value of each field, filtering out blanks, nulls, and zeros. Because special characters sort either before or after normal letters and digits, these often show up in your output. If you're feeling especially ambitious, there are advanced techniques for finding garbage characters embedded within strings. Also on occasion you'll see dates that are centuries old (or centuries in the future), or numbers that are obviously out of range.

As a general rule of thumb, I like to replace bad string data with blanks, bad numeric data with zeros, and bad dates with nulls. This helps prevent incorrect query output—also referred to as garbage in, garbage out.

Remove the junk

Most databases I've seen contain tables that nobody uses, and no one knows what they're for or who put them there (usually a fellow named “Not Me”). Much like the junk in my garage, these seem to sit around forever; cluttering up the database and forcing users to work around them when building queries. Unless you're 100% sure, before deleting a table you should rename it and see if anything breaks. Once a week or so has gone by without incident, copy the data onto a backup media just to be sure, and delete the table.

Adopt the orphans

Because marketing databases obtain most of their data from other systems, they can get out of sync when system or timing problems occur. In the database world, an orphan is a logical impossibility that happens on an astoundingly regular basis: orders without matching consumers, orders without products, products without orders, and so on. The best way to handle orphans is to search for their “parents” back in the originating system, and reload them as necessary, causing the orphans to be adopted.

Watch out for blankety-blanks

Character strings such as name and address data often get loaded with trailing spaces. This commonly happens when data files are imported into the database, especially files where each data element starts in a specific column (known as “fixed length” files). So, why is this a big deal? Because those blankety-blank blanks take up room just like a printed character, and the more space you waste in a database, the slower it gets. The problem is astonishingly commonplace, can eat up hundreds of megabytes of storage, and often is quite tricky to detect. That's because many databases ignore trailing blanks when performing comparisons; a city name with no trailing spaces is considered equal to the same city with ten trailing spaces, and since everything seems to be working, nobody knows there's a problem.

One way to spot this condition is to compare the display length of a string with its physical storage length. Unfortunately, there is not a consistent way to do this across database platforms. SQL Server provides a function called LEN to obtain a string length without counting blanks at the end, and another called DATALENGTH that does count them. With other systems, you can compare a string length against the same string length after trimming trailing blanks. So whenever LEN(string) is not equal to DATALENGTH(string) or LENGTH(string) is not equal to LENGTH(trimmed string), you've got a blankety-blank problem.

Right-size your data

In ancient times—before Bill Gates single-handedly slayed the dinosaurs and popularized the PC—adding or changing data elements within a database used to be a big deal. Accordingly, database architects approached design with a conservative mind-set; usually including extra, unused “filler” fields for future expansion and allowing extra space on fields that they thought might grow larger.

In today's rapidly changing Big Data world, this is no longer necessary or recommended. Because marketing databases quickly become enormous monstrosities, wasting even a few bytes on every transaction can mean terabytes of squandered storage. Just as with trailing spaces, the more data that needs to be sifted through, the longer queries take to run. Why use a 10 character field if you're only using two? Take advantage of this period of relative quiet by making sure that the field lengths you're using are appropriate for the data they contain, and get rid of fields that aren't used.

Archive, clean, merge, and purge

If you haven't done so for a while, consider removing or archiving old data (i.e. customers you no longer actively market to). After doing this, it might be time to run a full NCOA and merge to make sure you're targeting the right consumers with the right message, and almost as important, excluding the right consumers.

Compress the rest

Much like defragging your hard drive, most databases offer a command to reorganize their contents. This basically sorts and moves data around so that processing it becomes faster and more efficient. It also gets rid of wasted space by compressing data files. Depending on your database, the command for doing this might be SHRINK or REORGANIZE, although one system uses the command VACUUM—ensuring that nobody wants to do it. At any rate, reorganizing is easy to run and yields nice dividends in improved performance.

So once you've taken out the garbage, hauled the junk away, zapped the blankety-blank blanks, archived, cleaned, vacuumed, dusted, and mopped; your database will be in great shape for the busy fall and holiday seasons. As an added bonus, a well-maintained database looks great in a swimsuit, and helps keeps your mind off the summer heat!

Jeff Fowler is president and
founder of Decision Software

Loading links....

Sign up to our newsletters

Company of the Week

Since 1985, Melissa Data has helped thousands of companies clean, correct and complete contact data to better target and communicate with their customers. We offer a full spectrum of data quality solutions, including global address, phone, email, and name validation, identify verification - available for batch or real-time processes, in the Cloud or on-premise. Our service bureau provides dedupe, email/phone append and geographic/demographic append services for better targeting and insight. For direct mailers, Melissa Data offers easy-to-use address management/postal software, list hygiene services and 100s of specialty mailing lists - all with competitive pricing and excellent customer service.

Find out more here »

DMN's Career Center

Check out hundreds of exciting professional opportunities available on DMN's Career Center.  
Explore careers in digital marketing, sales, eCommerce, marketing communications, IT, data strategies, and much more. And don't forget to update your resume so employers can contact you privately about job opportunities.

>>Click Here