Database experts have long recognized that analyzing large sets of records, such as marketing databases, is fundamentally different from processing individual transactions. The logical conclusion is that different database engines should be used for each purpose. But while many specialized analytical databases have been developed over the years, none has found wide acceptance.
Instead, most firms have built analytical systems with the same relational database engines they use for transaction processing: products like DB2, Oracle and SQL Server. This has been made practical by advances in processing power, by database vendors' adding analysis-oriented features such as specialized indexes and by analytical data structures. The result is like putting an air spoiler on a pickup truck: You get better performance but nothing like a true race car. Still, most information technology departments are the technical equivalent of a one-car garage, lacking the resources or desire to maintain two different vehicles. And because they spend most of their time racing pickup trucks against each other — that is, evaluating ways to use transaction databases for analysis — they rarely even recognize that vastly faster technologies exist.
But the advantages of specialized analytical databases are so great that they are still worth considering.
DataPulse and MailWizard (DataPulse Inc., 732/577-8115, www.lmlabs.com) are, respectively, data analysis and campaign management applications that run on DataPulse's DP DBMS database engine. DP DBMS applies data compression, column-oriented data storage and entity-oriented data mapping. Respectively, these reduce data storage requirements, load less data with each query and make it easier to combine data from different levels such as customers, transactions and locations. As a result, the system can run on inexpensive hardware and perform complex queries including user-defined calculations. Some analytical systems lack these advantages.
On the other hand, the DP DBMS is not the world's fastest query engine. A query involving multiple fields on 100 million records might take 30 seconds to two minutes in DP DBMS. This is fast, but the quickest analytical products could return the same result in one-tenth the time — particularly if the data were all on the same level. Still, a conventional relational database could take several hours for the same query and might not return a complex multilevel query at all.
Like other analytical systems, DP DBMS must load data into its own format. The system does this at five to 10 gigabytes per hour, depending mostly on the speed of the disk drives. Other analytical systems load at similar rates, or slower if lots of processing is needed.
But load time can still be a problem. Even at a relatively brisk 10 gigabytes per hour, one terabyte would take four days. Fortunately, DP DBMS can load incremental changes or additions to an existing file, rather than starting from scratch after each update. Not all analytical systems share this ability. The load process can also incorporate data cleaning and transformations, and can even flag data that have changed since the prior version. This makes it easier to identify significant events such as a new child or change of address.
Once the data are loaded into DP DBMS, the information is accessed primarily through the system's own interfaces. The system does have an ODBC option to accept SQL queries, but performance is much slower. The system's point-and-click query builder lets users construct expressions by selecting data elements, operators and values. Several expressions can be combined in a group, and groups themselves can be combined and nested indefinitely. The result is a collapsible tree that lets nontechnical users develop complex selections. The system also shows counts for each data element and for each expression, helping users understand what caused their results.
The same query interface is used in the DataPulse analysis tool and MailWizard campaign manager. DataPulse also provides descriptive statistics and multilevel cross tabs. The cross tabs are comparable to multidimensional analysis systems, with nuances including multiple attributes per cell and interactive features to sort data, pivot or hide columns, and expand or collapse rows. But the real advantage is from DP DBMS, which gives fast response without requiring users to predefine which elements, measures or summaries they can access, as in conventional multidimensional tools.
Users can select any set of cross tab cells and transfer the underlying records as a group in the MailWizard campaign manager. Like DataPulse, MailWizard has a plain interface that offers a solid set of capabilities. Users can construct multisegment campaigns, with each segment defined by its own query. Each segment has a budget, unit cost, target quantity and frequency, which lets users schedule it for multiple repetitions. Segments can be split into random, Nth or sequential samples based on a fixed quantity or percentage.
When a campaign is ready for execution, its segments become available to an administrator who builds a mail tape from one or more campaigns. The administrator picks which segments to include and can define global exclusion groups, select only one name per household, eliminate duplicates across segments and exclude customers who have received previous promotions within a specified time. Output format and rules for assigning key codes are defined during system setup. Once the tape is generated, the system updates the campaign and segment statistics with actual mail quantities. It also stores promotion history records, which are available for future queries and analysis.
DataPulse, MailWizard and DP DBMS all run on a Windows 95 or NT server and use Windows workstations. The system was originally created by L&M Technologies, an application developer, for a large direct marketer that has been using it since 1997. L&M created DataPulse to market the products to other users and is just beginning its sales efforts. Pricing begins at $250,000 for the complete system, with additional charges for implementation services.