Norbert Technologies Corworks
promised to provide this access, then, more famously, data warehouses claimed they would make it widely available.
While no knowledgeable person ever doubted the hard work needed to actually implement these systems, the problem seemed to have been solved in principle, and thus lost much of its conceptual interest.
Unfortunately, providing data access is like curing a disease -- doing it "in principle" isn't much use if you can't make real individuals feel better. For all the undoubted joys and occasional actual successes of data warehousing, most marketers still can't get all the data they want whenever they want it.
It's no surprise, then, that a new batch of solutions now promises to really, truly make data access easy. No bothersome fussing over database design or data quality analysis or maintenance procedures -- the tasks that make traditional warehouse projects drag on for months or years. Just load all the data you want and access it any way you like in minutes or hours or days. Here is Nirvana at last.
Norbert Technologies (800/915-2666; www.norbert-tech.com) is one of several vendors making some version of this claim. Others include Broadbase (650/614-8300; www.broadbase.com), Digital Archaeology (913/438-9444; www.digarch.com) and Sand Technology (514/939-3477; www.sandtechnology.com). All take a broadly similar approach of combining easy-to-use data loading tools with a high-performance, proprietary database engine. All are also careful not to anger the data warehouse gods by actually claiming to be an alternative to a full-scale warehouse implementation. Indeed, most adopt the warehouse theorists' label for a temporary, flexible analysis platform -- "exploration warehouse." But while warehouse theory requires that exploration data come from a pre-existing corporate warehouse, these vendors often cite examples where data came directly from operational sources: a heresy with irresistible practical advantages.
Despite the general similarities among these systems, Norbert's product, Corworks, belongs to a different class. The others all focus on real-time data analysis: Broadbase and Sand using standard Structured Query Language tools and Digital Archaeology with its own analysis and reporting software. Corworks data cannot be analyzed directly, beyond some simple summary statistics; instead, users create extracts that are loaded into standard formats including relational database tables, spreadsheets and SAS.
The extracts are generated in a sequential pass against the Corworks data set. Proprietary technology compresses the data to about one-tenth its original size, allowing the system to pass about 5 million records per minute. Additional time is needed to write any extracted data: in one test, a two gigabyte extract took 11 minutes. The largest current Corworks installation achieves about 90 percent compression on four terabytes of input.
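To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The decimal units, the 100 million record file, and the reading of the 11 minutes as write time alone are assumptions made for illustration, not vendor specifications.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the review.
# Assumptions (not from the vendor): decimal units (1 GB = 1000 MB),
# and the 11 minutes is treated as the time spent writing the extract.

def compressed_size(original, compression_pct):
    """Size left after removing compression_pct percent of the original."""
    return original * (100 - compression_pct) / 100

def write_rate_mb_per_sec(extract_gb, minutes):
    """Average output rate implied by an extract size and elapsed time."""
    return extract_gb * 1000 / (minutes * 60)

def minutes_to_scan(records, records_per_minute=5_000_000):
    """Time for one sequential pass at the quoted scan speed."""
    return records / records_per_minute

# Four terabytes at 90 percent compression leaves about 0.4 terabytes stored.
print(compressed_size(4, 90))

# A two gigabyte extract in 11 minutes is roughly 3 MB per second.
print(round(write_rate_mb_per_sec(2, 11), 2))

# A hypothetical 100 million record file would take 20 minutes to pass.
print(minutes_to_scan(100_000_000))
```

The point of the exercise: at these rates, the sequential pass is quick, and writing the output can easily take as long as reading the input.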
Corworks data can also be manipulated without creating extracts, through "callable modules" that execute programs written in C, Cobol or SAS. These modules look up specified data elements in Corworks, execute an external program such as calculating a model score, and then store the result back in the Corworks data set. This simplifies the use of other systems to process Corworks data, although it still does not allow direct external access to individual records.
Corworks can mimic the multitable data structures of a conventional relational database. Users define table relationships when the data is originally loaded and can merge unrelated sources so long as a common key is available. Users can extract subsets of a file based on logical constraints and Nth or ranked samples. They can also combine data from multiple Corworks data sets, taking advantage of special functions to prepare data for time series analysis.
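For readers unfamiliar with the sampling terms, the sketch below shows in Python what a logical constraint, an Nth sample and a ranked sample each select. The function names and the sample records are invented for illustration; this is not Corworks syntax, and the product's own options may differ.

```python
# Illustrative only: hypothetical helpers showing three common extract types.

def filter_records(records, predicate):
    """Logical constraint: keep records for which the predicate is true."""
    return [r for r in records if predicate(r)]

def nth_sample(records, n, start=0):
    """Nth sample: keep every n-th record, beginning at position start."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return records[start::n]

def ranked_sample(records, key, top):
    """Ranked sample: keep the `top` records with the highest key value."""
    return sorted(records, key=key, reverse=True)[:top]

# Made-up account balances for six customers.
customers = [
    {"id": i, "balance": b}
    for i, b in enumerate([500, 20, 900, 75, 310, 60])
]

print(filter_records(customers, lambda r: r["balance"] > 100))
print(nth_sample(customers, 3))
print(ranked_sample(customers, key=lambda r: r["balance"], top=2))
```

An Nth sample gives a spread across the whole file, while a ranked sample deliberately picks the extremes, so the two serve different analytical purposes.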
Building Corworks data sets is relatively painless because the system automatically reads input in different formats, capturing the data structure and key statistics such as minimum and maximum values per field. Users can define views that control how the data will later be presented. In addition, they can deploy a rich set of transformation capabilities to perform quality checks, make changes and create derived variables. The system works with relational database tables, flat files, mainframe data sets, spreadsheets, PC files and other formats. There is no ability to update a Corworks file by changing existing records, although users can append new records to an existing data set.
Corworks also includes a separate householding module that can identify records belonging to the same customer or household. This runs after the data is compressed, providing substantially faster performance than systems that run on uncompressed data. This is particularly important to businesses with very large databases, such as credit card issuers. The householding module lets users apply prebuilt matching rules to fields they specify. Rules can have different "tightness" settings and users can force matches on specific data elements. The system can assign the same ID to a given household during successive builds, allowing users to track household performance over time.
It will also identify a customer even if the associated account numbers change -- say, because of a lost or stolen credit card -- or if the customer drops out of the system for several cycles. The module can do some types of validation against external files, such as checking for a telephone number against a national directory. But it relies on third-party software for address parsing and postal standardization.
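The general idea of persistent household IDs can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Norbert's matching logic: the field names, the two "tightness" settings and the exact-match comparison are all assumptions. Real householding rules are considerably more forgiving of spelling and formatting differences.

```python
# Simplified sketch of householding with IDs that persist across builds.
# Field names and tightness levels are hypothetical.

def match_key(record, tightness="loose"):
    """Build a comparison key from the fields chosen for matching.

    "loose" matches on surname and address; "tight" also requires
    the same phone number.
    """
    key = (record["surname"].strip().lower(),
           record["address"].strip().lower())
    if tightness == "tight":
        key += (record["phone"],)
    return key

def assign_households(records, known_ids, tightness="loose"):
    """Return {account: household_id}, reusing IDs from earlier builds.

    known_ids maps match keys to household IDs and is updated in place,
    so passing the same dict to successive builds keeps the IDs stable.
    """
    next_id = max(known_ids.values(), default=0) + 1
    result = {}
    for rec in records:
        key = match_key(rec, tightness)
        if key not in known_ids:
            known_ids[key] = next_id
            next_id += 1
        result[rec["account"]] = known_ids[key]
    return result

known = {}
build_1 = [
    {"account": "A1", "surname": "Smith", "address": "1 Elm St"},
    {"account": "A2", "surname": "smith ", "address": "1 elm st"},
    {"account": "B1", "surname": "Jones", "address": "9 Oak Ave"},
]
# Next cycle: Smith's card A1 was reissued as A9, and Jones is absent.
build_2 = [
    {"account": "A9", "surname": "Smith", "address": "1 Elm St"},
    {"account": "C1", "surname": "Lee", "address": "4 Pine Rd"},
]
print(assign_households(build_1, known))
print(assign_households(build_2, known))
```

Because the ID is tied to the matched identity rather than the account number, the reissued card lands in the same household as before, and Jones would get the original ID back on returning in a later build.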
Corworks was introduced in 1996 and grew out of consulting engagements by Norbert's principals. The system is installed at 12 sites, primarily large financial institutions. Pricing is based on the modules purchased and the number of processors in the system it will run on. The entire system, including householding, starts around $350,000 for a one-time license; individual modules can also be purchased separately. The file building module runs on any system, householding runs on mainframe and Unix systems, and the data extraction tool runs on Unix and NT.
David M. Raab is a consultant specializing in marketing systems and analysis. He is based near Philadelphia.