Not so long ago, customer data integration referred to matching customer records from different sources when assembling a marketing database or data warehouse. It was a challenging task but one with limited implications. After all, the data warehouse was used mainly for analysis, not daily customer interactions. So any consolidation errors were hidden from the customer and most company staff as well.
But customer data integration today extends beyond the traditional batch updates of data warehouses. Companies want to assemble and distribute up-to-the-minute customer data not only for marketing and customer service, but also to meet government requirements for surveillance, privacy, risk management and corporate reporting. With legal as well as financial issues at stake, accurate and efficient consolidation has become urgent.
The basic techniques for real-time customer data integration have been understood for some time. Operational systems need a single, central reference source that combines information from all inputs. The chief difference among competing solutions has been the amount of data stored within this repository. One option is to store all the information, either in a data warehouse or one dominant operational system. The other extreme is to store almost nothing centrally, except possibly links between related records in different systems. Complete customer data then is assembled as needed by querying the source systems directly. Intermediate solutions store some data centrally and gather the rest on demand.
Each approach has advantages. Centralized storage means all data is immediately available and allows more sophisticated consolidation processing because the work can be done in advance. But it requires conforming to the data model of the main operational system, which can be constricting, or copying a great deal of data into a separate warehouse. Distributed storage always returns the freshest data and can access detailed information without the cost of moving and storing it. But on-demand access and consolidation can face performance issues.
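The intermediate option is easier to picture with a small example. The Python sketch below is illustrative only and is not drawn from any vendor's product: the `HubEntry` structure, the `assemble` function and the rule that centrally stored values override on-demand values are all assumptions made for this example. The hub keeps a few core attributes plus links to source records and gathers everything else at request time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumption: a source system is modeled as a function that takes a local
# record key and returns that record's attributes.
SourceFetcher = Callable[[str], Dict[str, str]]


@dataclass
class HubEntry:
    """What the hub stores centrally for one customer (hypothetical layout)."""
    core: Dict[str, str]                                   # attributes held in the hub
    links: Dict[str, str] = field(default_factory=dict)    # source name -> local key


def assemble(entry: HubEntry, sources: Dict[str, SourceFetcher]) -> Dict[str, str]:
    """Build a full customer view: fetch detail on demand, then overlay hub data."""
    view: Dict[str, str] = {}
    for source_name, local_key in entry.links.items():
        # On-demand step: query each linked source system directly.
        view.update(sources[source_name](local_key))
    # Assumption: centrally stored (already consolidated) values take precedence.
    view.update(entry.core)
    return view
```

Setting `core` to an empty dictionary gives the pure distributed model, while an entry with no links behaves like fully centralized storage.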
Siperian Hub (Siperian Inc., 650/571-2400, www.siperian.com) takes the middle approach. The system builds a consolidated customer record using sophisticated business rules to pick the most reliable information when different sources conflict. The consolidated record then becomes available to other systems as a master customer record.
Siperian also can capture and present multiple hierarchies used to classify customers, such as households, businesses, geographic regions, product lines and industry groups. A further extension, due by the end of 2005, will help synchronize data among the source systems themselves.
The heart of Siperian Hub is Master Reference Manager. This performs four main functions: importing data from multiple sources, matching records that refer to the same customer, selecting the best data as a customer master and making the master available to other systems.
The import functions are straightforward. A graphical interface lets users map external sources to the hub’s data model. The model is based on Siperian templates but customized as needed for each client. A key feature is that it keeps enough detail to reconstruct original input for audit and rollback purposes. Mapping itself is largely manual, though Siperian does have prebuilt maps for common source systems such as Siebel. Data is loaded from source systems either in batch or through near-real-time message queues, using Web services or Enterprise JavaBeans.
The imported data must be cleansed and matched. These are demanding specialties, so Siperian uses third-party software. Cleansing, such as ensuring postal codes match city and state names, can be handled by products such as Trillium or FirstLogic. For matching, which is more tightly linked with Siperian’s own processing, the vendor has integrated technology from Identity Systems (formerly Search Software America). Siperian provides its own graphical interface to let users review and resolve questionable matches.
Building the master customer record is Siperian’s core expertise. Users set up a “trust framework” of rules that determine which version of each data element to adopt. The rules generate a score for each element, based on its source, recency and syntax (completeness, format, appropriate characters, etc.). The framework also applies a decay rate that reflects how quickly different values are expected to change: for example, e-mail addresses decay much faster than names.
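To make the scoring idea concrete, here is a minimal Python sketch. It is not Siperian's actual formula, which the vendor does not spell out here. The source weights, the half-life values, the exponential decay curve and the crude syntax check are all assumptions chosen for illustration. The score multiplies a source weight by a syntax score and by a decay factor that halves after each element-specific half-life.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Candidate:
    value: str
    source: str
    updated: date


# Hypothetical settings: base trust per source and half-life (days) per element.
SOURCE_TRUST = {"crm": 0.9, "web_form": 0.6}
HALF_LIFE_DAYS = {"email": 180, "name": 3650}


def syntax_score(element: str, value: str) -> float:
    """Crude completeness/format check returning a value between 0 and 1."""
    if not value.strip():
        return 0.0
    if element == "email":
        well_formed = "@" in value and "." in value.split("@")[-1]
        return 1.0 if well_formed else 0.3
    return 1.0


def trust_score(element: str, cand: Candidate, today: date) -> float:
    """Source weight x syntax score x exponential decay (an assumed curve)."""
    age_days = max((today - cand.updated).days, 0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS[element])
    return SOURCE_TRUST[cand.source] * syntax_score(element, cand.value) * decay


def best_value(element: str, candidates: List[Candidate], today: date) -> str:
    """Return the candidate value with the highest trust score."""
    return max(candidates, key=lambda c: trust_score(element, c, today)).value
```

With these made-up settings, a year-old e-mail address from the more trusted source loses to a month-old address from the less trusted one, while a name with the same two dates still comes from the more trusted source because names decay slowly.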
Scores are calculated separately for each data element. Users also can treat a set of elements as a block to avoid inconsistencies such as mixing the street name from one address with the city from another.
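The difference between element-level and block-level selection can be sketched in a few lines of Python. Again, this is an illustration under stated assumptions rather than the product's logic: the `select_survivors` function, the use of the mean score to rank a block, and the requirement that a record supply every element of a block are all invented for the example. Scores are taken as already computed.

```python
from typing import Dict, List, Tuple

# Each candidate record maps element name -> (value, trust score).
Record = Dict[str, Tuple[str, float]]


def select_survivors(records: List[Record], blocks: List[List[str]]) -> Dict[str, str]:
    """Pick the best value per element, but take each block whole from one record."""
    master: Dict[str, str] = {}
    blocked = {element for block in blocks for element in block}

    # Elements outside any block compete independently.
    independent = {element for rec in records for element in rec} - blocked
    for element in sorted(independent):
        holders = [rec for rec in records if element in rec]
        master[element] = max(holders, key=lambda rec: rec[element][1])[element][0]

    # Assumption: a block is ranked by the mean score of its elements, and only
    # records that supply every element of the block are eligible.
    for block in blocks:
        complete = [rec for rec in records if all(e in rec for e in block)]
        if not complete:
            continue
        winner = max(complete, key=lambda rec: sum(rec[e][1] for e in block) / len(block))
        for element in block:
            master[element] = winner[element][0]
    return master
```

Without the block rule, a record set in which one source has the higher-scoring street and another the higher-scoring city would produce exactly the mixed address the text warns about.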
The rules are built during system implementation based on user judgments. Siperian does not provide formal statistical analysis of inputs or rule results. It does have a standard rule set for the pharmaceutical industry and is building sets for financial services and publishing.
Contents of the master customer record will change as new data appears. Siperian provides features to record these changes, trace the original source of each value and roll back to earlier versions if necessary.
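One simple way to support this kind of change tracking is to keep every version of the record along with the source of each value. The Python sketch below shows that general idea only. The `MasterRecord` class and its methods are hypothetical and say nothing about how Siperian stores its history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One version of the record: element -> (value, source the value came from).
Version = Dict[str, Tuple[str, str]]


@dataclass
class MasterRecord:
    """Append-only version history for one master record (illustrative)."""
    versions: List[Version] = field(default_factory=lambda: [{}])

    def apply(self, changes: Version) -> None:
        """Record new values as a new version; earlier versions are untouched."""
        new_version = dict(self.versions[-1])
        new_version.update(changes)
        self.versions.append(new_version)

    def current(self) -> Dict[str, str]:
        """Return the latest values without their source tags."""
        return {element: value for element, (value, _) in self.versions[-1].items()}

    def source_of(self, element: str) -> str:
        """Trace which source supplied the current value of an element."""
        return self.versions[-1][element][1]

    def rollback(self, version_index: int) -> None:
        """Restore an earlier version by appending a copy of it, so the
        rollback itself stays in the history."""
        self.versions.append(dict(self.versions[version_index]))
```

Appending on rollback, rather than truncating the list, is a design choice made here so that the audit trail is never shortened.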
The master customer record is stored in a table in a conventional relational database, usually Oracle. External systems can query the table directly or access it through XML Web services APIs or Java APIs. Siperian also can push changes to message queues where external systems can read them.
Siperian supplements Master Reference Manager with Hierarchy Manager. This links records in multiple, independent hierarchies imported from source data. Starting from a single record, users can navigate its hierarchies using the data steward interface. Other systems also can access the hierarchies via calls to Siperian APIs.
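Navigating several independent hierarchies from one starting record might look like the following Python sketch. The data layout (one child-to-parent map per hierarchy) and the function names are assumptions made for illustration. They are not Siperian's data model or API.

```python
from typing import Dict, List

# Assumed layout: hierarchy name -> {record id -> parent id}.
Hierarchies = Dict[str, Dict[str, str]]


def ancestors(hierarchies: Hierarchies, hierarchy: str, record_id: str) -> List[str]:
    """Walk upward from a record to the top of one hierarchy."""
    parents = hierarchies[hierarchy]
    path: List[str] = []
    seen = {record_id}
    current = record_id
    while current in parents:
        current = parents[current]
        if current in seen:      # guard against accidental cycles in the data
            break
        seen.add(current)
        path.append(current)
    return path


def memberships(hierarchies: Hierarchies, record_id: str) -> Dict[str, List[str]]:
    """For one record, list its ancestor chain in every hierarchy it belongs to."""
    return {
        name: ancestors(hierarchies, name, record_id)
        for name, parents in hierarchies.items()
        if record_id in parents
    }
```

Because each hierarchy is kept in its own map, a customer can sit in a household, a region and an industry group at the same time without those structures interfering with one another.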
An integrated version of Activity Manager, due in early 2006, will transmit customer activity data from one source system to another. It will manage complex processes, such as creating a new account in one system when a transaction in a different system indicates it is required. Activity Manager will mix Siperian and third-party technology.
Siperian Hub runs on Unix, Linux or Windows servers and integrates with all major application servers. Pricing is based on project scope and complexity and ranges upward of $600,000 for a perpetual license. Siperian released its initial product in 2002 and has 18 clients for Siperian Hub.