QwestDex Accelerates Data Processing System
In a case study, QwestDex said that Applied Semantics' automated categorization system helped it classify new database records into 4,500 categories. QwestDex's DotComDirectory.com business unit keeps a database of online businesses, numbering in the millions, that QwestDex customers use for direct marketing. Records are grouped into categories so clients can easily select which segment of the database they wish to target.
Adding records to the database can be challenging because of the need to place each record into a specific category. Manual organization of the new data would have taken years, the company estimated.
Some database-management systems can group the data automatically. These "learning by example" systems use manually entered sample records to build a model for each category, and group the data based on similarity to the models.
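To make the "learning by example" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any product named in this article: the word-count models, the cosine-similarity measure, and the sample records are all assumptions chosen for brevity.

```python
from collections import Counter
import math

def tokenize(text):
    # Naive whitespace tokenizer; real systems would do far more.
    return text.lower().split()

def build_models(samples):
    """Build one word-count model per category from hand-entered samples.

    samples: dict mapping category name -> list of sample record strings.
    """
    models = {}
    for category, records in samples.items():
        counts = Counter()
        for record in records:
            counts.update(tokenize(record))
        models[category] = counts
    return models

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(record, models):
    # Assign the record to the category whose model it most resembles.
    vec = Counter(tokenize(record))
    return max(models, key=lambda c: cosine(vec, models[c]))

# Hypothetical sample records, invented for this sketch.
samples = {
    "florists": ["fresh flowers delivered daily", "wedding bouquets and flowers"],
    "auto repair": ["brake and engine repair", "car repair and oil change"],
}
models = build_models(samples)
print(classify("flowers for weddings", models))
print(classify("engine repair shop", models))
```

The sketch also shows why such systems need many samples: a category's model only knows the words that appeared in its hand-entered records.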
However, such systems typically need at least 50 records inputted per category, and sometimes need as many as 200 for high accuracy, said Gil Elbaz, acting CEO of Applied Semantics. Even using 50 samples, QwestDex would have had to input 225,000 records manually for such a categorization system to work, nearly 10 percent of the whole database.
Applied Semantics estimated that such a task would have required 470 person-days of labor. However, in early discussions with its future client, Applied Semantics told QwestDex it could do the job in two weeks.
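The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers from the article. The per-day entry rate is derived here, not stated by either company, and it assumes the 470 person-day estimate refers to the 225,000-record case.

```python
CATEGORIES = 4_500          # categories in the QwestDex database
MIN_SAMPLES = 50            # minimum sample records per category
MAX_SAMPLES = 200           # samples sometimes needed for high accuracy
PERSON_DAYS = 470           # Applied Semantics' labor estimate

min_records = CATEGORIES * MIN_SAMPLES   # 225,000 records
max_records = CATEGORIES * MAX_SAMPLES   # 900,000 records

# Derived figure (an inference, not a quoted number): the implied
# manual entry rate, about 479 records per person per day.
records_per_person_day = min_records / PERSON_DAYS

print(f"{min_records:,} records at {MIN_SAMPLES} samples per category")
print(f"{max_records:,} records at {MAX_SAMPLES} samples per category")
print(f"about {records_per_person_day:.0f} records per person-day")
```

Since 225,000 records is described as nearly 10 percent of the database, the whole database would hold somewhat more than 2.25 million records, consistent with the "millions" cited earlier.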
Applied Semantics' method differs in that it does not require manual inputting of sample records. Its system has a database of words and terms grouped into 500,000 "concept sets" or categories that can be used to draw similarities between records.
The system is organized something like a thesaurus, with an entry for each word containing dozens of synonyms. The database of terms contains 1.5 million entries and uses two gigabytes of memory.
Applied Semantics lexicographers matched the 4,500 categories in the QwestDex database with their appropriate concept sets, thus showing the categorization system which terms to look for when grouping records into categories.
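The approach described above, in which categories are tied to predefined concept sets instead of hand-entered samples, can be sketched roughly as follows. This is not Applied Semantics' implementation. The concept sets, the category mapping, and the overlap-counting score are hypothetical stand-ins invented for illustration.

```python
# Hypothetical concept sets: each groups related words and synonyms,
# loosely like a thesaurus entry.
CONCEPT_SETS = {
    "flower_retail": {"florist", "flowers", "bouquet", "bouquets", "floral"},
    "vehicle_repair": {"mechanic", "garage", "brakes", "repair", "automotive"},
    "legal_services": {"attorney", "lawyer", "law", "legal", "litigation"},
}

# Hypothetical mapping from directory category to concept sets.
# In the article, lexicographers did this matching by hand.
CATEGORY_CONCEPTS = {
    "Florists": ["flower_retail"],
    "Auto Repair": ["vehicle_repair"],
    "Attorneys": ["legal_services"],
}

def categorize(record):
    """Return the category whose concept sets share the most words
    with the record, or None if no concept-set word appears."""
    words = set(record.lower().split())
    best, best_score = None, 0
    for category, concept_ids in CATEGORY_CONCEPTS.items():
        score = sum(len(words & CONCEPT_SETS[c]) for c in concept_ids)
        if score > best_score:
            best, best_score = category, score
    return best

print(categorize("Downtown garage for brakes and engine repair"))
print(categorize("Family law attorney"))
print(categorize("Pizza delivery"))
```

No sample records are entered in this sketch. The only manual step is the category-to-concept mapping, one entry per category.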
Processing took four days. Checking and authenticating the data after they had been categorized took the rest of the two weeks.