Trading in Trillions of Tweets

The great Library of Alexandria was the center of earthly knowledge in the first centuries B.C. and A.D., and when it burned, papyrus scrolls penned by some of the founding fathers of engineering, physiology, geography, and medicine were lost. Here in the digital age, it’s said that Internet data is forever. Perhaps no one demonstrates that better than Crimson Hexagon.

The social media analytics company announced this week that its library of social media posts, which dates back to the company’s founding in 2008, had topped 500 billion. Compared to the Library of Alexandria’s centuries of compiled knowledge, six years doesn’t seem like much, but Crimson Hexagon disagrees.

“We made an interesting and different decision to collect and store data from the beginning, and 2008 is a long time ago in the social media world,” says Elizabeth Breese, the company’s senior content and digital marketing strategist. “Using our library, in which all keywords are indexed, marketers have the ability to run market research analyses, using historical data to inform current campaigns.”

Residing in the library are posts from blogs, forums, consumer review sites, news sites, Facebook, Instagram, YouTube, Google+, and Sina Weibo. Since 2010, Crimson Hexagon has been indexing the full Twitter firehose.

Crimson Hexagon client Richard Ng brings home the significance of this collection. As VP of insights and intelligence at Edelman Digital, he maps out markets for clients and associates, helping them build decision models that influence nearly all of their actions in their industries.

“Having those half a trillion posts is incredibly unique. It gives us immediate access to marketing memory,” Ng says. “Instead of being myopically devoted to real time, we get to use hindsight to see how the world has actually changed. A lot of the software providers are focused on giving marketers data porn. But the data’s not important in some ways. It’s the contextualization into groups and categories that’s important.”

Crimson Hexagon’s archival approach to social media analytics springs from the company’s beginnings in academia. The company’s chief scientist and cofounder, Gary King, is a full professor in the Government Department at Harvard University and the director of its Institute for Quantitative Social Science. Much of his work is based in nonparametric modeling, which draws on descriptive and inferential statistics without assuming that the data follow one of the usual probability distributions. He studies what makes people vote, based not just on averages of their actions but on how their belief systems make them vote differently at different times.

Ng provides a marketing corollary for nonparametric modeling. “Say I analyze historical posts and I find that people who are high frequency potato chip buyers always talk about going to the gym. They eat them after they work out,” he posits. “You’re starting to map out how people actually socialize. You’re going back to very traditional marketing theory, actually trying to reconcile these digital avatars to what people really want to do.”
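Ng’s potato chip example boils down to asking how often authors who mention one topic also mention another somewhere in their post history. The Python sketch below illustrates that idea with simple keyword co-occurrence. It is not Crimson Hexagon’s method or a nonparametric model, and the sample posts, author names, and function names are all hypothetical.

```python
# Hypothetical sample data: (author, post text) pairs. These are invented
# for illustration and are not real posts from any archive.
posts = [
    ("ana", "Just crushed leg day at the gym"),
    ("ana", "Potato chips after a workout hit different"),
    ("ben", "New potato chips flavor is great"),
    ("cy", "Heading to the gym before work"),
    ("cy", "Reading a book tonight"),
    ("dee", "Gym then potato chips, as usual"),
]


def authors_mentioning(posts, phrase):
    """Return the set of authors with at least one post containing the phrase."""
    phrase = phrase.lower()
    return {author for author, text in posts if phrase in text.lower()}


def cooccurrence_rate(posts, phrase_a, phrase_b):
    """Share of authors mentioning phrase_a who also mention phrase_b."""
    a = authors_mentioning(posts, phrase_a)
    b = authors_mentioning(posts, phrase_b)
    # Guard against dividing by zero when nobody mentions phrase_a.
    return len(a & b) / len(a) if a else 0.0


# Of the authors who talk about potato chips, what share also talk about the gym?
rate = cooccurrence_rate(posts, "potato chips", "gym")
print(f"{rate:.0%} of potato chip authors also mention the gym")
```

Grouping by author rather than by individual post is the design choice that matters here: the two topics rarely appear in the same post, so the link only shows up when each author’s history is read as a whole, which is the case Ng makes for an archive.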

Crimson Hexagon, says Breese, is still plumbing the depths of its library and will gradually introduce new features in its software. Earlier this year, it introduced Affinities, a feature that identifies audience interests and compares brand affinities to the interest segments of Twitter authors in general. It is now developing further insight-generating analytics for audience segmentation and image processing. “We’re using our data library to build technologies that capture the trends within interest segments,” Breese says.

It’s also quickly adding to its social library collection. Crimson Hexagon currently indexes a billion new posts every two days.