Yahoo to Share 13.5 Terabytes of User Data With Universities

The dataset drills deep into the details of 110 billion interactions by 20 million users.

Yahoo announced what it calls the largest-ever public release of a machine learning dataset: some 13.5 terabytes' worth of user interactions with the news feeds of properties such as Yahoo News, Sports, Finance, Movies, and Real Estate. But digital marketers shouldn't get too excited just yet. For now, Yahoo is offering the huge data dump only to the academic research community (such as MIT's Computer Science and Artificial Intelligence Lab).

"Many academic researchers and data scientists don't have access to truly large-scale datasets because it is traditionally a privilege reserved for large companies," said Suju Rajan, director of research, Yahoo Labs, in a press release. "We are releasing this dataset for independent researchers because we value open and collaborative relationships with our academic colleagues, and are always looking to advance the state-of-the-art in machine learning and recommender systems."

The data consists of details on 20 million users and 110 billion events occurring between February and May 2015. Information such as age, gender, and region for a subset of the anonymous users is provided. On the item side, the title, summary, and key phrases of the news articles in question are also included. Events are time-stamped and contain some information about the types of devices used for access.

"The release of this large Yahoo News Feed dataset will be a tremendous asset for the academic research community, and for us at UMass particularly, given our major research activities in natural language processing, information retrieval, databases and computational social science," said Andrew McCallum, director of UMass's Center for Data Science.

Whether and when the embattled Yahoo will share such treasured data with commercial enterprises remains to be seen.

