Yahoo to Share 13.5 Terabytes of User Data With Universities

The dataset drills deep into the details of 110 billion interactions by 20 million users.

Yahoo announced what it calls the largest-ever public release of a machine learning dataset: some 13.5 terabytes' worth of user interactions on the news feeds of properties such as Yahoo News, Sports, Finance, Movies, and Real Estate. But digital marketers shouldn't get too excited just yet. For now, Yahoo is offering the huge data dump only to the academic research community (like MIT's Computer Science and Artificial Intelligence Lab, pictured above).

"Many academic researchers and data scientists don't have access to truly large-scale datasets because it is traditionally a privilege reserved for large companies," said Suju Rajan, director of research at Yahoo Labs, in a press release. "We are releasing this dataset for independent researchers because we value open and collaborative relationships with our academic colleagues, and are always looking to advance the state of the art in machine learning and recommender systems."

The data covers 20 million users and 110 billion events that occurred between February and May 2015. Demographic information such as age, gender, and region is provided for a subset of the anonymized users. On the item side, the title, summary, and key phrases of each news article are also included. Events are time-stamped and carry some information about the types of devices used to access the content.

"The release of this large Yahoo News Feed dataset will be a tremendous asset for the academic research community, and for us at UMass particularly, given our major research activities in natural language processing, information retrieval, databases and computational social science," said Andrew McCallum, director of UMass's Center for Data Science.

Whether, and when, the embattled Yahoo will share such treasured data with commercial enterprises remains to be seen.
