Web Spam Collections

Author: Internet works  Date: 2007-02-11

Status

This page is related to research on search engine spamming at Yahoo! Research Barcelona. We currently host a reference collection for Web spam research, which is being used for the Web Spam Challenge 2007.

Datasets

The goal of our dataset activity is to make available reference collections that should be:

  • Large: the collections should include many examples of spam and non-spam content.
  • Clean: the collections should contain few classification errors.
  • Uniform: the collections should represent a uniform random sample over a set of pages or hosts.
  • Broad: the collections should include as many different Web spam aspects as possible.
  • Open: the collections should be freely available for researchers.

     

The first such collection was generated at the Università di Roma "La Sapienza" and is currently hosted by Yahoo! Research Barcelona. See the datasets page.

Code

Source code is available for Truncated PageRank and Adaptive Estimation of Supporters, the algorithms proposed in a WebKDD'06 paper.
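To give a rough idea of what Truncated PageRank computes, here is a minimal Python sketch. It is not the distributed source code. It assumes the usual formulation in which PageRank is written as a damped sum over link paths of length t, and the truncated variant gives zero weight to paths of length t <= T, so that support coming only from a page's closest neighbours is ignored. The function name `truncated_pagerank`, the parameters `alpha`, `T` and `steps`, and the choice to simply drop the score of pages without out-links are all assumptions made for illustration.

```python
def truncated_pagerank(out_links, alpha=0.85, T=2, steps=50):
    """Illustrative sketch only; names and defaults are hypothetical.

    out_links maps each node to the list of nodes it links to.
    Returns a dict mapping each node to its truncated score.
    """
    # Collect every node that appears as a source or as a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)

    # Normalisation constant so that the weights alpha**t for t > T sum to 1.
    c = (1 - alpha) / alpha ** (T + 1)

    r = {v: c / n for v in nodes}        # contribution of paths of length 0
    score = {v: 0.0 for v in nodes}

    for t in range(1, steps + 1):
        # Propagate one more link step: r(t) = alpha * r(t-1) * P.
        nxt = {v: 0.0 for v in nodes}
        for u, targets in out_links.items():
            if targets:
                share = alpha * r[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            # Assumption: nodes with no out-links pass nothing on.
        r = nxt
        # Only paths longer than T count towards the final score.
        if t > T:
            for v in nodes:
                score[v] += r[v]
    return score


# A page supported only by direct in-links loses its score once T >= 1.
star = {"a": ["hub"], "b": ["hub"], "hub": []}
direct = truncated_pagerank(star, T=0)
truncated = truncated_pagerank(star, T=1)
```

In the small `star` graph, `hub` receives all of its score through paths of length 1, so it scores above zero with `T=0` and exactly zero with `T=1`. That sensitivity to short-range support is the kind of signal the truncation is meant to expose.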
