
KDD Cup 2000 (includes datasets)

Author: unknown  Date: 2004-12-11

KDD Cup 2000

Held in conjunction with the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Co-Chairs:

Carla Brodley, School of Electrical and Computer Engineering, Purdue University
Ronny Kohavi, Blue Martini Software
Special thanks to Brian Frasca, Llew Mason, and Zijian Zheng from Blue Martini Software and Ben Bernstein from Gazelle.com
Thanks to Acxiom for providing data enhancements.

Email: kddcup2000@bluemartini.com

Summary talk presented at KDD (8/20/2000)
KDD-Cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations, 2(2):86-98, 2000

Cups in previous years: KDD Cup 99, KDD Cup 98 (data)


General Information (updated Apr 2002)

The KDD Cup 2000 domain contains clickstream and purchase data from Gazelle.com, a legwear and legcare web retailer that closed their online store on 8/18/2000.

You are required to sign a non-disclosure agreement in order to receive a password to access the data, although the original restrictions were relaxed considerably in April 2002 to allow wider use of the data. Basically, any use of the data is allowed as long as proper acknowledgment is given and a copy of the work is provided to Blue Martini Software.

In order to access the data, you must fill out the form on this page. Your username and password will be emailed to you.

Once you have received a username and password (see above), you can go to the confidential section of the site, which contains a description of the tasks, the data, background information, and more.

The reference for the KDD Cup 2000 is as follows (a PDF is available here):

Ron Kohavi, Carla Brodley, Brian Frasca, Llew Mason, and Zijian Zheng. KDD-Cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations, 2(2):86-98, 2000. http://robotics.stanford.edu/users/ronnyk/kddOrganizerReport.pdf

The bibtex entry is:

    @Article{kddcup2000,
    author = {Ron Kohavi and Carla Brodley and Brian Frasca and Llew Mason and Zijian Zheng},
    title = {{KDD-Cup} 2000 Organizers' Report: Peeling the Onion},
    journal = {SIGKDD Explorations},
    volume = {2},
    number = {2},
    pages = {86--98},
    url = {http://robotics.stanford.edu/users/ronnyk/kddOrganizerReport.pdf},
    year = 2000}
A paper describing the Blue Martini architecture is available here
Suhail Ansari, Ron Kohavi, Llew Mason, and Zijian Zheng, Integrating E-Commerce and Data Mining: Architecture and Challenges, ICDM 2001.
Please remember the restrictions on the data.

Real Datasets for Association Rule Discovery (updated Oct 2002): http://robotics.Stanford.EDU/users/ronnyk/realWorldAssoc.pdf

Three real-world datasets are available. You are required to sign a simple non-disclosure agreement in order to receive a password to access the data. Basically, any use of the data is allowed as long as proper acknowledgment to Blue Martini Software is given and a copy of the work is sent (e-mail is fine). When citing these datasets, please cite the following article instead of the KDD Cup paper:
Zijian Zheng, Ron Kohavi, and Llew Mason, Real World Performance of Association Rule Algorithms, KDD 2001.

The bibtex entry is:

    @inproceedings{ zheng-kohavi-mason-real-assoc,
    author = "Zijian Zheng and Ron Kohavi and Llew Mason",
    title = "Real World Performance of Association Rule Algorithms",
    booktitle = "Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
    editor={Foster Provost and Ramakrishnan Srikant},
    pages={401--406},
    year = 2001,
    url = {http://robotics.Stanford.EDU/users/ronnyk/realWorldAssoc.pdf}}
Note: a long version of the original paper is available, as well as the slides.
     
Please remember the restrictions on the data.

KDD Cup Winners: click here for more information

There were five questions at the KDD Cup 2000. The results for Question 2 have been revised. When we calculated the results at Purdue, we had a subtle bug. The bug was uncovered thanks to Yoshinori Yaginuma who calculated his own score using the posted test data. We have corrected for this bug and have posted the new results for Question 2 (11/20/00).

Question 1 Winner: Amdocs (Paper, Poster)

Given a set of page views, will the visitor view another page on the site or will the visitor leave?

Honorable Mentions: Mui Seng Martin Lee, Chong Jin Ong and S. Sathiya Keerthi of Mechanical and Production Engineering Department, National University of Singapore
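
As an illustration of how Question 1 can be framed as a prediction task, the sketch below turns one session into labeled examples: each prefix of the session is paired with a flag saying whether the visitor viewed another page afterwards. The page names and the `prefix_examples` helper are hypothetical, and this is not necessarily how the organizers or the winners prepared the data.

```python
def prefix_examples(session):
    """Return (pages_seen_so_far, continues) for every prefix of one session.

    `session` is an ordered list of page identifiers (hypothetical names).
    `continues` is True when at least one more page view follows the prefix.
    """
    examples = []
    for i in range(1, len(session) + 1):
        prefix = session[:i]
        continues = i < len(session)  # False only for the full session
        examples.append((prefix, continues))
    return examples


# Hypothetical session of three page views.
examples = prefix_examples(["home", "product", "checkout"])
for prefix, continues in examples:
    print(prefix, "->", "continues" if continues else "leaves")
```

A classifier for this question would then be trained on features derived from each prefix, with `continues` as the target.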

Question 2 Winner: Salford Systems, Inc

Given a set of page views, which product brand will the visitor view in the remainder of the session?

Honorable Mentions: MP13 team of Alexei Vopilov, Ivan Shabalin and Vladimir Mikheyev, and the team of Mukund Deshpande, George Karypis, Department of Computer Science and Engineering, University of Minnesota


Question 3 Winner: Salford Systems, Inc

Given a set of purchases over a period of time, characterize visitors who spend more than $12 (order amount) on an average order at the site.

Honorable Mentions: Orit Rafaely, Tel-Aviv University and Amdocs
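
The target in Question 3 is a simple per-visitor quantity: the average order amount, compared against $12. A minimal sketch of that labeling step follows, assuming order records of the form (visitor id, order amount). The record layout, the sample values, and the `heavy_spenders` helper are hypothetical; the actual Gazelle.com schema is available only in the confidential section of the site.

```python
from collections import defaultdict

# Hypothetical (visitor_id, order_amount) records, for illustration only.
orders = [
    ("v1", 8.00), ("v1", 20.00),
    ("v2", 9.50),
    ("v3", 12.00), ("v3", 12.00),
]

THRESHOLD = 12.0  # dollars, as stated in Question 3


def heavy_spenders(orders, threshold=THRESHOLD):
    """Map each visitor to True if their average order amount exceeds threshold."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for visitor, amount in orders:
        totals[visitor] += amount
        counts[visitor] += 1
    # "More than $12" is read as a strict inequality.
    return {v: totals[v] / counts[v] > threshold for v in totals}


labels = heavy_spenders(orders)
print(labels)
```

Characterizing the visitors then amounts to describing which attributes separate the `True` group from the `False` group.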

Question 4 Winner: e-steam (Poster)

Given a set of page views, characterize killer pages, i.e., pages after which users leave the site.

Honorable Mentions: SAS, Amdocs, and LLSoft, Ltd
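
One possible way to make "killer pages" concrete is an exit rate: the fraction of views of a page that were the last view in their session. The sketch below computes it under the assumption that each session is an ordered list of page identifiers. The page names and the `exit_rates` helper are hypothetical, and the entrants' actual methods are not described here.

```python
from collections import Counter

# Hypothetical sessions, each an ordered list of page views.
sessions = [
    ["home", "product", "checkout"],
    ["home", "shipping"],
    ["home", "product", "shipping"],
    ["shipping", "home", "product"],
]


def exit_rates(sessions):
    """Return, per page, the share of its views that ended a session."""
    views = Counter()
    exits = Counter()
    for pages in sessions:
        if not pages:
            continue  # skip empty sessions
        views.update(pages)
        exits[pages[-1]] += 1  # the last page viewed is the exit page
    return {page: exits[page] / views[page] for page in views}


rates = exit_rates(sessions)
for page, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {rate:.2f}")
```

A high exit rate alone does not make a page a killer page: a final checkout page ends sessions by design, so pages like that would need to be set aside before interpreting the ranking.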


Question 5 Winner: Amdocs (Paper, Poster)

Given a set of page views, characterize which product brand a visitor will view in the remainder of the session.


Schedule (passed):

  • Data available: 5/20 - now available
  • Question period 1: 5/20-5/30 - passed
  • Test set available: 7/7 - now available
  • Question period 2: 7/10-7/17 - passed
  • Entries due: 7/17 - passed
  • Winners notified: 7/28 - passed
  • Talks due: 8/07 - passed
  • Talks approved by: 8/14 - passed
  • Winners announced: KDD Conference - passed
