KDD-CUP-98

Author: unknown. Posted: 2004-12-11.

The Second International Knowledge Discovery and
Data Mining Tools Competition
Held in Conjunction with KDD-98
The Fourth International Conference on Knowledge
Discovery and Data Mining


Sponsored by the
American Association for Artificial Intelligence (AAAI)
Epsilon Data Mining Laboratory
Paralyzed Veterans of America (PVA)


KDD-CUP is a knowledge discovery and data mining (KDDM) tools competition held in conjunction with the Fourth International Conference on Knowledge Discovery and Data Mining.

Last year, KDD-CUP-97 enjoyed worldwide participation of 45 data mining tools. The Gold Miner award was shared by UCSD's BNB (Boosted Naive Bayes Classifier) software and Urban Science's GainSmarts software. SGI's MineSet was the runner-up and earned the Bronze Miner award. For more information on KDD-CUP-97, please refer to the URL: www.epsilon.com/new. Some of the highlights from last year's competition are as follows:

  • The success of the Naive Bayes algorithm (used by 2 of the top 3 contestants)
  • No clear evidence backing the hypothesis that there are "real" returns to incremental data preprocessing activity.

KDD-CUP-98 will follow on the success of last year's competition. The CUP is again open to all KDDM tool vendors, academics with research prototypes and corporations with significant applications. Attendance at the KDD-98 conference is not required to participate in the CUP.

KDD-CUP Process and Important Dates

  • Registration and signing of the NDA (Non-Disclosure Agreement), July 1-15, 1998
  • Release of the datasets (learning and validation), related documentation and the KDD-CUP questionnaire, July 16, 1998
  • Return of the results and the KDD-CUP questionnaire, August 14, 1998
  • KDD-CUP Committee evaluation of the results, August 15-25
  • Individual performance evaluations sent to the participants, August 25, 1998
  • Public announcement of the winners and awards presentation during KDD-98 in New York City, August 29, 1998

KDD-CUP Data Set

The data set for this year's Cup has been generously provided by the Paralyzed Veterans of America (PVA). PVA is a not-for-profit organization that provides programs and services for US veterans with spinal cord injuries or disease. With an in-house database of over 13 million donors, PVA is also one of the largest direct mail fundraisers in the country.

Participants in the CUP will demonstrate the performance of their tool by analyzing the results of one of PVA's recent fundraising appeals. This mailing was dropped in June 1997 to a total of 3.5 million PVA donors. It included a gift "premium" of personalized name & address labels plus an assortment of 10 note cards and envelopes. All of the donors who received this mailing were acquired by PVA through premium-oriented appeals like this. The analysis data set will include:

  • A subset of the 3.5 million donors sent this appeal
  • A flag to indicate respondents to the appeal and the dollar amount of their donation
  • PVA promotion and giving history
  • Overlay demographics, including a mix of household and area level data.

Unlike last year, all available information about the fields will be provided in the project documentation.

The objective of the analysis will be to identify response to this mailing -- a classification or discrimination problem.

Performance Evaluation Criteria

The CUP is aimed at recognizing the most accurate, innovative, efficient and methodologically advanced data mining tools in the marketplace.

The participants will again be evaluated based on the performance of their algorithm on the validation or hold-out data set. The KDD-CUP program committee will consider the following metrics in their evaluations:

  • Lift curve or gains table analysis listing the cumulative percent of targets recovered in the top quantiles of the file
  • Receiver operating characteristics (ROC) curve analysis and the area under the ROC curve
  • Several statistical tests to ensure the robustness of the results.
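
As an illustration of the first two metrics, here is a minimal Python sketch of a gains table and the area under the ROC curve. It is not the committee's evaluation code: the function names, the decile binning and the tie handling are assumptions made for this example, and it expects one model score and one 0/1 response flag per record.

```python
from typing import List, Tuple


def gains_table(scores: List[float], labels: List[int],
                n_bins: int = 10) -> List[Tuple[float, float]]:
    """Return (fraction of file, cumulative fraction of targets recovered)
    for each of n_bins equal-sized quantiles of the score-ranked file."""
    total_targets = sum(labels)
    if total_targets == 0:
        raise ValueError("need at least one responder")
    # Rank records from the highest model score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = n * b // n_bins  # number of records in the top b quantiles
        captured = sum(label for _, label in ranked[:cutoff])
        rows.append((b / n_bins, captured / total_targets))
    return rows


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve, computed as the share of
    (responder, non-responder) pairs in which the responder has the
    higher score. Tied pairs count as half (an assumption of this sketch)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both responders and non-responders")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of records, which is fine for a small illustration but would need a rank-based formulation for a file of this size.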

Last year, the performance in the top 10 percent of the file was considered as a measure of precision while the performance in the top 40 percent of the file was considered as a measure of stability and marketing coverage. The average performance up to the 40th percentile was also looked at as a measure of overall performance.
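
The three summary figures described above can be sketched in Python as follows. This is only one possible reading: the announcement does not define "average performance up to the 40th percentile", so the sketch assumes it is the mean cumulative capture rate at the 10th, 20th, 30th and 40th percentiles. The names `capture_at` and `summarize` are hypothetical.

```python
from typing import Dict, List


def capture_at(scores: List[float], labels: List[int], percent: int) -> float:
    """Fraction of all responders found in the top `percent` percent
    of the file after ranking records by descending model score."""
    total_targets = sum(labels)
    if total_targets == 0:
        raise ValueError("need at least one responder")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = len(ranked) * percent // 100  # integer math avoids float drift
    return sum(label for _, label in ranked[:cutoff]) / total_targets


def summarize(scores: List[float], labels: List[int]) -> Dict[str, float]:
    """Top-10% capture, top-40% capture, and their assumed average."""
    deciles = [capture_at(scores, labels, p) for p in (10, 20, 30, 40)]
    return {
        "top_10_percent": deciles[0],                # precision-style measure
        "top_40_percent": deciles[3],                # stability and coverage
        "average_to_40": sum(deciles) / len(deciles),  # assumed definition
    }
```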

KDD-CUP-97 Program Committee

  • Vasant Dhar, New York University, New York, NY
  • Tom Fawcett, Bell Atlantic, New York, NY
  • Georges Grinstein, University of Massachusetts, Lowell, MA
  • Ismail Parsa, Epsilon, Burlington, MA
  • Gregory Piatetsky-Shapiro, Knowledge Stream Partners, Boston, MA
  • Foster Provost, Bell Atlantic, New York, NY
  • Kyuseok Shim, Bell Laboratories, Murray Hill, NJ

Participants

Last year, the CUP enjoyed worldwide participation of 45 data mining tools. This year, it is enjoying worldwide participation of 57 contestants. 18 of the 57 participants have elected to stay anonymous. The software status of those that elected anonymity is as follows:
  • 4 Commercial
  • 4 Freeware
  • 10 Research Prototype.
The following 39 participants wish to be identified.
 SOFTWARE/TOOL/RESEARCH PROTOTYPE        VENDOR/INSTITUTION
 --------------------------------------- --------------------------------------
 APN (Adaptive Probabilistic Networks)   Berkeley/SRI/Stanford
 BAYDA/PRO                               Complex Systems Computation Group
                                           (CoSCo), University of Helsinki
 BNB (Boosted Naive Bayes Classifier)    University of California San Diego
 BPSOM                                   Eindhoven University of Technology
 CARRL                                   Austrian Research Institute for AI
 DataBase Mining Marksman                HNC Software Inc.
 DataDetective                           Sentient Machine Research
 DataLamp                                University of East Anglia
 Discovery Board                         Rutgers University
 DMZ                                     Yongwon Lee, Lockheed Martin ATC
                                           (tool not affiliated with Lockheed
                                           Martin ATC.)
 DTI v5.0                                ECCI-University of Costa Rica
 Enterprise Miner                        SAS Institute
 Fragment-Potential                      QueryObject Systems, NY &
                                           Institute for Information
                                           Transmission Problems, Moscow
 GainSmarts                              Urban Science Applications, Inc.
 ICL                                     Katholieke Universiteit Leuven
 IGLUE                                   CRIL
 Information Network                     Tel Aviv University
 JABC                                    University of Constance, Germany
 JAM                                     Florida Institute of Technology &
                                           Columbia University
 JAWS                                    University of Waikato, New Zealand
 Kepler                                  Dialogis Software & Services GmbH
 KnowledgeMiner                          Frank Lemke, Script Software
 KnowMan DataMiner (research version)    Intellix / Riso National Laboratory
 LPDT                                    Rensselaer Polytechnic Institute
 MineSet                                 Silicon Graphics, Inc.
 Mixtures of Trees                       Massachusetts Institute of Technology
 Model 1                                 Unica Technologies, Inc.
 ModelQuest Enterprise                   AbTech Corp.
 Otis                                    Randy Kerber, NCR (tool not affiliated
                                           with NCR)
 PolyAnalyst                             Megaputer Intelligence Ltd.
 QS                                      Iona Corp.
 Rdt/Db                                  Informatik LS VIII, Universitaet Dortmund
 SENN Sales                              Siemens Nixdorf Business Service
 The Shrunken-Belly Method               Edward Malthouse, Northwestern University
 TILDE                                   Katholieke Universiteit Leuven
 Tutti 0.1                               Tampere University of Technology
 WARMR                                   Katholieke Universiteit Leuven
 WhiteCross HeatSeeker                   MRJ Technology Solutions/WhiteCross
 WizWhy                                  WizSoft
REGISTRATION BROCHURE

All participants are required to complete the application form below and send it in plain ASCII format to (e-mail preferred):
+-----------------------------+
| Ismail Parsa                |
|                             |
| Epsilon                     |
| 50 Cambridge Street         |
| Burlington MA 01803 USA     |
|                             |
| E-MAIL: iparsa@epsilon.com  |
| V-MAIL: (781) 273-0250*6734 |
| FAX:    (781) 272-8604      |
+-----------------------------+
  
The participants will receive the NDA (non-disclosure agreement) before the July 15, 1998 deadline. Please contact Ismail Parsa if you did not receive the NDA before July 15.

Last year, the KDD-CUP program committee publicly announced the names of only the top 3 performing tools. The names of the 45 participants were not released. This year, although we will again only announce the names of the top 3 performing tools, we will make the list of participants publicly available UNLESS THE PARTICIPANTS INDICATE THAT THEY WILL PRESERVE THEIR ANONYMITY BY CHECKING THE APPROPRIATE BOX IN THE REGISTRATION BROCHURE. We think it's fair for everyone to know who they are competing with. Here is the Registration Brochure in ASCII.
