KDD-CUP-98
The Second International Knowledge Discovery and
Data Mining Tools Competition
Held in Conjunction with KDD-98
The Fourth International Conference on Knowledge
Discovery and Data Mining
Sponsored by the
American Association for Artificial Intelligence (AAAI)
Epsilon Data Mining Laboratory
Paralyzed Veterans of America (PVA)
KDD-CUP is a knowledge discovery and data mining (KDDM) tools competition held in conjunction with the Fourth International Conference on Knowledge Discovery and Data Mining.
Last year, KDD-CUP-97 enjoyed worldwide participation of 45 data mining tools. The Gold Miner award was jointly shared by UCSD's BNB (Boosted Naive Bayes Classifier) software and Urban Science's GainSmarts software. SGI's MineSet was the runner-up and earned the Bronze Miner award. For more information on KDD-CUP-97, please refer to the URL: www.epsilon.com/new. Some of the highlights from last year's competition are as follows:
- The success of the Naive Bayes algorithm (used by 2 of the top 3 contestants)
- No clear evidence backing the hypothesis that there are "real" returns to incremental data preprocessing activity.
KDD-CUP-98 will build on the success of last year's competition. The CUP is again open to all KDDM tool vendors, academics with research prototypes and corporations with significant applications. Attendance at the KDD-98 conference is not required to participate in the CUP.
KDD-CUP Process and Important Dates
- Registration and signing of the NDA (Non-Disclosure Agreement): July 1-15, 1998
- Release of the data sets (learning and validation), related documentation and the KDD-CUP questionnaire: July 16, 1998
- Return of the results and the KDD-CUP questionnaire: August 14, 1998
- KDD-CUP Committee evaluation of the results: August 15-25, 1998
- Individual performance evaluations sent to the participants: August 25, 1998
- Public announcement of the winners and awards presentation during KDD-98 in New York City: August 29, 1998
KDD-CUP Data Set
The data set for this year's Cup has been generously provided by the Paralyzed Veterans of America (PVA). PVA is a not-for-profit organization that provides programs and services for US veterans with spinal cord injuries or disease. With an in-house database of over 13 million donors, PVA is also one of the largest direct-mail fundraisers in the country.
Participants in the CUP will demonstrate the performance of their tools by analyzing the results of one of PVA's recent fundraising appeals. This mailing was dropped in June 1997 to a total of 3.5 million PVA donors. It included a gift "premium" of personalized name and address labels plus an assortment of 10 note cards and envelopes. All of the donors who received this mailing were acquired by PVA through premium-oriented appeals like this one. The analysis data set will include:
- A subset of the 3.5 million donors sent this appeal
- A flag to indicate respondents to the appeal and the dollar amount of their donation
- PVA promotion and giving history
- Overlay demographics, including a mix of household and area level data.
Unlike last year, all available information about the fields will be provided in the project documentation.
The objective of the analysis will be to predict response to this mailing, which makes it a classification or discrimination problem.
Performance Evaluation Criteria
The CUP is aimed at recognizing the most accurate, innovative, efficient and methodologically advanced data mining tools in the marketplace.
The participants will again be evaluated based on the performance of their algorithm on the validation or hold-out data set. The KDD-CUP program committee will consider the following metrics in their evaluations:
- Lift curve or gains table analysis listing the cumulative percent of targets recovered in the top quantiles of the file
- Receiver operating characteristics (ROC) curve analysis and the area under the ROC curve
- Several statistical tests to ensure the robustness of the results.
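The first two metrics can both be computed from a model's scores on the hold-out file. The sketch below is a minimal illustration, not the committee's evaluation code: it assumes each record carries a numeric score (higher means more likely to respond) and a 0/1 response flag, that at least one responder and one non-responder are present, and the function names `gains_table` and `roc_auc` are hypothetical.

```python
from typing import List, Sequence, Tuple


def gains_table(scores: Sequence[float], labels: Sequence[int],
                n_quantiles: int = 10) -> List[Tuple[int, float]]:
    """Return (percent of file, cumulative percent of targets recovered) per quantile.

    Assumes labels are 1 for responders and 0 otherwise.
    """
    # Rank the file from highest to lowest score; sorted() is stable,
    # so tied scores keep their input order.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    total_targets = sum(ranked)
    rows = []
    for q in range(1, n_quantiles + 1):
        cutoff = n * q // n_quantiles        # records in the top q quantiles
        captured = sum(ranked[:cutoff])      # responders found in that slice
        rows.append((100 * q // n_quantiles, 100.0 * captured / total_targets))
    return rows


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    AUC is the probability that a randomly chosen responder is scored above
    a randomly chosen non-responder; tied pairs count as one half. This is
    O(P * N), fine for a sketch but slow on a full mailing file.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy hold-out file of 10 records with 3 responders (illustrative data only).
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    for pct, gain in gains_table(scores, labels):
        print(f"top {pct:3d}% of file: {gain:5.1f}% of targets")
    print(f"AUC = {roc_auc(scores, labels):.3f}")
```

On this toy file the top 10% of records recovers one third of the responders, and the AUC is 17/21 (about 0.81). A random ordering would give a gains curve close to the diagonal and an AUC close to 0.5.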
Last year, the performance in the top 10 percent of the file was considered as a measure of precision while the performance in the top 40 percent of the file was considered as a measure of stability and marketing coverage. The average performance up to the 40th percentile was also looked at as a measure of overall performance.
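The KDD-CUP-97 summary measures described above can be read off the same ranked file. The sketch below assumes that "the average performance up to the 40th percentile" means the mean of the cumulative gains at the 10th, 20th, 30th and 40th percentiles; that reading, like the helper names `gain_at_percent` and `kdd97_summary`, is an illustrative assumption rather than the committee's documented formula.

```python
from typing import Dict, Sequence


def gain_at_percent(scores: Sequence[float], labels: Sequence[int],
                    percent: int) -> float:
    """Cumulative percent of responders recovered in the top `percent` of the file."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda p: p[0], reverse=True)]
    # Integer arithmetic avoids float rounding when locating the cut.
    cutoff = len(ranked) * percent // 100
    return 100.0 * sum(ranked[:cutoff]) / sum(ranked)


def kdd97_summary(scores: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Precision (top 10%), stability/coverage (top 40%) and average gain up to 40%.

    Hypothetical helper: averaging over the 10/20/30/40 percent cuts is an
    assumed interpretation of "average performance up to the 40th percentile".
    """
    top10 = gain_at_percent(scores, labels, 10)
    top40 = gain_at_percent(scores, labels, 40)
    avg_to_40 = sum(gain_at_percent(scores, labels, p) for p in (10, 20, 30, 40)) / 4
    return {"top10": top10, "top40": top40, "avg_to_40": avg_to_40}


if __name__ == "__main__":
    # Same toy file as a gains-table example: 10 records, 3 responders.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print(kdd97_summary(scores, labels))
```

Dividing a gain by the percent of the file gives the lift at that depth; here the top 10% recovers about 33.3% of the responders, a lift of roughly 3.3 over random selection.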
KDD-CUP-97 Program Committee
- Vasant Dhar, New York University, New York, NY
- Tom Fawcett, Bell Atlantic, New York, NY
- Georges Grinstein, University of Massachusetts, Lowell, MA
- Ismail Parsa, Epsilon, Burlington, MA
- Gregory Piatetsky-Shapiro, Knowledge Stream Partners, Boston, MA
- Foster Provost, Bell Atlantic, New York, NY
- Kyuseok Shim, Bell Laboratories, Murray Hill, NJ
Participants
Last year, the CUP enjoyed worldwide participation of 45 data mining tools. This year, 57 contestants from around the world are taking part. Eighteen of the 57 participants have elected to remain anonymous; the software status of those entries is as follows:
- 4 Commercial
- 4 Freeware
- 10 Research Prototype
SOFTWARE/TOOL/RESEARCH PROTOTYPE        VENDOR/INSTITUTION
--------------------------------------- --------------------------------------
APN (Adaptive Probabilistic Networks)   Berkeley/SRI/Stanford
BAYDA/PRO                               Complex Systems Computation Group
                                        (CoSCo), University of Helsinki
BNB (Boosted Naive Bayes Classifier)    University of California San Diego
BPSOM                                   Eindhoven University of Technology
CARRL                                   Austrian Research Institute for AI
DataBase Mining Marksman                HNC Software Inc.
DataDetective                           Sentient Machine Research
DataLamp                                University of East Anglia
Discovery Board                         Rutgers University
DMZ                                     Yongwon Lee, Lockheed Martin ATC
                                        (tool not affiliated with Lockheed
                                        Martin ATC)
DTI v5.0                                ECCI-University of Costa Rica
Enterprise Miner                        SAS Institute
Fragment-Potential                      QueryObject Systems, NY &
                                        Institute for Information
                                        Transmission Problems, Moscow
GainSmarts                              Urban Science Applications, Inc.
ICL                                     Katholieke Universiteit Leuven
IGLUE                                   CRIL
Information Network                     Tel Aviv University
JABC                                    University of Constance, Germany
JAM                                     Florida Institute of Technology &
                                        Columbia University
JAWS                                    University of Waikato, New Zealand
Kepler                                  Dialogis Software & Services GmbH
KnowledgeMiner                          Frank Lemke, Script Software
KnowMan DataMiner (research version)    Intellix / Riso National Laboratory
LPDT                                    Rensselaer Polytechnic Institute
MineSet                                 Silicon Graphics, Inc.
Mixtures of Trees                       Massachusetts Institute of Technology
Model 1                                 Unica Technologies, Inc.
ModelQuest Enterprise                   AbTech Corp.
Otis                                    Randy Kerber, NCR (tool not affiliated
                                        with NCR)
PolyAnalyst                             Megaputer Intelligence Ltd.
QS                                      Iona Corp.
Rdt/Db                                  Informatik LS VIII, Universitaet Dortmund
SENN Sales                              Siemens Nixdorf Business Service
The Shrunken-Belly Method               Edward Malthouse, Northwestern University
TILDE                                   Katholieke Universiteit Leuven
Tutti 0.1                               Tampere University of Technology
WARMR                                   Katholieke Universiteit Leuven
WhiteCross HeatSeeker                   MRJ Technology Solutions/WhiteCross
WizWhy                                  WizSoft
REGISTRATION BROCHURE
All participants are required to complete the application form below and send it in plain ASCII format (e-mail preferred) to:

+-----------------------------+
| Ismail Parsa                |
|                             |
| Epsilon                     |
| 50 Cambridge Street         |
| Burlington MA 01803 USA     |
|                             |
| E-MAIL: iparsa@epsilon.com  |
| V-MAIL: (781) 273-0250*6734 |
| FAX: (781) 272-8604         |
+-----------------------------+

Participants will receive the NDA (Non-Disclosure Agreement) before the July 15, 1998 deadline. Please contact Ismail Parsa if you have not received the NDA by July 15.
Last year, the KDD-CUP program committee publicly announced the names of only the top 3 performing tools; the names of the 45 participants were not released. This year, although we will again announce the names of only the top 3 performing tools, we will make the list of participants publicly available UNLESS THE PARTICIPANTS INDICATE THAT THEY WISH TO PRESERVE THEIR ANONYMITY BY CHECKING THE APPROPRIATE BOX IN THE REGISTRATION BROCHURE. We think it's fair for everyone to know who they are competing with. The Registration Brochure is available in ASCII format.

