
MUC Evaluations and dataset

Author: unknown. Date: 2004-12-10.

Since early 1990, the MUC evaluations have funded the development of metrics and statistical algorithms to support government evaluations of emerging information extraction technologies. In the mid-1990s, the MUC evaluations began to provide prepared data and task definitions in addition to fully automated scoring software for measuring machine and human performance. The tasks grew from producing a single database of events found in newswire articles from one source to producing multiple databases of increasingly complex information extracted from multiple news sources in multiple languages. The databases now include named entities, multilingual named entities, attributes of those entities, facts about relationships between entities, and events in which the entities participated.

The results of these evaluations were reported at conferences during the 1990s, where developers and evaluators shared their findings and government specialists described their needs. These conferences were called "Message Understanding Conferences (MUC)" as a result of the use of such technology to process military messages. The multilingual portion was known as the "Multilingual Entity Task (MET)". The proceedings of all these conferences have been published; the last appears on this website, and all earlier proceedings were published in bound form by Morgan Kaufmann Publishers.


MUC Data Sets

For each evaluation, ground truth had to be established to determine the reliability of the participating systems. Datasets were typically prepared by human annotators for training, dry run test, and formal run test usage. These datasets are now being made available wherever possible on this website.
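As a rough illustration of how system output can be compared against human-annotated ground truth, the sketch below computes precision, recall, and F1 over exact matches. The tuple layout, the entity labels, and the `score` function are assumptions made for this example; the actual MUC scoring software is more elaborate and is not reproduced here.

```python
from typing import Set, Tuple

# An annotation is (start_offset, end_offset, entity_type). This layout is an
# assumption for illustration, not the MUC annotation format.
Annotation = Tuple[int, int, str]


def score(gold: Set[Annotation], predicted: Set[Annotation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting exact matches only."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and system output for one document.
gold = {(0, 5, "PERSON"), (10, 18, "ORGANIZATION"), (25, 31, "LOCATION")}
predicted = {(0, 5, "PERSON"), (10, 18, "LOCATION")}

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the simplest possible criterion; a scorer that awards partial credit or aligns template slots would need a richer comparison than set intersection.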

The texts used for MUC-6 and MUC-7 are copyrighted and are available only through the Linguistic Data Consortium (LDC) for a small fee. They are distributed as newswire articles for MUC-6 (MUC-VI Text Collection) and newswire articles for MUC-7 (North American News Text Corpora).

Contact the LDC for licensing of the texts, and request the public-domain prepared datasets used in MUC along with the MUC scoring software. The MUC 3 and MUC 4 Data Sets are provided completely free of charge courtesy of FBIS (Foreign Broadcast Information Service). The MET 2 Data Sets are provided completely free of charge courtesy of the US Government. They are available here as compressed tar archives.

MUC 3 and MUC 4 Data Sets


MET 2 Data Sets

Note: If your browser displays the data instead of a download dialog box, download the file and save it before uncompressing and untarring it.
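Once an archive has been saved, the uncompress-and-untar step can be scripted. The following is a minimal sketch using Python's standard `tarfile` module, assuming the archive uses gzip, bzip2, or xz compression (archives made with the older Unix `compress` tool, with a `.Z` suffix, would need to be uncompressed separately first). The function name `extract_archive` and all file names are placeholders; the demonstration builds a throwaway archive rather than using a real MUC file.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: Path, dest_dir: Path) -> List[str]:
    """Extract a (possibly compressed) tar archive and return its member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect gzip, bzip2, or xz compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe members such as absolute paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Demonstration with a throwaway archive; every file name here is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    sample = work / "sample.txt"
    sample.write_text("example document\n")

    archive = work / "example_dataset.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample, arcname="dataset/sample.txt")

    extracted = extract_archive(archive, work / "out")
    extracted_text = (work / "out" / "dataset" / "sample.txt").read_text()
    print(extracted)
```

To use it on a downloaded file, call `extract_archive` with the path where the archive was saved and a destination directory of your choice.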
