Learning from Positive and Unlabeled Examples

 

A recent talk, given at Boeing, UIUC, the University of Notre Dame, the University of Trento (Italy) and the University of Siena (Italy), summarizes the theory and some algorithms.


Text classification is an important problem with numerous applications. It is commonly stated as follows: given a set of labeled training documents of n classes, the system uses this training set to build a classifier, which is then used to classify new documents into the n classes. Although this classic model is important, in practice one also encounters another problem. One has a set of documents of a particular topic or class P (the positive class), and is given a large set U of mixed (unlabeled) documents that contains documents from class P as well as other types of documents (negative documents). One wants to classify the documents in U into documents from P and documents not from P. The key feature of this problem is that there is no labeled negative training data, which makes traditional text classification techniques inapplicable. This problem is termed partially supervised classification (PSC). We also call it PU-learning (learning from Positive and Unlabeled examples).
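To make the setting concrete, here is a minimal sketch of a naive baseline: treat every document in U as a (noisy) negative and rank the documents in U by a naive-Bayes-style log-odds score. This is only an illustration under stated assumptions, not the technique developed in this project; the function names `train_log_odds` and `score`, the toy documents, and the zero decision threshold are all hypothetical.

```python
import math
from collections import Counter


def train_log_odds(positive_docs, unlabeled_docs):
    """Per-word log-odds of P versus U, with add-one smoothing.

    Assumption: every unlabeled document is treated as a noisy negative,
    even though U also contains hidden positives.
    """
    pos_counts = Counter(w for d in positive_docs for w in d.split())
    unl_counts = Counter(w for d in unlabeled_docs for w in d.split())
    vocab = set(pos_counts) | set(unl_counts)
    pos_total = sum(pos_counts.values()) + len(vocab)
    unl_total = sum(unl_counts.values()) + len(vocab)
    return {
        w: math.log((pos_counts[w] + 1) / pos_total)
        - math.log((unl_counts[w] + 1) / unl_total)
        for w in vocab
    }


def score(doc, log_odds):
    """Sum of word log-odds; higher means more similar to P than to U."""
    return sum(log_odds.get(w, 0.0) for w in doc.split())


# Toy data (hypothetical): P is about kernel classifiers, U is mixed.
P = ["svm kernel margin classifier", "kernel classifier training margin"]
U = [
    "kernel margin svm training",
    "football match goal score",
    "recipe pasta sauce tomato",
    "classifier kernel svm margin",
]

model = train_log_odds(P, U)
likely_positive = [i for i, d in enumerate(U) if score(d, model) > 0]
print(likely_positive)
```

Because the hidden positives in U are counted as negatives during training, this baseline is biased against the positive class, which is one reason the problem calls for dedicated techniques.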

The objectives of this project are to design a robust and principled technique to solve PSC, implement a system for PSC, devise a method to evaluate such techniques, and identify methods for determining the minimum number of labeled documents needed to achieve the optimal accuracy in order to reduce manual labeling efforts. The results of this research should be widely useful because the identification of targeted information/documents is of great value in this information age.

In our work (Liu et al. 2002), it was shown theoretically that P and U provide sufficient information for learning, and that the problem can be posed as a constrained optimization problem. This theoretical result provides good guidance for designing practical algorithms. Some of our algorithms are reported in (Liu et al. 2003), (Lee and Liu 2003) and (Li and Liu 2003). Since research in this direction started only recently, many important issues still need to be addressed in order to gain a better understanding of the problem.
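One way to read the constrained optimization view is: classify as few documents in U as positive as possible, subject to keeping a required fraction of the known positives in P correctly classified. The sketch below illustrates that idea in a deliberately simplified, one-dimensional form, where the only thing being optimized is a score threshold. The function name `pick_threshold`, the `min_recall` parameter, and the toy scores are hypothetical and are not taken from the cited papers.

```python
import math


def pick_threshold(pos_scores, unl_scores, min_recall=0.9):
    """Choose the highest threshold that still accepts at least
    `min_recall` of the known positives.

    The number of unlabeled documents flagged as positive can only shrink
    as the threshold rises, so the highest feasible threshold minimizes it.
    Returns (threshold, number of unlabeled documents flagged positive).
    """
    if not pos_scores:
        raise ValueError("need at least one positive score")
    ranked = sorted(pos_scores, reverse=True)
    # Number of known positives that must score at or above the threshold.
    k = max(1, math.ceil(min_recall * len(ranked)))
    threshold = ranked[k - 1]
    flagged = sum(s >= threshold for s in unl_scores)
    return threshold, flagged


# Toy classifier scores (hypothetical) for documents in P and in U.
pos_scores = [2.0, 1.5, 1.0, -0.5]
unl_scores = [1.8, 1.2, 0.2, -1.0, -2.0]

print(pick_threshold(pos_scores, unl_scores, min_recall=0.75))
print(pick_threshold(pos_scores, unl_scores, min_recall=1.0))
```

Tightening the recall constraint forces a lower threshold and therefore flags more of U as positive, which is the trade-off the constrained formulation makes explicit.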


Read the following paper first. It summarizes most existing methods, proposes a new biased-SVM technique, and reports a comprehensive evaluation.

  • Bing Liu, Yang Dai, Xiaoli Li, Wee Sun Lee and Philip Yu. "Building Text Classifiers Using Positive and Unlabeled Examples." Proceedings of the Third IEEE International Conference on Data Mining (ICDM-03), Melbourne, Florida, November 19-22, 2003. [PDF]

Publications


  1. Xiaoli Li, Bing Liu. "Learning from Positive and Unlabeled Examples with Different Data Distributions." To appear in European Conference on Machine Learning (ECML-05), 2005. [PDF]

     

  2. Bing Liu, Xiaoli Li, Wee Sun Lee and Philip Yu. "Text Classification by Labeling Words." To appear in Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), July 25-29, 2004, San Jose, California. [PDF]


  3. Xiaoli Li and Bing Liu. "Dealing with Different Distributions in Learning from Positive and Unlabeled Web Data." WWW-2004 poster paper. [PDF]


  4. Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu. "Semi-supervised Text Classification Using Partitioned EM." DASFAA 2004: 482-493. [PDF]

     


  5. Bing Liu, Yang Dai, Xiaoli Li, Wee Sun Lee and Philip Yu. "Building Text Classifiers Using Positive and Unlabeled Examples." Proceedings of the Third IEEE International Conference on Data Mining (ICDM-03), Melbourne, Florida, November 19-22, 2003. [PDF]

     

  6. Xiaoli Li, Bing Liu. "Learning to Classify Text Using Positive and Unlabeled Data." Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), August 9-15, 2003, Acapulco, Mexico.


  7. Wee Sun Lee, Bing Liu. "Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression." Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), August 21-24, 2003, Washington, DC, USA.

     


  8. Bing Liu, Wee Sun Lee, Philip S. Yu and Xiaoli Li. "Partially Supervised Classification of Text Documents." Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002), July 8-12, 2002, Sydney, Australia.

Software
