WEB INFORMATION ACQUISITION BY PERSONAL SEARCH ENGINE BASED

Obtaining information from the web according to one's own preferences is a challenge nowadays: the overload of information makes it difficult to find what is needed quickly. People like to search for information by topic, author, language, and so on; however, current information acquisition technologies, including the SVM-decision tree and unsupervised-clustering-based SVM, cannot satisfy these requirements. Although the best way to solve the problem is through supervised clustering, it may not produce a desirable clustering without additional information provided by the user. In that case, we can only adjust either the algorithm or the similarity measure. Compared with adjusting the algorithm, modifying the similarity measure is more intuitive. Usually, people cannot easily specify the similarity measure directly, but they can supply examples that illustrate it. The best way is therefore to learn the similarity measure for clustering from such examples. The common approach is to use a binary classifier: all pairs of items in all training sets are taken, and each pair is described by a feature vector. Pairs whose items belong to the same class are positive examples, and pairs whose items belong to different classes are negative examples. When a new set of items is run through the classifier, the sign of the output value (positive or negative) decides whether a pair should be placed in the same class. However, this approach assumes that all pairs are i.i.d. and cannot take advantage of dependencies between item pairs. To avoid this problem, some researchers have adopted Conditional Random Fields (CRFs), which use various clustering functions without requiring independence of the attributes, but they cannot optimize the clusters with respect to a loss function. T. Finley and T. Joachims introduced Supervised Clustering with Support Vector Machines to overcome this problem [Finley & Joachims, 2005]. However, it is not specialized and cannot incorporate domain knowledge.
In this paper, we introduce ontology and semantic similarity into the SVM as the similarity measure.
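
The pairwise binary-classifier baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the method proposed in this paper, and several choices are assumptions made only for illustration: each pair is encoded as the absolute difference of the two items' numeric features, a plain perceptron stands in for whichever binary classifier is actually used, and all function names are hypothetical.

```python
from itertools import combinations


def pair_features(a, b):
    # Assumed encoding: absolute per-feature difference between two items.
    return [abs(x - y) for x, y in zip(a, b)]


def build_pair_examples(training_sets):
    # training_sets: list of (items, cluster_labels) tuples.
    # Every pair inside a set becomes one example:
    # +1 if both items share a cluster, -1 otherwise.
    X, y = [], []
    for items, labels in training_sets:
        for i, j in combinations(range(len(items)), 2):
            X.append(pair_features(items[i], items[j]))
            y.append(1 if labels[i] == labels[j] else -1)
    return X, y


def score(w, bias, x):
    # Linear decision value; its sign is the predicted pair label.
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_perceptron(X, y, epochs=20):
    # Stand-in binary classifier (an assumption, not the paper's choice).
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * score(w, bias, x) <= 0:  # misclassified: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                bias += t
    return w, bias


def same_cluster(w, bias, a, b):
    # Positive output: the pair is predicted to belong to the same class.
    return score(w, bias, pair_features(a, b)) > 0


# Toy training set: four 2-D items in two clusters (made-up data).
train = [([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]], [0, 0, 1, 1])]
X, y = build_pair_examples(train)
w, bias = train_perceptron(X, y)
```

Each pair is classified on its own here, which is exactly the i.i.d. assumption criticized above: nothing forces the pairwise decisions to form a consistent clustering.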
