Machine learning and generalisation

Machine learning is a research domain that has given the world cell phones with voice recognition, recommender systems on websites selling books or DVDs, fingerprint identification systems, spam detection, and many other applications. It sits at the intersection of computer science, applied mathematics and statistics, and shares concepts with artificial intelligence and information theory.

When are machines (read: computers) said to learn?

Machine learning is concerned with computer programs whose performance improves as they gain experience. Hence, launching the same program twice might give different results if a learning process has taken place in between. Such software provides better answers as more information is fed into it. Like humans, computers can either learn 'by heart', like the short-term learning people tend to resort to before exams, or they can learn 'in the long term', becoming able to infer new knowledge from known facts. The latter is called generalisation.

By-heart learning

Computers learn 'by heart' when they need new information to address new situations. The typical example is anti-virus software, which gets better as more information, in this case virus signatures, is provided. When the software periodically downloads new virus signatures, or definitions, it is able to spot and eliminate more and more viruses. It is therefore learning in the sense we just defined.

It is, however, learning 'by heart', as it can only detect viruses for which a signature was provided. It is not able to detect and remove a new virus without downloading the corresponding signature file. If the signature is known, the virus will be detected 100% of the time.
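The signature lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `SignatureScanner` class and the `signature` helper are hypothetical names, and a whole-file SHA-256 hash stands in for a virus signature, whereas real anti-virus products use more elaborate patterns.

```python
import hashlib


def signature(data: bytes) -> str:
    """Hypothetical signature: the SHA-256 digest of the whole content."""
    return hashlib.sha256(data).hexdigest()


class SignatureScanner:
    """By-heart learner: it only recognises what it has been told about."""

    def __init__(self):
        self.known = set()

    def update(self, new_signatures):
        # "Learning" here is nothing more than storing new definitions.
        self.known.update(new_signatures)

    def is_infected(self, data: bytes) -> bool:
        # Exact match only: an unseen virus is never detected.
        return signature(data) in self.known


scanner = SignatureScanner()
virus_a = b"malicious payload A"
virus_b = b"malicious payload B"

scanner.update([signature(virus_a)])
print(scanner.is_infected(virus_a))  # known signature
print(scanner.is_infected(virus_b))  # no signature downloaded yet

scanner.update([signature(virus_b)])
print(scanner.is_infected(virus_b))  # detected after the update
```

The scanner never errs on content whose signature it holds, and never detects anything else, which is exactly the by-heart behaviour described in the text.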

Other examples of by-heart learning are the auto-completion capabilities of most web browsers, and the Windows Start menu, which proposes a list of frequently used programs that depends on the user.
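Auto-completion of this kind can be sketched as follows. The `AutoCompleter` class is a hypothetical illustration, not how any particular browser works: it simply remembers what the user typed and proposes the most frequent matching entries.

```python
from collections import Counter


class AutoCompleter:
    """Remembers past entries and suggests the most frequently used ones."""

    def __init__(self):
        self.seen = Counter()

    def record(self, entry: str) -> None:
        # Each use of an entry increases its count.
        self.seen[entry] += 1

    def suggest(self, prefix: str, limit: int = 3) -> list:
        matches = [
            (count, entry)
            for entry, count in self.seen.items()
            if entry.startswith(prefix)
        ]
        # Most frequent first; ties broken alphabetically.
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        return [entry for _, entry in matches[:limit]]


completer = AutoCompleter()
for url in ["example.org", "example.com", "example.org"]:
    completer.record(url)

print(completer.suggest("exa"))  # entries already typed by the user
print(completer.suggest("new"))  # nothing to propose for an unseen prefix
```

As with virus signatures, the program can only propose entries it has already seen.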

Generalisation

Computer programs are said to generalise when they are able to deal with new situations without the need for new information. The typical example is anti-spam software. Spam filters get better and better as more examples of both spam and legitimate emails are provided to them, which increases their percentage of correct classifications.

Spam filters are indeed able to generalise the concept of spam. When a new email arrives, most probably different from any email already received, the spam filter estimates the probability that this particular email is spam, without needing a specific signature file describing all possible spam messages, as is the case with viruses. It therefore sometimes makes mistakes, marking a legitimate email as junk or, conversely, marking a spam email as legitimate. But mistakes become fewer as more and more examples are provided.

Other examples include recommendation systems for online shops, and optical character recognition software.

Reasoning by analogy and inductive reasoning

Generalisation thus corresponds to the processes of reasoning by analogy and inductive reasoning. New elements can be processed even though the software was not told explicitly how to process them. Spam filters compare new emails with past emails which have been confirmed by the user as spam, and decide, based on the similarities, whether to classify the new email as junk or not. Spam filters furthermore build dictionaries of words that often appear in spam messages and use these dictionaries, which can differ from one user to another, to estimate the degree of 'spamness' of an email.
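The word-dictionary idea can be sketched as below. This is a minimal illustration that assumes a naive Bayes model with add-one smoothing, one common choice for this task; the text does not say which method a given spam filter uses, and the `SpamFilter` class, its `learn` and `spamness` methods and the sample emails are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


class SpamFilter:
    """Per-user word dictionaries for spam and legitimate ('ham') emails."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        # The user confirms an email as "spam" or "ham".
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        n_messages = sum(self.message_counts.values())
        # Smoothed prior, then one smoothed term per word of the email.
        score = math.log((self.message_counts[label] + 1) / (n_messages + 2))
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab + 1))
        return score

    def spamness(self, text: str) -> float:
        """Estimated probability, between 0 and 1, that the email is spam."""
        words = tokenize(text)
        diff = self._log_score(words, "ham") - self._log_score(words, "spam")
        if diff > 700:  # avoid overflow in math.exp
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = SpamFilter()
spam_filter.learn("cheap pills buy now", "spam")
spam_filter.learn("win money now cheap", "spam")
spam_filter.learn("meeting agenda for monday", "ham")
spam_filter.learn("lunch on monday with the team", "ham")

# Neither email below was seen during learning.
print(spam_filter.spamness("buy cheap pills today"))
print(spam_filter.spamness("monday team meeting"))
```

Unlike the signature approach, the filter produces a score for emails it has never seen, and that score can be wrong, which mirrors the trade-off described above.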

Conclusions

By-heart learning handles only situations which have already been encountered, or which have been explicitly described.

Generalisation is thus a very interesting ability to achieve but also, of course, a very challenging mathematical and computational problem.

Interested readers can refer to the following books:

machine learning in general: Machine Learning, Tom Mitchell, McGraw Hill (1997)
generalisation: Statistical Learning Theory, Vladimir Vapnik, Wiley-Interscience (1998)
