Hello from Vegas: StyleFeeder hits KDD 2008

I’m representing StyleFeeder at this year’s KDD conference, held in Las Vegas, Nevada. It might seem odd to mix seductive showgirls with stodgy statisticians, but I think it’s an excellent choice of location. Gambling concepts such as probability, expected value, and exploration vs. exploitation are core to many ideas in Machine Learning, Data Mining and Statistics.

KDD played host to the 2nd Recommender System/Netflix Prize Workshop. Gavin Potter showed us that users, movies, and even rating sessions (the date on which a rating was entered) impart significant biases on ratings, so much so that a model which simply captures these biases and completely ignores user-movie affinity yields a lower error score than the original CineMatch algorithm. After some discussion of the fact that minimum-error recommenders tend to yield popular and somewhat uninteresting recommendations, Oscar Celma and Pedro Cano presented a study of this effect on music. They found that a collaborative filtering similarity metric was strongly biased toward popular music, whereas content-based and expert-based similarity metrics made it easier to explore “the long tail.” Next, a member of the Gravity Team, Gabor Takacs (who, I later learned, is the author of the best “big board” tic-tac-toe player in the world), provided a detailed description of their methods for the Netflix Prize. Their approach is an SVD-like matrix factorization that incorporates incremental training, regularization, user/item bias, positivity constraints, and neighbor-based correction.
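To make the ideas in the workshop talks a little more concrete, here is a minimal sketch of a regularized matrix factorization with user and item biases, trained by stochastic gradient descent. It is not the Gravity Team’s code or Gavin Potter’s model: the function names, the hyperparameters (`lr`, `reg`, `epochs`), and the toy ratings are all assumptions made for illustration, and it leaves out the incremental training, positivity constraints, and neighbor-based correction mentioned above. Setting `n_factors=0` reduces it to a bias-only model, the kind of model that ignores user-movie affinity entirely.

```python
import math
import random


def train_biased_mf(ratings, n_users, n_items, n_factors=2,
                    lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by stochastic gradient descent.

    ratings is a list of (user, item, rating) triples with 0-based indices.
    With n_factors=0 this reduces to a bias-only model.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    # Small random factors so the symmetric starting point is broken.
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

    for _ in range(epochs):
        for u, i, r in ratings:  # ratings are visited in a fixed order
            dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - (mu + b_user[u] + b_item[i] + dot)
            # Regularized gradient steps: reg shrinks each parameter toward 0.
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, b_user, b_item, p, q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, b_user, b_item, p, q = model
    return mu + b_user[u] + b_item[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


if __name__ == "__main__":
    # Made-up toy data: (user, item, rating) on a 1-5 scale.
    toy = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 1),
           (2, 2, 5), (2, 3, 4), (2, 0, 1), (3, 2, 4), (3, 3, 5), (3, 1, 2)]
    bias_only = train_biased_mf(toy, 4, 4, n_factors=0)
    full = train_biased_mf(toy, 4, 4, n_factors=2)
    print("bias-only training RMSE:", rmse(bias_only, toy))
    print("biases + factors training RMSE:", rmse(full, toy))
```

On real data the comparison would of course be made on held-out ratings rather than on the training set; the toy run only shows how the pieces fit together.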

Based on discussions and other presentations, it sounds like a combination of matrix factorization and neighborhood-based methods is the most common approach among the Netflix Prize leaders. Everyone at the workshop seemed to agree that Netflix did a surprisingly good job of selecting a goal for the competition: Netflix requires a 10% improvement over its CineMatch algorithm, and the current top team has reached a 9.15% improvement. The gap seems small enough that the 10% goal must be reachable, but progress has slowed considerably, with an improvement of only 0.72 percentage points since the first progress prize was awarded last October.

Now that the main conference has started, it has become quite clear what the “hot” topic of the year is: social network modeling. Sessions on the topic have been packed, and some top figures in the community have presented papers on the subject…
