Advances in Reinforcement Learning Research

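A key feature of an intelligent system is its ability to adapt to unknown environments, and the ability to learn is one of the key technologies behind this. In machine learning, learning techniques fall into three broad classes according to the kind of feedback they receive: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is a distinct machine learning method that takes feedback from the environment as its input and adapts to that environment. Specifically, reinforcement learning is the learning of a mapping from environment states to actions such that the cumulative reward the system's actions obtain from the environment is maximized. Unlike supervised learning, which uses positive and negative examples to tell the system which action to take, reinforcement learning discovers the optimal behavior policy through trial and error [KLM96][SB98].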
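The term reinforcement learning usually carries two meanings: on the one hand it denotes a class of problems, and on the other a technique for solving such problems. When reinforcement learning is viewed as a class of problems, current learning techniques fall roughly into two groups. The first searches the behavior space of the intelligent system to find the optimal behavior; typical examples are search techniques such as genetic algorithms. The second uses statistical techniques and dynamic programming to estimate the utility function value of an action in a given environment state, and then determines the optimal action from this action utility function. We refer specifically to this second group as reinforcement learning techniques. Unless stated otherwise, reinforcement learning in this chapter means this learning technique.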
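To make the second group of techniques concrete, the following is a minimal sketch of tabular Q-learning, one well-known way of estimating an action utility function by trial and error. The corridor environment, the function names `q_learning` and `greedy_policy`, and all parameter values are illustrative assumptions and do not come from this chapter.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical one-dimensional corridor.

    States are 0..n_states-1. Action 0 moves left (state 0 stays put),
    action 1 moves right. Entering the rightmost state ends the episode
    with reward 1; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice: explore at random with probability
            # epsilon, or when both action values are tied.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # One-step target: reward plus discounted best value of the
            # next state (no bootstrap from the terminal state).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Pick the action with the largest estimated utility in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning())` returns one action per state; under these assumptions the learned policy moves right in every non-terminal state, because the estimated utility of moving right exceeds that of moving left there. The entry for the terminal state is never updated and carries no meaning.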
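Reinforcement learning techniques grew out of control theory, statistics, psychology and related disciplines, and can be traced back to Pavlov's conditioned reflex experiments. It was not until the late 1980s and early 1990s, however, that reinforcement learning was widely studied and applied in artificial intelligence, machine learning, automatic control and other fields, and came to be regarded as one of the core technologies for designing intelligent systems. In particular, after breakthroughs in the mathematical foundations of reinforcement learning, research on and applications of it expanded steadily, making it one of the current research hotspots in machine learning.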
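This chapter surveys research on reinforcement learning techniques; Sections 3 through 6 in particular discuss the topics that are currently most active. Section 2 briefly introduces typical reinforcement learning algorithms and their mathematical foundations. Section 3 covers reinforcement learning algorithms for partially observable environments, Section 4 covers function approximation over continuous states, Section 5 covers hierarchical reinforcement learning, and Section 6 covers multi-agent reinforcement learning. Section 7 closes with a summary and outlook.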

[Ber99] Bernstein D S. Reusing old policies to accelerate learning on new MDPs. Technical Report UM-CS-1999-026, Dept. of CS, U. of Massachusetts, Amherst, MA, 1999.
[BM03] A G Barto, S Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4):41-77.
[Bow04] M Bowling. Convergence and no-regret in multiagent learning. In: Advances in Neural Information Processing Systems, 2004.
[CB98] C Claus, C Boutilier. The dynamics of reinforcement learning in cooperative multiagent system. In: Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications on Artificial Intelligence, Madison, Wisconsin, United States: American Association for Artificial Intelligence, 1998, 746-752.
[CK01] Y Chang, L Kaelbling. Playing is believing: the role of beliefs in multi-agent learning. In: Proceedings of NIPS-2001, Vancouver, Canada, 2001.
[Dav97] S Davies. Multidimensional triangulation and interpolation for reinforcement learning. In: Michael C Mozer, Michael I Jordan, Thomas Petsche, eds. Advances in Neural Information Processing Systems 9, NY: MIT Press, 1997, 1005-1010.
[Die00] Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 2000, 13, 227-303.
[Dig98] Digney B. Learning hierarchical control structure for multiple tasks and changing environments. In: Proceedings of the Fifth Conference on the Simulation of Adaptive Behavior: SAB 98, 1998.
[Ger99] Gerhard Weiss. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, MIT Press, 1999.
[GHS03] A Greenwald, K Hall, R Serrano. Correlated-q learning. In: Proceedings of Twentieth International Conference on Machine Learning, Washington DC, 2003, 242-249.
[HW98] J Hu, M P Wellman. Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research, 2003, 4:1039-1069.
[Iba89] Iba G A. A heuristic approach to the discovery of macro-operators. Machine Learning, 1989, 3:285-317.
[Lit94] M L Littman. Markov games as a framework for multi-agent reinforcement learning. In: Eleventh International Conference on Machine Learning, New Brunswick, 1994, 157-163.

[Lit01] M L Littman. Friend-or-foe q-learning in general-sum games. In: Proceedings of Eighteenth International Conference on Machine Learning, Williams College: Morgan Kaufmann, 2001, 322-328.
[KLM96] Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 1996, 4:237-285.
[KLC98] Kaelbling L P, Littman M L, Cassandra A R. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 1998, 101: 99-134.
[LE03] Luis Nunes, Eugenio Oliveira. Cooperative learning using advice exchange. In: E Alonso et al., eds. Adaptive Agents and Multiagent Systems, Lecture Notes in Computer Science, 2636, Berlin, Heidelberg: Springer-Verlag, 2003, 33-48.
[Lov91] Lovejoy W S. A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 1991, 28:47-65.
[Moo94] A W Moore. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces. In: Jack D Cowan, Gerald Tesauro, Joshua Alspector, eds. Advances in Neural Information Processing Systems, 6: Morgan Kaufmann Publishers, 1994, 711-718.
[PR97] Parr R, Russell S. Reinforcement learning with hierarchies of machines. In: Proceedings of Advances in Neural Information Processing Systems 10. MIT Press, 1997.
[Pre00] Precup D. Temporal abstraction in reinforcement learning. Doctoral dissertation, U. of Massachusetts, Amherst, 2000.
[Sam02] Samuel W. Hasinoff. Reinforcement learning for problems with hidden state. Technical Report, University of Toronto, Department of Computer Science, 2002.
[SB98] R S Sutton, A G Barto. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[SJJ95] S Singh, T Jaakkola, M I Jordan. Reinforcement learning with soft state aggregation. In: G Tesauro, D Touretzky, eds. Advances in Neural Information Processing Systems, 7. Morgan Kaufmann: MIT Press, 1995, 361-368.
[SPS99] Sutton R S, Precup D, Singh S. Between MDPs and Semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112:181-211.
[Sut96] R S Sutton. Generalization in reinforcement learning: successful examples using sparse coarse coding. In: D Touretzky, M Mozer, M Hasselmo, eds. Advances in Neural Information Processing Systems, 8, NY: MIT Press, 1996, 1038-1044.
[Tan93] M Tan. Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, 1993, 330-337.
[TJ94] Tsitsiklis, John N. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994, 16(3):185-202.
[WD98] Weiss G, Dillenbourg P. What is multi in multiagent learning? In: P Dillenbourg, ed. Collaborative learning. Cognitive and computational approaches. Amsterdam: Pergamon Press, 1998, 64-80.
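[苏高05] Su Chang, Gao Yang, et al. Research on algorithms for automatically generating options in SMDP environments. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2005.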