Information retrieval review

The term IR may be considered a research field, but it may also be considered a research tradition (or rather a set of related traditions) based on some particular assumptions. In the first sense of the word, it is about any approach (manual or mechanical) to organizing and searching "information". In the second sense, IR is one among a set of competing approaches to organizing and searching "information". Today the term IR is mostly associated with a set of particular research traditions (the Boolean, the Vector, and the Probabilistic traditions), which is why it seems out of place when Stockwell (2000), in his book A History of Information Storage and Retrieval, writes about encyclopedias and ignores the experimental traditions. Stockwell's book is an example of how the term IR is used about the field rather than about the tradition.

 

The term IR was introduced by Calvin Mooers in 1951, who defined it in this way:


"Information retrieval is the name for the process or method whereby a prospective user of information is able to convert his need for information into an actual list of citations to documents in storage containing information useful to him. It is the finding or discovery process with respect to stored information. It is another, more general, name for the production of a demand bibliography. Information retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, technique, or machines that are employed to carry out the operation. Information retrieval is crucial to documentation and organization of knowledge". (Mooers, 1951, p. 25).

 

Van Rijsbergen writes:

 

"Information retrieval is a wide, often loosely-defined term but in these pages I shall be concerned only with automatic information retrieval systems. Automatic as opposed to manual and information as opposed to data or fact. Unfortunately the word information can be very misleading. In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon & Weaver [1]). In fact in many cases, one can adequately describe the kind of retrieval by simply substituting "document" for "information". Nevertheless, "information retrieval" has become accepted as a description of the kind of work published by Cleverdon, Salton, Sparck Jones, Lancaster and others. A perfectly straightforward definition along this line is given by Lancaster [2]: "Information retrieval is the term conventionally, though somewhat inaccurately, applied to the type of activity discussed in this volume. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request". This specifically excludes Question-Answering systems as typified by Winograd [3] and those described by Minsky [4]. It also excludes data retrieval systems such as used by, say, the stock exchange for on-line quotations. . . ." (Van Rijsbergen, 1979, p. 1).

 

It is evident from the quotes by Mooers and Van Rijsbergen that the IR tradition is associated with computer-based searching and experimentation in (mainly bibliographic or full-text) databases. Compared with the concept "information seeking", IR still carries a connotation of computer-based retrieval, while information seeking has a broader connotation.

 

D. A. Kemp argues that "knowledge retrieval" should replace "information retrieval":

"Knowledge, information and data and their representation. It can be useful to distinguish between knowledge and information and data; it is also difficult and contentious. Four points should be made. [First] Knowledge, information and data is what the systems to be discussed are for: by storing it in an organized manner, they are intended to enable it to be found when needed. Secondly, there is a spectrum of increased size and organization between data, where the units are quite small, through to knowledge, where the units are large and distinguished by their complex internal structure and relationships, and overlap with other units (Serebriakoff, 1986, D91). Meunier (1987, E44) presents a typology of levels of representation which is useful for the breadth of its approach and its classification of relationships. Thirdly, "information" in the expression "information retrieval" is generally abused, because what is retrieved is not information, but bibliographic details of sources in which desired information potentially exists. Very many information retrieval systems are at best document retrieval systems, and more usually they are systems which retrieve surrogates for documents (see also Lancaster, 1979, A140, p. 13). Finally, although the expression knowledge retrieval is particularly associated with artificial intelligence and expert systems (Smith, 1984, C78), it should not be forgotten that this is what cataloguers, indexers and bibliographers have been doing, and devising systems for, for many years. For further discussion, see Kemp (1974, A36) and McGarry (1977, E189)." (Kemp, 1988, p. 3).

Francis Miksa regards the traditional view within IR as a narrow one:

"In this context, information is retrieved primarily in response to a clearly delineated decision-making process and seems to serve chiefly to fill a consciously estimated gap in the view the user has of a problem. This leads in turn to viewing retrieval systems as mechanisms that by definition must respond directly and with reasonable precision to a relatively precise information request―to function, in short, as question-answer processes . . . intellectual knowledge appears to be characterized by a relatively unfocused sense of inquiry where the initial goal is not to find some particular informational answer or to fill some sort of reasonably anticipated informational gap, but rather to bring order to (or to re-order) an ill-formed mass of ideas or to map some vaguely arranged area of knowledge. Information retrieval in such situations takes on the character, then, of helping an inquirer think about what he or she appears to be interested in, and might be better conceived as an exploratory and game-like mechanism rather than a precise response mechanism". (Miksa, 1992, pp. 240-241).

 

Compare Julian Warner's criticism of the query processing paradigm:

 

"Two antithetical, if not always clearly distinguished, traditions can be detected in information retrieval system design and evaluation. The idea of query transformation, understood as the automatic transformation of a query into a set of relevant records, has been dominant in information retrieval theory. A contrasting principle of selection power has been valued in ordinary discourse, librarianship, and, to some extent, in practical system design and use". (Warner, 2002).

The last two quotes are also in accordance with this definition of IR:

 

Information retrieval (IR): part of computer science which studies the retrieval of information (not data) from a collection of written documents. The retrieved documents aim at satisfying a user information need usually expressed in natural language. (Baeza-Yates & Ribeiro-Neto, 1999).

Ian Ruthven finds that the classical model of IR system evaluation is obsolete:

"The classical model of IR system evaluation, initiated by the Cranfield experiments and currently manifest in the TREC programme, demonstrates very clearly its origins in the era of batch retrieval systems. The system is seen as taking well-defined input (a query or topic) and producing well-defined output (a list of documents). However, with modern interactive systems, that input-output model is clearly becoming more and more inadequate as a representation of the IR situation". (Ruthven, 1996).
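
Ruthven's description of the batch model can be made concrete with a small sketch. The following Python is a minimal illustration, under stated assumptions, of the input-output view he criticizes, in the spirit of Cranfield-style evaluation: one topic goes in, a ranked list of documents comes out, and the list is scored once against fixed relevance judgments using precision and recall. Everything in it is hypothetical: the toy collection, the topic, the relevance judgments and the naive term-overlap ranking function `rank` are invented for illustration and are not taken from Cranfield, TREC or Ruthven.

```python
from typing import Dict, List, Set, Tuple


def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Hypothetical toy ranker: score each document by the number of
    distinct query terms it contains (a stand-in for a real IR model)."""
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.lower().split()))
              for doc_id, text in docs.items()}
    # Drop non-matching documents; order by descending score, ties by id.
    matched = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matched, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test collection: documents, one topic and its relevance judgments.
docs = {
    "d1": "boolean retrieval of documents",
    "d2": "vector space model for document ranking",
    "d3": "probabilistic model of retrieval",
    "d4": "history of encyclopedias",
}
topic = "probabilistic retrieval model"
relevant = {"d1", "d3"}

# Batch run: a well-defined input (the topic) yields a well-defined output (a list).
run = rank(topic, docs)
p, r = precision_recall(run, relevant)
print(run, p, r)  # prints: ['d3', 'd1', 'd2'] 0.6666666666666666 1.0
```

Two of the three retrieved documents are judged relevant, and both relevant documents are found. What the sketch leaves out is exactly Ruthven's objection: there is no user, no reformulation and no feedback between query and result, only a single pass from input to output.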

In the term "retrieval", the prefix "re-" seems to indicate that something is "found again", implying that it has been identified at some earlier stage. This view may be correct when documents are represented in closed systems for specific purposes, but it seems more problematic when items are retrieved by serendipity or by free-text searching. In such cases, items are found or identified, but not necessarily "retrieved".

 

Literature:

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. Boston, MA: Addison Wesley Longman Publishing Comp., Inc. Glossary: http://www.sims.berkeley.edu/~hearst/irbook/glossary.html

Blair, D. C. (1990). Language and Representation in Information Retrieval. Amsterdam: Elsevier Science Publishers.
 
Ellis, D. (1990). New Horizons in Information Retrieval. London: Library Association.


Ellis, D. (1996). Progress and Problems in Information Retrieval. London: Library Association.

Kemp, D. A. (1988). Computer-based Knowledge Retrieval. Oxford: Aslib.
 
Miksa, F. L. (1992). Library and information science: two paradigms. (pp. 229-252) IN: Conceptions of Library and Information Science. Historical, empirical and theoretical perspectives. Ed. by Pertti Vakkari & Blaise Cronin. London: Taylor Graham.
 

Mooers, C. N. (1951). Zatocoding applied to mechanical organization of knowledge. American Documentation, 2, 20-32.
 
Robertson, S. E. (1977). Theories and Models in Information Retrieval. Journal of Documentation, 33(2), 126-148.

Ruthven, I. (1996). MIRA: Evaluation Frameworks for Interactive Multimedia Information Retrieval Applications. http://www.dcs.gla.ac.uk/mira/themes2.html

 

Sparck Jones, K. (1992). Information retrieval. Vol. 1, pp. 684-690 IN: Encyclopedia of Artificial Intelligence. Vol. I-II. Ed. by Stuart C. Shapiro. New York: John Wiley & Sons.

Stockwell, F. (2000). A History of Information Storage and Retrieval. Jefferson, North Carolina: McFarland & Company.
 

van Rijsbergen, C. J. (1979). Information Retrieval. 2nd ed. London: Butterworths. Online edition 1999. http://www.dcs.gla.ac.uk/~iain/keith/

 

Warner, J. (2002). Forms of labour in information systems. Information Research, 7(4). http://informationr.net/ir/7-4/paper135.html

 

Warner, J. (2004). A labor theoretic approach to information retrieval. http://www.sims.berkeley.edu:8000/courses/is296a-1/f04/summary.html


 

Wilson, P. (1978). Some Fundamental Concepts of Information Retrieval. Drexel Library Quarterly, 14, 10-24.

 

See also: Information seeking; Latent semantic indexing; Probabilistic models of IR; Vector space model
