Search has traditionally been a keyword-based endeavor: searchers try, as best they can, to retrieve relevant documents by matching the keywords in a query against the same words in the documents. Without some means of determining the correct senses of words, both in the query and at indexing time, lexical polysemy and homonymy cause a significant number of misinterpretations and degrade the quality of retrieval results. There is now growing promotion of improvements that aim at more accurate and relevant search results through “smart query modification,” which tolerates variations in spelling (through sophisticated pattern matching) and admits synonyms of query terms. Most of this development aims at the best achievable results while keeping index size in check, providing a scalable indexing and querying paradigm, and offering flexibility through SDKs that allow search and retrieval to be customized.
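As a rough illustration of what query modification involves, the sketch below expands each query term with near-spellings drawn from the index vocabulary and with synonyms. It is a minimal example under stated assumptions, not any particular product's method: the `SYNONYMS` table, the `expand_query` function, and the sample vocabulary are hypothetical, and Python's standard `difflib` similarity stands in for whatever pattern matching a real engine would use.

```python
from difflib import get_close_matches

# Hypothetical synonym table; a real system would draw on a thesaurus or ontology.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie"],
}

def expand_query(terms, vocabulary, cutoff=0.8):
    """Map each query term to itself plus its spelling variants and synonyms."""
    expanded = {}
    for term in terms:
        variants = {term}
        # Spelling variants: index terms that closely resemble the query term.
        variants.update(get_close_matches(term, vocabulary, n=3, cutoff=cutoff))
        # Synonyms from the illustrative table above.
        variants.update(SYNONYMS.get(term, []))
        expanded[term] = sorted(variants)
    return expanded

# Assumed index vocabulary, for demonstration only.
vocabulary = ["color", "colour", "car", "cart", "movie", "film"]
print(expand_query(["colour", "car"], vocabulary))
```

Note that the expansion operates on surface forms only. It adds candidates without deciding which sense of a term the searcher intended, so it tends to widen the result set without resolving the polysemy and homonymy problem described above.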
But information management paradigms are changing. The unit costs of disk space, RAM, and processing power are decreasing. A world in which recall was of the utmost importance (because it paid to retrieve more than was necessary in order to be sure of finding all relevant information) suited the smaller document collections of decades past, but it cannot apply to the vast quantities of information that we must now handle. Precision becomes ever more important because the simple growth in collection size increases the number of relevant documents for a given query. The technological approach must account for this and include metrics that quantify gains in precision, while sacrificing as little completeness of the result sets as possible.
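The two quantities being traded off here have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The following minimal sketch computes both for a single query; the function name `precision_recall` and the document identifiers are illustrative assumptions, not part of any system discussed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set and relevance judgments, for demonstration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Measuring both figures before and after a change to the search approach makes it possible to report a gain in precision alongside whatever recall was given up to obtain it.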