If you are familiar with Microsoft's research on VIPS: a VIsion based Page Segmentation Algorithm, some of the ideas in the next patent application may sound familiar. Imagine a page that includes reviews of a number of restaurants in a city neighborhood. Could the information on that page be segmented so that the review of each restaurant is included in local search results for that specific restaurant? This visual gap approach might help with that.
The document also notes that this process might help in determining what an image on a page is about and in indexing that image. It also mentions that the process could help a search engine understand what the different parts of a page are and how much value each has (for instance, distinguishing between content and navigation).
Document segmentation based on visual gaps
Inventors: Daniel Egnor
US Patent Application 20060149775
Published July 6, 2006
Filed on December 30, 2004
Abstract
A document may be segmented based on a visual model of the document. The visual model is determined according to an amount of visual white space or gaps that are in the document. In one implementation, the visual model is used to identify a hierarchical structure of the document, which may then be used to segment the document.
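To make the abstract's idea more concrete, here is a minimal Python sketch of one classic whitespace-based technique, the recursive X-Y cut. It is not taken from the patent application, which publishes no code, and it may differ substantially from what the application describes. The page is modeled as a grid of characters rather than rendered pixels, and the names `xy_cut`, `segment`, `leaves`, `block_text`, and the `min_gap` threshold are all hypothetical. A region is split wherever a run of blank rows (or, failing that, blank columns) is at least `min_gap` wide, and the nested splits form a hierarchy of blocks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    top: int     # first row (inclusive)
    left: int    # first column (inclusive)
    bottom: int  # last row (exclusive)
    right: int   # last column (exclusive)
    children: List["Block"] = field(default_factory=list)


def content_spans(blank: List[bool], min_gap: int) -> List[List[int]]:
    """Group non-blank positions into [start, end) spans.

    Blank runs narrower than min_gap (e.g. the space between words) do not split a span.
    """
    spans: List[List[int]] = []
    for i, is_blank in enumerate(blank):
        if is_blank:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1][1] = i + 1       # gap too small: extend the current span
        else:
            spans.append([i, i + 1])   # first span, or a wide enough gap: start a new one
    return spans


def xy_cut(grid: List[str], top: int, left: int, bottom: int, right: int, min_gap: int) -> Block:
    """Split a region at wide horizontal gaps; if there are none, try wide vertical gaps."""
    block = Block(top, left, bottom, right)
    row_blank = [all(ch == " " for ch in grid[r][left:right]) for r in range(top, bottom)]
    rows = content_spans(row_blank, min_gap)
    if len(rows) > 1:
        for s, e in rows:
            block.children.append(xy_cut(grid, top + s, left, top + e, right, min_gap))
        return block
    col_blank = [all(grid[r][c] == " " for r in range(top, bottom)) for c in range(left, right)]
    cols = content_spans(col_blank, min_gap)
    if len(cols) > 1:
        for s, e in cols:
            block.children.append(xy_cut(grid, top, left + s, bottom, left + e, min_gap))
    return block  # no wide gap in either direction: this block is a leaf segment


def segment(lines: List[str], min_gap: int = 2) -> Tuple[List[str], Block]:
    """Pad lines to a rectangular grid and build the block hierarchy."""
    width = max(len(line) for line in lines)
    grid = [line.ljust(width) for line in lines]
    return grid, xy_cut(grid, 0, 0, len(grid), width, min_gap)


def leaves(block: Block) -> List[Block]:
    """Leaf blocks in top-to-bottom, left-to-right order."""
    if not block.children:
        return [block]
    return [leaf for child in block.children for leaf in leaves(child)]


def block_text(grid: List[str], b: Block) -> str:
    lines = [grid[r][b.left:b.right].rstrip() for r in range(b.top, b.bottom)]
    return "\n".join(line for line in lines if line)


# Hypothetical sample page: two reviews side by side, a third one below.
PAGE = [
    "Joe's Diner".ljust(18) + "Taco Place",
    "Great pancakes.".ljust(18) + "Cheap tacos",
    "",
    "",
    "Noodle Bar",
    "Try the ramen.",
]

if __name__ == "__main__":
    grid, tree = segment(PAGE)
    for leaf in leaves(tree):
        print(block_text(grid, leaf))
        print("---")
```

On the sample page the three leaf blocks line up with the three restaurant reviews, the kind of separation described earlier. The single space inside "Joe's Diner" and "Great pancakes." falls in the same column, but it is narrower than `min_gap`, so it does not cut the review in two.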
While a search engine may be able to determine where a business related to a page is located, it may also want to associate that location with a geographical region. Something like a Hierarchical Triangular Mesh might be used to help make that association.
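As a rough illustration of how a Hierarchical Triangular Mesh works, the sketch below maps a latitude and longitude to an HTM cell name. The sphere starts as eight triangles (N0 to N3 and S0 to S3) taken from an octahedron, and each triangle is split into four by joining the midpoints of its edges, so a longer name means a smaller region and every cell's name begins with its parent's name. This is a sketch under assumptions: the function names are hypothetical, the vertex ordering follows the commonly published description of HTM but may not match any particular HTM library exactly, and nothing here is drawn from the patent application itself.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _midpoint(a: Vec, b: Vec) -> Vec:
    """Midpoint of the great-circle arc from a to b, projected back onto the unit sphere."""
    m = tuple(x + y for x, y in zip(a, b))
    n = math.sqrt(_dot(m, m))
    return tuple(x / n for x in m)


def _contains(tri: Tuple[Vec, Vec, Vec], p: Vec, eps: float = 1e-12) -> bool:
    """p is inside a counterclockwise spherical triangle if it is on the inner side of all three edges."""
    v0, v1, v2 = tri
    return (_dot(_cross(v0, v1), p) >= -eps
            and _dot(_cross(v1, v2), p) >= -eps
            and _dot(_cross(v2, v0), p) >= -eps)


# Octahedron vertices: north pole, four equator points, south pole.
V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]
BASE = {
    "S0": (V[1], V[5], V[2]), "S1": (V[2], V[5], V[3]),
    "S2": (V[3], V[5], V[4]), "S3": (V[4], V[5], V[1]),
    "N0": (V[1], V[0], V[4]), "N1": (V[4], V[0], V[3]),
    "N2": (V[3], V[0], V[2]), "N3": (V[2], V[0], V[1]),
}


def to_unit_vector(lat_deg: float, lon_deg: float) -> Vec:
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def htm_name(lat_deg: float, lon_deg: float, depth: int) -> str:
    """Name of the depth-`depth` triangle containing the point, e.g. 'N3' at depth 0."""
    p = to_unit_vector(lat_deg, lon_deg)
    name, tri = next((n, t) for n, t in BASE.items() if _contains(t, p))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for i, child in enumerate(children):
            if _contains(child, p):
                name, tri = name + str(i), child
                break
        else:
            raise ValueError("point not found in any child triangle")
    return name
```

Because each cell's name is a prefix of the names of the cells inside it, associating a location with a region could come down to a prefix comparison: every point whose depth-6 name starts with a given depth-4 name lies inside that depth-4 triangle.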
Indexing documents according to geographical relevance