RSS
热门关键字:  数据挖掘  人工智能  数据仓库  搜索引擎  数据挖掘导论

msnbc.com anonymous web data

来源: 作者:unkonwn 时间:2004-12-11 点击:

msnbc.com anonymous web data

Abstract

This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see description) and are recorded in time order.

数据挖掘研究院

Information files:
  • Description of the data(数据描述见附录)

Data files: 数据挖掘研究院

数据挖掘实验室

Data Type

Discrete sequence 数据挖掘研究院

Abstract

This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see below) and are recorded in time order.

Sources

David Heckerman (heckerma@microsoft.com)

数据挖掘研究院

Data Characteristics

The data comes from Internet Information Server (IIS) logs for msnbc.com and news-related portions of msn.com for the entire day of September, 28, 1999 (Pacific Standard Time). Each sequence in the dataset corresponds to page views of a user during that twenty-four hour period. Each event in the sequence corresponds to a user′s request for a page. Requests are not recorded at the finest level of detail---that is, at the level of URL, but rather, they are recorded at the level of page category (as determined by a site administrator). The categories are "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather", "health", "living", "business", "sports", "summary", "bbs" (bulletin board service), "travel", "msn-news", and "msn-sports". Any page requests served via a caching mechanism were not recorded in the server logs and, hence, not present in the data.

数据挖掘研究院

Other Relevant Information

  • Number of users: 989818
  • Average number of vitis per user: 5.7
  • Number of URLs per category: 10 to 5000

Data Format

Here are the first 20 lines of the ascii data file: 数据挖掘研究院

% Different categories found in input file:

frontpage news tech local opinion on-air misc weather msn-news health living business msn-sports sports summary bbs travel

% Sequences:

1 1 
2 
3 2 2 4 2 2 2 3 3 
5 
1 
6 
1 1 
6 
6 7 7 7 6 6 8 8 8 8 
6 9 4 4 4 10 3 10 5 10 4 4 4 
1 1 1 11 1 1 1 
12 12 
1 1

 数据挖掘研究院 

Each category is associated--in order--with an integer starting with "1". For example, "frontpage" is associated with 1, "news" with 2, and "tech" with 3. Each row below "% Sequences:" describes the hits--in order--of a single user. For example, the first user hits "frontpage" twice, and the second user hits "news" once. 数据挖掘研究院

Past Usage

I. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White, "Visualization of navigation patterns on a Web site using model-based clustering," Journal of Data Mining and Knowledge Discovery, accepted for publication. 数据挖掘实验室

Acknowledgements, Copyright Information, and Availability

This data is avaliable thanks to msnbc.com

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?