We use the term frequent itemset for a set S that appears in at least fraction s of the baskets," where s is some chosen constant, typically 0.01 or 1%.
We assume data is too large to t in main memory. Either it is stored in a RDB, say as a relation Baskets(BID; item) or as a at le of records of the form (BID; item1; item2; : : : ; itemn). When evaluating the running time of algorithms we:
Count the number of passes through the data. Since the principal cost is often the time it takes to read data from disk, the number of times we need to read each datum is often the best measure of running time of the algorithm.
There is a key principle, called monotonicity or the a-priori trick that helps us nd frequent itemsets:
If a set of items S is frequent (i.e., appears in at least fraction s of the baskets), then every subset of S is also frequent.
数据挖掘交友
资料全文下载