The problem of data mining or knowledge discovery has become increasingly important in recent years. There is an enormous wealth of information embedded in large data warehouses maintained by retailers, telecom service providers and credit card companies that contain information related to customer purchases and customer calls. Corporations could benefit immensely in the areas of marketing, advertising and sales if interesting and previously unknown customer buying and calling patterns can be discovered from the large volumes of data.
Clustering is a useful technique for grouping data points such that points within a single group/cluster have similar characteristics (or are close to each other), while points in di erent groups are dissimilar. For example, consider a market basket database containing one transaction per customer, each transaction containing the set of items purchased by the customer. The trans- action data can be used to cluster the customers such that customers with similar buying patterns are in a single cluster. For example, one cluster may consist of predominantly married customers with infants who buy diapers, baby food, toys etc. (in addition to necessities like milk, sugar and but- ter), while another may consist of high-income customers that buy imported products like French and Italian wine, Swiss cheese and Belgian chocolate. The clusters can then be used to characterize the di erent customer groups, and these characterizations can be used in targeted marketing and advertising such that specific products are directed towards specific customer groups. The charac- terizations can also be used to predict buying patterns of new customers based on their profiles. For example, it may be possible to conclude that high-income customers buy imported foods, and then mail customized catalogs for imported foods to only these high-income customers. The above market basket database containing transactions is actually an example of a scenario in which attributes of data points are non-numeric.
Transactions in the database can be viewed as records with boolean attributes, each attribute corresponding to a single item. Further, in the record for a transaction, the attribute corresponding to an item is True if and only if the transaction contains the item; otherwise, it is False. Boolean attributes themselves are a special case of categorical attributes. The domain of categorical attributes is not limited to simply True and False values, but could be any arbitrary finite set of values. An example of a categorical attribute is color whose domain includes values such as brown, black, white, etc. Clustering in the presence of such categorical attributes is the focus of this paper.

