an easy solution. It is generally simpler to construct classifier theory and algorithms for two mutually-exclusive classes than for mutually-exclusive classes. We believe constructing -class SVMs is still an unsolved research problem.
The standard method for-class SVMs [10] is to construct SVMs. The th SVM will be
trained with all of the examples in the th class with positive labels, and all other examples with negative labels. We refer to SVMs trained in this way as 1-v-r SVMs (short for oneversus-rest). The final output of the 1-v-r SVMs is the class that corresponds to the SVM with the highest output value. Unfortunately, there is no bound on the generalization error for the 1-v-r SVM, and the training time of the standard method scales linearly with .
Another method for constructing -class classifiers from SVMs is derived from previous
research into combining two-class classifiers. Knerr [5] suggested constructing all possible two-class classifiers from a training set of classes, each classifier being trained on only two out of classes. There would thus be classifiers. When applied to SVMs, we refer to this as 1-v-1 SVMs (short for one-versus-one).
Knerr suggested combining these two-class classifiers with an “AND” gate [5]. Friedman [4] suggested a Max Wins algorithm: each 1-v-1 classifier casts one vote for its preferred class, and the final result is the class with the most votes. Friedman shows circumstances in which this algorithm is Bayes optimal. Kreßel [6] applies the Max Wins algorithm to Support Vector Machines with excellent results.
A significant disadvantage of the 1-v-1 approach, however, is that, unless the individual classifiers are carefully regularized (as in SVMs), the overall -class classifier system will tend to overfit. The “AND” combination method and the Max Wins combination method do not have bounds on the generalization error. Finally, the size of the 1-v-1 classifier may grow superlinearly with , and hence, may be slow to evaluate on large problems.
In Section 2, we introduce a new multiclass learning architecture, called the Decision Directed Acyclic Graph (DDAG). The DDAG contains nodes, each with an 数据挖掘研究院
associated 1-v-1 classifier. In Section 3, we present a VC analysis of DDAGs whose classifiers are hyperplanes, showing that the margins achieved at the decision nodes and the size of the graph both affect their performance, while the dimensionality of the input space does not. The VC analysis indicates that building large margin DAGs in high-dimensional feature spaces can yield good generalization performance. Using such bound as a guide, in Section 4, we introduce a novel algorithm for multiclass classification based on placing
1-v-1 SVMs into nodes of a DDAG. This algorithm, called DAGSVM, is efficient to train
and evaluate. Empirical evidence of this efficiency is shown in Section 5.

