Parallel C4.5 (PC4.5)
Build classification trees in parallel.
Download PC4.5 now.
And here is how to install it.
Why PC4.5?
If you have C4.5 and access to a network of workstations, PC4.5 helps you make better use of C4.5. It offers these advantages:
- It is faster. In an N-trial C4.5 run, a single process builds N classification trees one by one and then picks the best one. In PC4.5, each of the N trials is handled by its own process, and each process runs on a different machine (if N or more machines are available).
- It is fault-tolerant. PC4.5 automatically assigns a process to a machine if the machine is idle (i.e. no activity by the machine's owner). If a machine's owner returns or the machine crashes during a PC4.5 computation, the PC4.5 process automatically retreats and resumes on a different idle machine.
- It supports multiple platforms. PC4.5 runs on SunOS, Solaris and Linux machines (for HPUX, IRIX, and ALPHA, please contact the author). Networked multi-platform workstations can run processes of a single PC4.5 program at the same time.
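The "N trials, one worker each, keep the best tree" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PC4.5 or PLinda code: `run_trial` and `best_of_n_trials` are hypothetical names, the error rate of each trial is simulated rather than measured on a real tree, and a local thread pool stands in for processes placed on separate idle workstations. Fault tolerance and migration between machines are not modeled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trial(seed):
    """Hypothetical stand-in for one C4.5 trial.

    A real trial would build a classification tree and measure its error.
    Here the error rate is simulated from the seed so the sketch is runnable.
    Returns a (simulated_error, seed) pair.
    """
    rng = random.Random(seed)
    return rng.uniform(0.05, 0.30), seed


def best_of_n_trials(n_trials, max_workers=None):
    """Run n_trials independent trials concurrently and keep the best one.

    Assumption: local threads stand in for the per-machine processes
    described above. Trials are independent, so they need no coordination
    until the final comparison.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_trial, range(n_trials)))
    # Tuples compare by their first element, so min() picks the lowest error.
    return min(results)


if __name__ == "__main__":
    error, seed = best_of_n_trials(10)
    print(f"best trial: seed={seed}, simulated error={error:.3f}")
```

Because the trials share no state, the only sequential step is the final comparison, which is why adding machines (up to N) shortens the run.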
How Does It Do It?
PC4.5 is built with the Persistent Linda (PLinda) system, a software system for robust distributed parallel computing developed at New York University. To get more information on PLinda, please visit our web site or send email to plinda@cs.nyu.edu.
Future Work
- Visualization. Convert a decision tree generated by PC4.5 into a ThinkSheet, so that it is easier (and more fun) to consult.
- Suggestions. Please feel free to send mail to binli@cs.nyu.edu.
People
- Dennis Shasha (Professor)
- Bin Li (student)
- Chin-Yuan Chen (student)

