
A Comparison of Machine Learning Algorithms for Chemical Toxicity Classificati

Date: 2008-05-23

Background: Bioactivity profiling through high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints.

Supervised machine learning is a powerful approach to discovering combinatorial relationships in complex in vitro / in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.

Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others.
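The evaluation setup described above can be illustrated with a small sketch. It is not the authors' simulation model: the generator `simulate`, the toxicity rule (label 1 when the sum of the causal assay values is positive), and all parameter values are hypothetical stand-ins chosen for illustration. The sketch uses only the Python standard library and shows k-fold cross-validation of a KNN (k=5) classifier as irrelevant features are added.

```python
import random

def simulate(n_samples, n_causal, n_irrelevant, noise_sd, rng):
    """Hypothetical toy simulator (an assumption, not the paper's model).

    A chemical is labelled toxic (1) when the sum of its causal assay
    values is positive; irrelevant assays are pure Gaussian noise.
    """
    X, y = [], []
    for _ in range(n_samples):
        causal = [rng.gauss(0.0, 1.0) for _ in range(n_causal)]
        label = 1 if sum(causal) > 0 else 0
        # Measurement noise is added to the causal readouts after labelling.
        measured = [v + rng.gauss(0.0, noise_sd) for v in causal]
        irrelevant = [rng.gauss(0.0, 1.0) for _ in range(n_irrelevant)]
        X.append(measured + irrelevant)
        y.append(label)
    return X, y

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def k_fold_indices(n, n_folds, rng):
    """Shuffle 0..n-1 and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cross_val_accuracy(X, y, n_folds, rng, k=5):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), n_folds, rng):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
    return correct / len(X)

if __name__ == "__main__":
    rng = random.Random(0)
    # Same causal structure, increasing numbers of irrelevant features.
    for n_irrelevant in (0, 20, 100):
        X, y = simulate(200, 3, n_irrelevant, 0.3, rng)
        print(n_irrelevant, round(cross_val_accuracy(X, y, 5, rng), 3))
```

Running the script prints one cross-validated accuracy per setting, which makes it possible to check on toy data whether accuracy falls as irrelevant features are added. The other methods compared in the study would slot in wherever `knn_predict` is called.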

In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k=5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation.

LDA performance was especially sensitive to feature selection: filter-based feature selection generally improved performance, most strikingly for LDA.
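A filter-based selector scores each feature independently of the classifier and keeps only the top-ranked ones. The text here does not say which scoring criterion was used, so the sketch below assumes an absolute Welch-style t statistic between the two classes. The function names (`t_score`, `select_top_features`, `project`) are hypothetical, and each class is assumed to contain at least two samples.

```python
from statistics import mean, variance

def t_score(values_pos, values_neg):
    """Absolute Welch-style t statistic between two groups of assay values.

    Assumes each group has at least two values (needed for the sample variance).
    """
    diff = abs(mean(values_pos) - mean(values_neg))
    se = (variance(values_pos) / len(values_pos)
          + variance(values_neg) / len(values_neg)) ** 0.5
    if se == 0.0:
        # No within-group spread: perfectly separating or perfectly useless.
        return float("inf") if diff > 0 else 0.0
    return diff / se

def select_top_features(X, y, n_keep):
    """Return the sorted column indices of the n_keep highest-scoring features."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((t_score(pos, neg), j))
    # Highest score first; ties broken by the lower column index.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in ranked[:n_keep])

def project(X, keep):
    """Restrict every sample to the selected columns."""
    return [[row[j] for j in keep] for row in X]

if __name__ == "__main__":
    # Column 0 separates the classes; columns 1 and 2 do not.
    X = [[0.0, 1.0, 5.0], [0.1, 2.0, 5.2], [0.2, 1.5, 9.0],
         [1.0, 1.2, 5.1], [1.1, 1.9, 9.1], [1.2, 1.4, 5.3]]
    y = [0, 0, 0, 1, 1, 1]
    keep = select_top_features(X, y, 1)
    print(keep, project(X, keep))
```

When combined with cross-validation, the selector should be fitted on the training folds only and then applied to the held-out fold. Selecting features on the full data set first would leak label information into the accuracy estimate.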



Conclusions: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data are used to predict in vivo chemical toxicology. From our analysis, we can recommend several ML methods, most notably SVM and ANN, as good candidates for use in real-world applications in this area.