
German scientists develop software to read chemical compounds

Date: 2007-08-02

German scientists have developed a new software tool capable of identifying pictures of chemical structures in patent files. The aim is to make these pictures computer-readable and retrievable.

Patent files and repositories of scientific publications often contain information on chemical structures in image format. Classifying these structures poses no problem for chemists, who can open the document and understand what the images mean. Computers, however, have no way to index the structures, because to a machine each image is only a mass of pixels.

The chemoCR software, developed by the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) and the German company InfoChem, combines pattern recognition techniques with supervised machine learning. The method is based on identifying the most significant semantic entities in a structural formula (e.g. chiral bonds, super atoms, reaction arrows). This enables computers to retrieve the information contained in chemical and pharmaceutical patents by performing structure searches.



'Up to now, structures have been drawn by chemists in India, Russia and other low-wage countries, and entered manually in databases. These fast developing countries are benefiting from the added indexing value. With chemoCR we can now reconstruct chemical structures faster and more cost-effectively, with computers,' says Peter Loew, InfoChem's CEO.

'With our software, for the first time, millions of patents can be searched using the chemical information contained in the pictures. This opens new possibilities for the investigation of patent claims on compounds and synthesis procedures; chemoCR addresses one of the most common challenges of the chemical and pharmaceutical industry,' added Professor Martin Hofmann-Apitius, Director of SCAI.
