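npj: A new achievement from a leading researcher: semi-supervised machine-learning classification of materials synthesis procedures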
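A team led by Professor Gerbrand Ceder of the University of California, Berkeley (an editorial board member of this journal and a member of the US National Academy of Engineering) has built a semi-supervised machine-learning method that extracts inorganic materials synthesis information in bulk from the natural-language text of the literature and classifies it. The team first applied an unsupervised algorithm to extract information on synthesis methods and steps from more than 2.2 million published articles, then used supervised learning to classify that information. Combining the two learning modes yields accurate, multi-level information on materials synthesis, presented in a form that both humans and machines can read. The study shows that this approach not only classifies synthesis procedures accurately but can also reconstruct a flowchart of synthesis routes. Its significance lies in proposing a way to extract synthesis information in bulk from literature written in natural language so that machines can read it, and in providing a machine-learning implementation of that idea, which lays an important foundation for building databases of inorganic materials synthesis.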
The paper was recently published in npj Computational Materials 5: 62 (2019). Its English title and abstract follow; the PDF of the paper is freely available.
Semi-supervised machine-learning classification of materials synthesis procedures
Haoyan Huo, Ziqin Rong, Olga Kononova, Wenhao Sun, Tiago Botari, Tanjin He, Vahe Tshitoyan & Gerbrand Ceder
Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked-up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach enables a scalable approach to unlock the large amount of inorganic materials synthesis information from the literature and to process it into a standardized, machine-readable database.
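The last step in the abstract, a Markov chain over the order of experimental steps, can be illustrated with a small sketch. This is not the authors' code: the step sequences below are made-up examples standing in for the step labels that topic modeling would assign to paragraphs, and the names `procedures`, `transition_probabilities`, `START`, and `END` are hypothetical. The sketch only shows how first-order transition probabilities between steps can be estimated by counting, which is the basic ingredient of a flowchart of possible procedures.

```python
from collections import Counter, defaultdict

# Hypothetical step sequences (illustrative only, not data from the paper).
# Each list stands for the ordered synthesis steps found in one procedure.
procedures = [
    ["mixing", "grinding", "heating", "cooling"],
    ["mixing", "grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging", "drying"],
    ["dissolving", "heating", "centrifuging", "washing", "drying"],
]

# Artificial boundary states so the chain knows where procedures begin and end.
START, END = "<start>", "<end>"


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        # Count every adjacent pair (current step -> next step).
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


probs = transition_probabilities(procedures)

# Print the edges of the resulting "flowchart", one line per transition.
for state, row in probs.items():
    for nxt, p in sorted(row.items(), key=lambda item: -item[1]):
        print(f"{state} -> {nxt}: {p:.2f}")
```

Each printed line is one edge of the flowchart, weighted by how often that transition occurs in the toy data. A first-order chain only remembers the previous step, so it can blend steps from different synthesis types (here, "heating" is shared by both the grinding and the dissolving routes); in the paper's pipeline the synthesis category comes from the separate classification stage.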