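Computational materials science has made great strides over the past thirty years, with particular success in first-principles prediction of materials properties and the design of new materials. Predicting how these new materials can be synthesized, however, remains a bottleneck in current materials research: finding a synthesis route for a new compound typically takes months of repeated trial-and-error experiments, or even longer. Building a database of inorganic materials synthesis information is an important step toward overcoming this bottleneck.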
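A team led by Prof. Gerbrand Ceder of the University of California, Berkeley (an editorial board member of this journal and a member of the US National Academy of Engineering) has built a semi-supervised machine-learning method that extracts inorganic materials synthesis information in bulk from the natural-language text of the literature and classifies it. The team first applies an unsupervised algorithm to extract information on synthesis methods and steps from more than 2.2 million published articles, then uses supervised learning to classify that information. Combining the two learning modes yields accurate, multi-level information on materials synthesis, presented in a form that both humans and machines can read. The study shows that the method not only classifies synthesis procedures accurately but can also reconstruct a flowchart of synthesis routes. Its significance lies in proposing the idea of extracting synthesis information in bulk from literature written in natural language so that machines can read it, and in providing a machine-learning implementation of that idea, which lays an important foundation for building a database of inorganic materials synthesis.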
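The two-stage idea described above (unsupervised topic extraction followed by supervised classification) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the six toy paragraphs, the labels `solid_state` and `hydrothermal`, the choice of two topics, and all hyperparameters are assumptions made up for the example.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy synthesis paragraphs, invented for illustration (not from the paper's corpus).
paragraphs = [
    "powders were ground in a mortar and heated at 900 C in air",
    "the mixture was ball milled, pressed into pellets and sintered",
    "precursors were ground, calcined and heated again at 1200 C",
    "salts were dissolved in water and sealed in an autoclave at 180 C",
    "the solution was stirred, heated in an autoclave and centrifuged",
    "the precipitate was centrifuged, washed with ethanol and dried",
]
# A small amount of human annotation (hypothetical labels).
labels = ["solid_state"] * 3 + ["hydrothermal"] * 3

# Stage 1 (unsupervised): bag-of-words counts -> LDA topic mixture per paragraph.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(paragraphs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mix = lda.fit_transform(counts)  # one row per paragraph, one column per topic

# Stage 2 (supervised): a random forest maps topic mixtures to synthesis categories.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(topic_mix, labels)

# Classify an unseen paragraph by passing it through the same two stages.
new_paragraph = ["oxides were ground and heated at 1000 C"]
new_mix = lda.transform(vectorizer.transform(new_paragraph))
predicted = clf.predict(new_mix)[0]
print(predicted)
```

With a corpus this small the predicted label is not meaningful; the point is only the data flow, in which annotation is needed for the second stage alone.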
The paper was recently published in npj Computational Materials 5: 62 (2019), and its PDF is freely available. The English title and abstract follow.
Semi-supervised machine-learning classification of materials synthesis procedures 
Haoyan Huo, Ziqin Rong, Olga Kononova, Wenhao Sun, Tiago Botari, Tanjin He, Vahe Tshitoyan & Gerbrand Ceder
Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked-up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach enables a scalable approach to unlock the large amount of inorganic materials synthesis information from the literature and to process it into a standardized, machine-readable database.
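The Markov chain representation mentioned in the abstract can be illustrated with a small sketch. The example below estimates first-order transition probabilities between experimental steps from ordered step sequences; the sequences, the function names `transition_probabilities` and `most_likely_next`, and the step vocabulary are hypothetical, and the paper's actual construction may differ.

```python
from collections import Counter, defaultdict

# Toy ordered step sequences, invented for illustration.
sequences = [
    ["mix", "grind", "heat", "cool"],
    ["mix", "grind", "heat", "grind", "heat"],
    ["dissolve", "heat", "centrifuge", "dry"],
]


def transition_probabilities(seqs):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for seq in seqs:
        # Count each observed (current step -> next step) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize each step's outgoing counts so they sum to 1.
    return {
        step: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for step, followers in counts.items()
    }


def most_likely_next(probs, step):
    """Return the most probable step following `step`, or None if none was observed."""
    followers = probs.get(step)
    if not followers:
        return None
    return max(followers, key=followers.get)


probs = transition_probabilities(sequences)
print(probs["heat"])
print(most_likely_next(probs, "grind"))
```

Reading the resulting dictionary as a directed graph, with steps as nodes and probabilities as edge weights, gives a flowchart-like view of which step tends to follow which.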