npj: Machine learning predicts DNA sequences that recognize single-wall carbon nanotubes
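Anand Jagota and colleagues at the Department of Chemical and Biomolecular Engineering, Lehigh University (USA), report an effective method for predicting recognition DNA sequences by applying machine learning to existing experimental sequence data sets. To simplify analysis and interpretation, they restricted the single-wall carbon nanotube (SWCNT) recognition sequences to short sequences of 12 bases built from only two bases (C and T). They trained machine learning models on the known data, then added a new set of experimentally tested sequences to the original data set and retrained the models. Predictive performance was evaluated by cross-validation and by prediction error on the new test set, and model performance was improved through the choice of feature representation. The frequency of correctly predicted recognition sequences rose from about 10% in the original training set to more than 50%. The resulting machine learning models may offer a new route to more general sequence-selection problems.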
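Restricting the search to 12-base sequences over the two bases C and T keeps the candidate library small enough to enumerate in full: 2^12 = 4096 sequences. The sketch below only illustrates that count; the function name `all_sequences` and the constants are hypothetical and are not taken from the paper.

```python
from itertools import product

BASES = "CT"   # the two-base alphabet described above
LENGTH = 12    # sequence length described above


def all_sequences(bases=BASES, length=LENGTH):
    """Yield every sequence of the given length over the given alphabet."""
    for combo in product(bases, repeat=length):
        yield "".join(combo)


# Enumerate the whole candidate library and confirm its size.
library = list(all_sequences())
print(len(library))  # 4096
```

A library of this size can be scored exhaustively by a trained model, which would not be feasible for the full four-base single-stranded DNA library.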
The paper was recently published in npj Computational Materials 5: 3 (2019); its English title and abstract follow. The PDF is freely accessible through the original post.
Learning to predict single-wall carbon nanotube-recognition DNA sequences
Yoona Yang, Ming Zheng & Anand Jagota
Abstract DNA/single-wall carbon nanotube (SWCNT) hybrids have enabled many applications because of their special ability to disperse and sort SWCNTs by their chirality and handedness. Much work has been done to discover sequences which recognize specific chiralities of SWCNT, and significant progress has been made in understanding the underlying structure and thermodynamics of these hybrids. Nevertheless, de novo prediction of recognition sequences remains essentially impossible and the success rate for their discovery by search of the vast single-stranded DNA library is very low. Here, we report an effective way of predicting recognition sequences based on machine learning analysis of existing experimental sequence data sets. Multiple input feature construction methods (position-specific, term-frequency, combined or segmented term frequency vector, and motif-based feature) were used and compared. The transformed features were used to train several classifier algorithms (logistic regression, support vector machine, and artificial neural network). Trained models were used to predict new sets of recognition sequences, and consensus among a number of models was used successfully to counteract the limited size of the data set. Predictions were tested using aqueous two-phase separation. New data thus acquired were used to retrain the models by adding an experimentally tested new set of predicted sequences to the original set. The frequency of finding correct recognition sequences by the trained model increased to >50% from the ~10% success rate in the original training data set.
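To make two of the feature constructions named in the abstract concrete, here is a minimal sketch of a position-specific encoding and a term-frequency vector for C/T sequences, plus a simple vote-count consensus rule. All of it is an assumption made for illustration: the abstract does not state how the authors encoded positions, which term length they used, or how consensus among models was computed, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def position_specific(seq):
    """Encode each position as one bit: 0 for C, 1 for T (assumed encoding)."""
    return [0 if base == "C" else 1 for base in seq]


def term_frequency(seq, k=2, alphabet="CT"):
    """Count overlapping k-mers in seq, reported in a fixed k-mer order.

    The term length k=2 is an illustrative choice, not the paper's setting.
    """
    terms = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[t] for t in terms]


def consensus(votes, min_agree):
    """Return True when at least min_agree models predict 'recognition'.

    votes is a list of booleans, one per trained model (assumed rule).
    """
    return sum(votes) >= min_agree


seq = "CTTCCTCTCTTC"  # an arbitrary 12-mer, not a sequence from the paper
print(position_specific(seq))
print(term_frequency(seq))  # counts for CC, CT, TC, TT in that order
print(consensus([True, True, False], min_agree=2))
```

Either feature vector could be passed to a classifier such as logistic regression; requiring agreement among several models is one plain way to reduce false positives when the training set is small.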