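Application of Bag of Words Model in the Prediction of Protein Subcellular Location - State Key Laboratory of Food Science and Resources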

Application of Bag of Words Model in the Prediction of Protein Subcellular Location

DOI：10.3969/j.issn.1673-1689.2017.03.011

Keywords: bag of words model, K-means, support vector machine, subcellular localization prediction

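Author	Affiliation
Zhao Nan (赵南)	College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095
Zhang Liang (张梁)	National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi, Jiangsu 214122
Xue Wei (薛卫)	College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095
Wang Xiongfei (王雄飞)	College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095
Ren Shougang (任守纲)	College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095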

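Abstract (translated from Chinese):

A bag-of-words model is combined with traditional protein feature extraction algorithms to extract features from protein sequences. A dictionary is built with the K-means algorithm, and the bag-of-words features of each protein sequence are computed against it. The extracted feature values are then fed into a multiclass SVM classifier to predict the subcellular location of the proteins in the dataset, which improves the accuracy of subcellular localization prediction to some extent.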

Abstract (English):

Much prior work has addressed protein feature extraction and subcellular localization prediction. Previous studies showed that the prediction accuracy obtained with traditional feature extraction algorithms is low. To improve accuracy, this study combines a bag-of-words model with traditional protein feature extraction algorithms to extract features from protein sequences. First, the K-means algorithm is used to construct a feature dictionary. Then, the bag-of-words features of each protein sequence are counted against the dictionary. Finally, the extracted features are fed into an SVM classifier to predict protein subcellular location. The results show that the prediction accuracy of subcellular localization is improved.
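The dictionary construction and bag-of-words counting steps can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which traditional feature extraction algorithm is used, so the sketch assumes a simple one: the amino acid composition of each sliding window. The window width, the dictionary size `k`, the function names, and the toy sequences are all hypothetical, and the K-means here is a plain Lloyd iteration written with the standard library only.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def window_descriptors(seq, width=5):
    """Assumed local feature: amino acid composition of each sliding window."""
    descs = []
    for i in range(len(seq) - width + 1):
        counts = Counter(seq[i:i + width])
        descs.append([counts.get(a, 0) / width for a in AMINO_ACIDS])
    return descs

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(point, centroids):
    """Index of the centroid (dictionary word) closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the returned centroids act as the dictionary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def bow_histogram(seq, centroids, width=5):
    """Count how often each dictionary word occurs, normalized to sum to 1."""
    descs = window_descriptors(seq, width)
    hist = [0.0] * len(centroids)
    for d in descs:
        hist[nearest(d, centroids)] += 1
    return [h / len(descs) for h in hist] if descs else hist

# Toy sequences, made up for illustration only.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MLSRAVCGTSRQLAPALGYLGS",
    "MSDNELQELRKKAEELLKKA",
]
all_descs = [d for s in sequences for d in window_descriptors(s)]
dictionary = kmeans(all_descs, k=4)
features = [bow_histogram(s, dictionary) for s in sequences]
```

Each entry of `features` is a fixed-length vector regardless of sequence length, which is what makes it usable as classifier input. In the pipeline the abstract describes, vectors of this kind would then be passed to a multiclass SVM; that step is omitted here to keep the sketch free of third-party dependencies.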
