Application of the Bag-of-Words Model in the Prediction of Protein Subcellular Location

DOI:10.3969/j.issn.1673-1689.2017.03.011

Keywords: bag-of-words model, K-means, support vector machine, subcellular localization prediction

Funding:

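Authors and affiliations:

Zhao Nan: College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

Zhang Liang: National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi, Jiangsu 214122

Xue Wei: College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

Wang Xiongfei: College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

Ren Shougang: College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095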


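Chinese abstract (in translation):

The bag-of-words model is combined with traditional protein feature extraction algorithms to extract features from protein sequences. A dictionary is built with the K-means algorithm, and the bag-of-words features of each protein sequence are computed against it. The extracted feature values are then fed into a multi-class SVM classifier to predict the subcellular locations of the proteins in the dataset, which improves the accuracy of subcellular localization prediction to some extent.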

English abstract:

Much prior work has addressed protein feature extraction and subcellular localization prediction. Previous studies showed that the prediction accuracy obtained with traditional feature extraction algorithms is low. To improve accuracy, this study combines the bag-of-words model with traditional protein feature extraction algorithms to extract features from protein sequences. First, the K-means algorithm is used to construct a feature dictionary. Then the bag-of-words features of each protein sequence are counted against the dictionary. Finally, the extracted features are fed into an SVM classifier to predict protein subcellular location. The results show that the prediction accuracy of subcellular localization is improved.
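The dictionary-building and counting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which traditional feature is used, so the sliding-window amino acid composition, the window width of 5, the dictionary size of 4, the helper names, and the toy sequences are all assumptions made for the example.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_descriptors(seq, width=5):
    """Local descriptors: amino acid composition of each sliding window.
    (Assumed stand-in for the 'traditional' feature extraction step.)"""
    descs = []
    for i in range(len(seq) - width + 1):
        counts = Counter(seq[i:i + width])
        descs.append([counts[a] / width for a in AMINO_ACIDS])
    return descs

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(point, centers):
    """Index of the dictionary word (cluster center) closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; the returned centers act as the dictionary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bow_histogram(seq, centers, width=5):
    """Normalized count of how often each dictionary word occurs in seq."""
    descs = window_descriptors(seq, width)
    hist = [0.0] * len(centers)
    for d in descs:
        hist[nearest(d, centers)] += 1
    total = len(descs)
    return [h / total for h in hist] if total else hist

# Hypothetical toy sequences, for illustration only.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MLSRAVCGTSRQLAPALGYLGS",
    "MKKLLPTAAAGLLLLAAQPAMA",
]
all_descs = [d for s in sequences for d in window_descriptors(s)]
dictionary = kmeans(all_descs, k=4)
features = [bow_histogram(s, dictionary) for s in sequences]
```

Each entry of `features` is a fixed-length vector regardless of sequence length, which is what makes it usable as classifier input. In a fuller pipeline these vectors, together with location labels, would be passed to a multi-class SVM (for example `sklearn.svm.SVC`, if scikit-learn is available).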
