CL-RBF: A Multi-Site Protein Subcellular Localization Prediction Algorithm Based on an Improved ML-RBF

CL-RBF: An Improved ML-RBF Method for Prediction of Protein Subcellular Location

DOI:10.3969/j.issn.1673-1689.2020.02.009

Keywords (translated from Chinese): ML-RBF; subcellular localization; silhouette coefficient; bag-of-words model

Keywords (English): ML-RBF, protein subcellular localization, silhouette coefficient, bag of words

Funding:

Author

Affiliation

薛卫

College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

洪晓宇

College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

胡雪娇

College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

陈行健

College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095

张梁

National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi, Jiangsu 214122

Abstract (translated from Chinese):

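The CL-RBF algorithm is proposed by jointly considering how within-label and between-label clustering results affect the multi-label radial basis function neural network algorithm (RBF Neural Networks for Multi-Label Learning, ML-RBF), and it is applied to multi-site prediction of protein subcellular localization. The silhouette coefficient is introduced to optimize the number of hidden-layer centers in ML-RBF. By analyzing the relationship between the clustering results of different labels, the cluster centers of label pairs that fall below a given threshold are re-clustered, and the parameters are then adjusted by gradient descent. Finally, the prediction is adjusted according to the difference between two Euclidean distances: from the test sample to the hidden-layer centers of label L, and from the test sample to the cluster centers generated from samples that do not belong to label L. Under 10-fold cross-validation, feature vectors are extracted by combining the bag-of-words model with amino acid composition (AAC), and four other multi-label learning algorithms are used for comparison. Across the evaluation metrics, CL-RBF achieves the best overall performance on four multi-label datasets. The prediction algorithm is available at https://njau.applinzi.com/homepage_final.jsp.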
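The silhouette-based choice of the number of hidden-layer centers can be illustrated with a minimal sketch. It is not the authors' implementation: the plain k-means routine, the farthest-first initialization, the candidate range `k_min`..`k_max`, and all function names are assumptions made for illustration. For the samples of one label, it clusters them for each candidate k and keeps the k with the highest mean silhouette coefficient.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization (assumed)."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(
            points, key=lambda p: min(euclidean(p, c) for c in centers))))
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_num_centers(points, k_min=2, k_max=5):
    """Return (best_k, best_score) over the candidate range of center counts."""
    best_k, best_score = None, -1.0
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        _, labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue
        score = silhouette_score(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy data: three well-separated groups standing in for one label's samples.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
best_k, best_score = choose_num_centers(samples)
```

In an ML-RBF-style network, the centers returned by `kmeans` for the chosen k would serve as that label's hidden-layer prototypes. The between-label re-clustering and the gradient-descent step described in the abstract are not covered by this sketch.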

Abstract (English):

The CL-RBF algorithm is proposed for predicting protein subcellular localization. It accounts for clustering results both within a single label and between different labels in the ML-RBF method. The silhouette coefficient is introduced to obtain the optimal number of centroids in the hidden layer. Previous approaches optimized the clustering only within the same label. In this paper, a larger distance between two centroids generated from two different labels is also taken into account when few samples carry both labels. In addition, a gradient descent algorithm is used to adjust the parameters. The final adjustment is made by analyzing the distances between the test samples, the hidden centers obtained for label L, and the clustering centers not belonging to label L. Bag of words and the amino acid composition (AAC) method are employed to extract features from the protein sequences. Compared, under 10-fold cross-validation, with previously introduced methods for bacterial protein subcellular localization prediction, the new predictor is more powerful and more flexible on four multi-label datasets. The prediction server is available at https://njau.applinzi.com/homepage_final.jsp.
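The feature-extraction step can be sketched as follows. The abstract does not say how the "words" of the bag-of-words model are defined, so this example assumes overlapping k-mers drawn from a fixed vocabulary. The function names, the vocabulary, and the simple concatenation of the two feature blocks are likewise illustrative assumptions, not the paper's procedure.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac_features(sequence):
    """20-dimensional amino acid composition: frequency of each standard residue."""
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

def kmer_bag_of_words(sequence, vocabulary, k=2):
    """Count each vocabulary word among the overlapping k-mers of the sequence.

    Treating k-mers as 'words' is an assumption made for this sketch.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[word] for word in vocabulary]

def combined_features(sequence, vocabulary, k=2):
    """Concatenate the AAC and bag-of-words vectors (assumed combination rule)."""
    return aac_features(sequence) + kmer_bag_of_words(sequence, vocabulary, k)

# Hypothetical toy sequence and vocabulary, for illustration only.
vocab = ["AA", "AC", "CA"]
features = combined_features("AACAA", vocab)
```

Each protein sequence then maps to one fixed-length vector (here 20 AAC values followed by one count per vocabulary word), which is the form a multi-label classifier such as an RBF network expects as input.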
