Feature Extraction Method of Protein Sequences Based on Power Spectrum

DOI:10.3969/j.issn.1673-1689.2018.11.007

Keywords: DNA sequence, power spectrum, hierarchical clustering, protein sequence, entropy

Authors: LIANG Qihao, LI Yang, TANG Xuqing

Affiliation (all authors): School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Abstract:

A new power-spectrum-based method for extracting features of protein sequences is proposed, using hierarchical clustering and entropy evaluation. The method has three parts. First, amino acid sequences are converted into numerical sequences using the classical HP model. Second, the characteristic spectrum of each protein sequence is obtained by the discrete Fourier transform, and a 12-dimensional feature vector is constructed to represent the sequence's spectrum. Finally, hierarchical clustering is applied to obtain the hierarchical structure of the set of protein sequences. The method extends the power-spectrum-based feature extraction method for DNA sequences to protein sequences. Comparative experiments on the hierarchical structures of three data sets (mitochondrial dehydrogenase subunit 1 and subunit 4 sequences from 19 animals, and 11 β-globin sequences) show that the new method extracts the hierarchical structural information of the data better than the power-spectrum-based DNA sequence analysis method. The method therefore has important biological significance for determining the structure and function of unknown genes.
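The three steps in the abstract can be illustrated with a minimal Python sketch, written under explicit assumptions. The abstract does not state the exact HP assignment, how the 12 components are formed, or which distance and linkage are used for clustering, so those choices are assumed here rather than taken from the paper. The hydrophobic set `ACFILMVW` is one common choice. The 12 components are taken, hypothetically, as six normalized power-spectrum moments (orders 2 to 7) for each of the H and P indicator sequences. Clustering uses average linkage (UPGMA) on Euclidean distance. All function names are hypothetical, and the DFT is computed directly rather than with an FFT so that the example needs only the standard library.

```python
import cmath
import math
from itertools import combinations

# ASSUMPTION: one common hydrophobic (H) set; the paper's exact HP assignment is not stated.
HYDROPHOBIC = set("ACFILMVW")


def hp_indicators(seq):
    """Map a protein sequence to 0/1 indicator sequences for the H and P classes."""
    h = [1 if aa in HYDROPHOBIC else 0 for aa in seq.upper()]
    return h, [1 - x for x in h]


def power_spectrum(u):
    """Power spectrum |U(k)|^2 of a real sequence u, via a direct O(N^2) DFT."""
    n = len(u)
    return [abs(sum(u[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]


def spectral_moments(u, orders=range(2, 8)):
    """HYPOTHETICAL normalized moments of the power spectrum of indicator u.

    By Parseval, sum(PS) = N * count for a 0/1 sequence, so PS/count has mean 1
    over the N bins; order 1 is therefore constant and is skipped.
    """
    count = sum(u)
    if count == 0:
        return [0.0] * len(orders)
    ps = power_spectrum(u)
    n = len(u)
    return [(sum((x / count) ** j for x in ps) / n) ** (1.0 / j) for j in orders]


def feature_vector(seq):
    """12-dimensional vector: six moments for the H indicator, six for the P indicator."""
    h, p = hp_indicators(seq)
    return spectral_moments(h) + spectral_moments(p)


def average_linkage(vectors, labels):
    """ASSUMED clustering: agglomerative, average linkage on Euclidean distance.

    Returns the dendrogram as nested tuples of labels, plus the merge heights.
    """
    d = {(i, j): math.dist(vectors[i], vectors[j])
         for i, j in combinations(range(len(vectors)), 2)}
    # Active clusters: id -> (subtree, indices of original members)
    clusters = {i: (labels[i], [i]) for i in range(len(vectors))}

    def linkage(a, b):
        # Mean pairwise distance between the original members of clusters a and b.
        ma, mb = clusters[a][1], clusters[b][1]
        return sum(d[min(x, y), max(x, y)] for x in ma for y in mb) / (len(ma) * len(mb))

    heights, next_id = [], len(vectors)
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda ab: linkage(*ab))
        heights.append(linkage(a, b))
        tree = (clusters[a][0], clusters[b][0])
        members = clusters.pop(a)[1] + clusters.pop(b)[1]
        clusters[next_id] = (tree, members)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, heights


if __name__ == "__main__":
    # Arbitrary toy strings for illustration only; not the paper's data sets.
    seqs = {"s1": "MKLLVAGHPQ", "s2": "MKLIVAGHPE", "s3": "STNQRDEKSG"}
    names = list(seqs)
    vecs = [feature_vector(seqs[k]) for k in names]
    tree, heights = average_linkage(vecs, names)
    print(tree, [round(h, 3) for h in heights])
```

For real data sets such as the 19-sequence groups above, `numpy.fft.fft` can replace the direct DFT, and `scipy.cluster.hierarchy.linkage(..., method="average")` performs the same kind of clustering more efficiently.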
