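DNA sequence classification competition problem, from Mathematical Modeling and Mathematical Experiments (Higher Education Press edition, 2023)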

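DNA Sequence Classification

Abstract

This is a supervised classification problem. We first tabulate the frequencies of the 1-, 2-, and 3-character strings in the 20 learning-sample sequences, which gives a basic feature set of 41 variables. Principal component analysis then extracts 4 features from this set, and Fisher's linear discriminant is used to classify. The results for the 20 artificial sequences and the 182 natural sequences are as follows.

20 artificial sequences: 22, 23, 25, 27, 29, 34, 35, 36, 37 are class A; the rest are class B.

182 natural sequences: 1, 4, 8, 10, 27, 29, 32, 41, 43, 48, 54, 63, 70, 72, 75, 76, 81, 86, 90, 92, 102, 110, 116, 119, 126, 131, 144, 150, 157, 159, 160, 161, 162, 163, 164, 165, 166, 169, 170, 182 are class B; the rest are class A.

A final check shows that the classification model performs well.

I. Problem Restatement

The draft of the complete DNA sequence produced by the Human Genome Project is a character sequence about 3 billion long, formed from the 4 characters A, T, C, G in a definite order, with no "sentence breaks" and no punctuation. Although very little is understood about it, some regularities and structure have been found. For example, some segments of the full sequence encode proteins: the 4 characters form 64 distinct 3-character strings, most of which encode the 20 amino acids that make up proteins. As another example, in segments that do not encode proteins, A and T are especially abundant, so studying the structure of DNA sequences through the abundance of particular bases has also produced results. Statistical methods have further shown that certain segments of the sequence are correlated with one another. These findings suggest that DNA sequences contain both local and global structure, and that fully uncovering this structure is important for understanding the complete sequence. The most common idea in this research at present is to omit some details of the sequence, emphasize its features, and then represent it as a suitable mathematical object.

As an attempt to study the structure of DNA sequences, the following problems of classifying a set of sequences are posed:

1) From 20 artificial sequences of known class (sequences 1~10 are class A, 11~20 are class B), extract features and construct a classification method, and use these sequences of known class to assess whether your method is good enough. Then use the method you consider satisfactory to classify another 20 unlabeled artificial sequences (numbered 21~40), and report their classes by index in ascending order (sequences that cannot be classified are omitted).

2) Classify 182 natural DNA sequences (all of them longer) in the same way, and report the result as in 1).

II. Model Assumptions

1. The starting position of the DNA base triplets (3-character strings) in each sequence, and gene expression, do not affect the classification result.
2. Compressing the 64 3-character strings into 20 groups does not affect the classification result.
3. The 182 longer natural sequences share common features with the 20 sample sequences of known class.

III. Model Construction and Solution

What structure DNA sequences have, and what regularities are hidden in the seemingly random arrangement of the 4 bases A, T, C, G, is the basis for interpreting the draft of the complete DNA sequence from the Human Genome Project, and is one of the most important topics in bioinformatics. The problem gives 20 artificial DNA sequences known to belong to two classes and asks us to extract features from them and construct a classification method, and then to classify 20 unlabeled artificial DNA sequences and 182 natural DNA sequences. This is a supervised classification problem in pattern recognition: the classification criterion and the number of classes are fixed in advance, regularities are found by processing the information in a set of known samples, and a computer then predicts the unknown ones. The given samples of known class are called learning samples.

We handle this kind of problem in three stages: building the classification model (which involves forming and extracting features and fixing a classification decision rule), examining the efficiency of the model, and predicting the unknown samples.

(I) Feature formation and extraction

For effective classification, one first generates a set of basic features from the objects to be recognized and then transforms them to obtain the features that best reflect what distinguishes the classes. This is the process of feature formation and extraction. After listing a feature parameter set that is as complete as possible, mathematical methods are used to reduce the number of feature parameters to a minimum while keeping the classification good. The reasons are:

1. Redundant feature parameters bring little benefit and introduce noise that interferes with classification and with building the model.
2. To keep the ratio of the number of samples to the number of feature parameters large enough without needing too many samples, the number of feature parameters should be made as small as possible. Pattern recognition computations generally require at least 3 times as many samples as variables; otherwise the results are not reliable enough. There are 20 learning samples in this problem, so 6~8 feature parameters are appropriate.

We study the arrangement and combination of the 4 characters A, T, C, G in DNA sequences, mainly the frequencies with which characters and character strings occur in a sequence, and extract the structural feature parameters of the sequences from them.

1. Feature formation

The frequencies in each sequence of single characters, 2-character strings, and 3-character strings are tabulated to form the basic feature set.

(1) Frequencies of single characters

Table 1 lists the frequencies of the 4 characters A, T, C, G in the 20 samples. Because A and T are especially abundant in segments that do not encode proteins, we take whether A and T are especially abundant as a feature, and Table 1 also lists the sum of the frequencies of A and T. (The program is in Appendix I.)

Table 1. Single-character frequencies (%)

No.      A      C      T      G    A+T
1    29.73  17.12  13.51  39.64  43.24
2    27.03  16.22  15.32  41.44  42.34
3    27.03  21.62   6.31  45.05  33.33
4    42.34  10.81  28.83  18.02  71.17
5    23.42  23.42  10.81  42.34  34.23
6    35.14  12.61  12.61  39.64  47.75
7    35.14   9.91  18.92  36.04  54.05
8    27.93  16.22  18.92  36.94  46.85
9    20.72  20.72  15.32  43.24  36.04
10   18.18  27.27  13.64  40.91  31.82
11   35.45   4.55  50.00  10.00  85.45
12   32.73   2.73  50.00  14.55  82.73
13   25.45  10.00  51.82  12.73  77.27
14   30.00   8.18  50.00  11.82  80.00
15   29.09   0.00  64.55   6.36  93.64
16   36.36   8.18  46.36   9.09  82.73
17   35.45  24.55  26.36  13.64  61.82
18   29.09  11.82  50.00   9.09  79.09
19   21.82  14.55  56.36   7.27  78.18
20   20.00  17.27  56.36   6.36  76.36

(2) Frequencies of 2-character strings

The 4 characters A, T, C, G form 16 distinct 2-character strings. Table 2 lists the frequency of each 2-character string in the 20 samples. (A "sliding" count is used: for example, ATTCG contains the four 2-character strings AT, TT, TC, CG.) (The program is similar to Appendix I.)

Table 2. Frequencies of 2-character strings (%)

No.     AA     AC     AT     AG     TA     TC     TG     TT     CA     CT     CC     CG     GA     GT     GC     GG
1     9.01   9.01   3.60   8.11   4.50   0.90   4.50   3.60   3.60   3.60   1.80   8.11  11.71   2.70   5.41  18.92
2     9.91   7.21   3.60   5.41   2.70   1.80   5.41   5.41   4.50   1.80   0.90   9.01   9.91   4.50   5.41  21.62
5     6.31   8.11   1.80   7.21   1.80   2.70   2.70   3.60   5.41   4.50   2.70  10.81   9.91   0.90   9.01  21.62
6    15.32   2.70   6.31   9.91   3.60   1.80   1.80   5.41   4.50   0.00   0.00   8.11  10.81   0.90   8.11  19.82
7    15.32   1.80  10.81   7.21   4.50   2.70   6.31   5.41   0.90   1.80   0.90   6.31  13.51   0.90   4.50  16.22
8     8.11   3.60   6.31   9.91   5.41   3.60   2.70   7.21   2.70   3.60   1.80   8.11  10.81   1.80   7.21  16.22
9     9.01   0.90   4.50   6.31   0.00   3.60   7.21   4.50   3.60   2.70   2.70  11.71   7.21   3.60  13.51  18.02
10    6.36   3.64   1.82   6.36   1.82   5.45   2.73   3.64   5.45   3.64   4.55  13.64   4.55   3.64  13.64  18.18
11   15.45   2.73  14.55   2.73  16.36   0.91   1.82  30.00   0.91   0.91   0.91   1.82   2.73   4.55   0.00   2.73
12   13.64   0.91  10.91   6.36  15.45   1.82   1.82  30.91   0.91   0.91   0.00   0.91   2.73   7.27   0.00   4.55
13    6.36   4.55  10.00   4.55  12.73   1.82   2.73  34.55   2.73   2.73   1.82   1.82   3.64   4.55   1.82   2.73
14    8.18   0.91  12.73   7.27  13.64   6.36   1.82  28.18   2.73   4.55   0.00   0.91   5.45   4.55   0.91   0.91
15   13.64   0.00  12.73   1.82  13.64   0.00   2.73  48.18   0.00   0.00   0.00   0.00   1.82   3.64   0.00   0.91
16   16.36   3.64  15.45   0.91  13.64   4.55   4.55  22.73   1.82   5.45   0.00   0.91   4.55   2.73   0.00   1.82
17   17.27   5.45  10.91   1.82  10.00   6.36   4.55   5.45   4.55   7.27   9.09   2.73   3.64   2.73   3.64   3.64
18    8.18   7.27  11.82   1.82  15.45   1.82   0.91  30.91   3.64   3.64   1.82   2.73   1.82   3.64   0.91   2.73
20    6.36   6.36   6.36   0.91   9.09  10.00   3.64  32.73   2.73  13.64   0.91   0.00   1.82   3.64   0.00   0.91

(3) Frequencies of 3-character strings

The 4 characters A, T, C, G form 64 distinct 3-character strings, which code for the 20 amino acids of proteins. Figure 2 of reference [1] gives a coding of these 20 amino acids (see Figure 1). When computing the frequencies of 3-character strings, we therefore merge the 3-character strings that Figure 1 assigns to the same amino acid into one class and record only the frequencies of the 20 classes. (The starting position of a string within a sequence segment is ignored and the "sliding" count is used again: for example, ACGTCC contains the four 3-character strings ACG, CGT, GTC, TCC.) See Table 3. (The program is similar to Appendix I.)

Figure 1. Figure from Brian Hayes's article "The Invention of the Genetic Code", with the caption: "Symmetries of the diamond code sort the 64 codons into 20 classes, indicated here by 20 colors. All the codons in each class specified the same amino acid." (Note: in the figure DNA has been transcribed into RNA, so "U" stands for "T".)

Table 3. Frequencies of the 20 classes of 3-character strings (%)

No.     b1     b2     b3     b4     b5     b6     b7     b8     b9    b10    b11    b12    b13    b14    b15    b16    b17    b18    b19    b20
1     1.77   3.54   2.65   0.88   0.00   0.00   7.96   0.88   4.42   2.65  17.70  10.62   3.54   4.42   4.42   7.08   1.77   3.54  13.27   7.08
2     1.89   1.89   0.94   0.94   0.00   0.94   1.89   0.94   4.72  12.26   7.55  11.32   8.49   3.77   3.77   6.60   9.43   6.60   7.55   2.83
3     0.98   0.00   0.00   5.88   0.98   8.82   2.94   0.00   0.00   2.94  10.78   5.88  13.73   0.00   4.90   3.92  19.61   1.96   8.82   5.88
4     0.00   0.00   0.00   0.87   0.00   0.87  13.04   1.74   6.09   2.61  11.30  13.04   3.48   5.22   3.48   8.70   3.48   1.74  14.78   7.83
5     2.86   0.00   0.00   3.81   0.95   3.81   3.81   0.00   3.81   3.81   9.52   9.52  12.38   2.86   9.52   4.76   7.62   2.86   7.62   9.52
6     0.00   0.00   0.88   2.63   0.00   1.75  13.16   0.88   4.39   1.75  14.04   9.65   7.02   5.26   4.39  11.40   2.63   1.75  10.53   6.14
7     1.92   0.00   0.00   2.88   0.96   4.81   2.88   0.00   1.92   4.81  12.50   6.73  13.46   1.92   6.73   4.81  10.58   3.85   9.62   7.69
9     0.00   0.00   0.00   2.97   2.97   9.90   2.97   0.00   0.99   3.96   6.93   1.98  13.86   1.98   2.97   3.96  23.76   2.97   8.91   6.93
10    1.87   0.93   3.74   2.80   0.00   0.00   2.80   0.00   7.48   8.41   9.35   7.48   3.74  14.95  12.15   0.00   2.80   4.67   7.48   7.48
11    0.00   0.89   0.00   0.00   0.00   1.79   8.04   0.00   5.36   4.46  15.18   8.04   8.93   4.46   3.57   8.04   4.46   6.25  13.39   5.36
12    2.73   0.00   0.91   2.73   0.91   3.64   4.55   3.64   3.64   1.82   9.09   5.45   3.64   5.45   6.36   7.27   8.18   5.45  10.91   9.09
13    1.80   0.90   0.90   0.90   0.00   0.90   9.01   0.00   3.60   7.21  14.41   8.11   7.21   6.31   7.21   4.50   1.80   7.21  11.71   4.50
15    2.91   1.94   2.91   1.94   0.00   5.83   1.94   0.00   1.94   9.71   5.83   8.74  10.68   1.94   3.88   3.88   8.74   2.91  11.65  10.68
16    2.86   0.95   0.00  11.43   1.90   1.90   2.86   0.00   4.76   3.81   5.71   8.57   8.57   6.67   9.52   4.76   5.71   2.86   7.62   7.62
17    1.92   0.96   1.92   4.81   1.92   3.85   1.92   0.96   0.96   6.73   4.81   8.65  10.58   2.88   6.73   2.88   9.62   6.73   8.65   7.69
18    1.71   0.85   1.71   0.85   0.85   2.56  16.24   0.85   1.71   0.85  16.24   5.13   6.84   5.98   3.42  11.11   1.71   5.13  11.11   3.42
20    0.86   0.86   0.00   1.72   0.86   0.86  17.24   0.86   2.59   1.72  15.52   7.76   5.17   3.45   4.31   9.48   5.17   5.17   9.48   5.17

where
b1 = aaa+ata
b2 = aca+aga
b3 = cac+ctc
b4 = ccc+cgc
b5 = gag+gtg
b6 = gcg+ggg
b7 = tat+ttt
b8 = tct+tgt
b9 = aac+caa+atc+cta
b10 = aag+gaa+atg+gta
b11 = aat+taa+att+tta
b12 = acc+cca+agc+cga
b13 = acg+gca+agg+gga
b14 = act+tca+agt+tga
b15 = cag+gac+ctg+gtc
b16 = cat+tac+ctt+ttc
b17 = ccg+gcc+cgg+ggc
b18 = cct+tcc+cgt+tgc
b19 = gat+tag+gtt+ttg
b20 = gct+tcg+ggt+tgg

Taken together, these give a basic feature set of 41 variables.

2. Feature extraction

The basic feature set has 41 variables, so the samples lie in a high-dimensional space. Feature extraction uses a transformation to represent the samples in a lower-dimensional space: the $p$-dimensional random vector $X$ is transformed into a $q$-dimensional random vector $Y$ ($q < p$) such that $Y$ captures most of the characteristics of $X$. We use principal component analysis, with the following steps:

(1) Find the eigenvalues of the covariance matrix $V$ of $X$, ordered as
$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k > 0, \qquad \lambda_{k+1} = \dots = \lambda_p = 0.$$

(2) Find the orthonormal eigenvectors $r_1, r_2, \dots, r_k$ corresponding to $\lambda_1, \lambda_2, \dots, \lambda_k$. The $i$-th principal component is
$$y_i = r_i^{T} X, \qquad i = 1, 2, \dots, k.$$

(3) Compute the contribution rate of the $i$-th principal component and the cumulative contribution rate of the first $m$ principal components:
$$u_i = \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j}, \quad i = 1, 2, \dots, k, \qquad v_m = \sum_{i=1}^{m} u_i.$$

(4) Find $q$ such that $v_q \ge v_0$ ($v_0$ is usually between 0.85 and 1), and take
$$W = (r_1, r_2, \dots, r_q), \qquad Y = XW.$$

The contribution rate from step (3) measures how well a principal component represents $X$: the larger it is, the better the representation. Once the cumulative contribution rate of the first $q$ principal components exceeds the chosen proportion $v_0$, the low-dimensional features $Y = (y_1, y_2, \dots, y_q)$ can stand in for the variation of the high-dimensional features $(x_1, x_2, \dots, x_p)$.

We now apply feature extraction to the random vector $X$ of the 41 features of the 20 samples of known class. The cumulative contribution rate of the first 4 principal components is computed to be 96%, so 4 variables are extracted: take $W = (r_1, r_2, r_3, r_4)$ and $Y = XW$. The 4 components of $Y$ are the feature parameter vector extracted from the basic feature set. (The program and results are in Appendix II.)

(II) Choosing the classification decision rule

The feature parameters have now been selected; the multidimensional space they span is called the feature space. A classification decision assigns the object to be recognized to a class by statistical methods in the feature space. The basic approach is to fix a decision rule on the basis of the learning samples so that classifying objects by this rule gives the smallest misclassification rate or the smallest loss.

Here we choose Fisher's linear discriminant. A linear discriminant function $U(x)$ is chosen such that
$$\frac{\{E_1[U(x)] - E_2[U(x)]\}^2}{D_1[U(x)] + D_2[U(x)]} = \max \qquad (1)$$
where $E_i$ and $D_i$ denote the expectation and variance under population $i$, $i = 1, 2$.

Equation (1) means the following: construct a linear discriminant function $U(x)$ that classifies the samples with the smallest average probability of error, so that the values of $U(x)$ under the two populations are separated as far as possible. Specifically, the difference between populations, $\{E_1[U(x)] - E_2[U(x)]\}^2$, should be as large as possible relative to the variation within populations, $D_1[U(x)] + D_2[U(x)]$.

Taking
$$U(x) = (\mu_1 - \mu_2)^{T} (\Sigma_1 + \Sigma_2)^{-1} X$$
satisfies (1), where $\mu_i$ is the estimated mean of population $i$ and $\Sigma_i$ is the estimated covariance matrix of population $i$.

The classification threshold is
$$U_0 = U(\alpha \mu_1 + (1 - \alpha) \mu_2), \qquad 0 < \alpha < 1.$$
In this problem the two classes have the same number of samples, so we may take $\alpha = 1/2$.

If $U(\mu_1) > U_0$ and $U(\mu_2) < U_0$, then $X$ is assigned to population 1 when $U(X) > U_0$ and to population 2 when $U(X) < U_0$.

Using the feature set formed by the 4 principal components above together with this decision rule, the 20 learning samples are all classified correctly. However, if we take $W = (r_1, r_2, r_3)$, compute $Y = XW$, use the 3 components of $Y$ as the feature parameter vector, and again classify the 20 learning samples with Fisher's linear discriminant, then sample 4 is not classified correctly.

The classification model is therefore:

Feature selection: take $W = (r_1, r_2, r_3, r_4)$ and compute $Y = XW$; the feature parameter vectors are the 4 column vectors of $Y$, where $X$ is the random vector of the 41 features of the 20 learning samples.

Classification decision: Fisher's linear discriminant.

(III) Examining the validity of the classification model

The classification model built above classifies the 20 learning samples correctly. To examine its validity and reliability further, we hold out part of the learning samples from training, use the classification model to predict them, and take the prediction success rate as the measure of predictive ability.

Each time one learning sample is removed, the remaining learning samples are used as the training set, and the model predicts the removed sample and, at the same time, the 20 later samples. The results are in Table 4.

Table 4

Removed sample no.   Predicted class of removed sample   Samples among 21~40 predicted as class A
1                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
2                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
3                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
4                    A                                   23, 25, 27, 29, 34, 35, 36, 37
5                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
6                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
7                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
8                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
9                    A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
10                   A                                   22, 23, 25, 27, 29, 34, 35, 36, 37
11                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
12                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
13                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
14                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
15                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37, 39
16                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
17                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37, 30, 39
18                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
19                   B                                   22, 23, 25, 27, 29, 34, 35, 36, 37
20                   B                                   22, 23, 25, 27, 29, 34, 35, 37

Table 4 shows the following.

When one learning sample is removed at a time and the rest are used as the training set, the model predicts the removed sample correctly 100% of the time.

When one learning sample is removed at a time and the rest are used as the training set, the predictions for the unlabeled samples 21~40 have these properties:

(1) Except when sample 4, 15, 17, or 20 is removed, removing any one of the other 16 samples gives the same prediction: 22, 23, 25, 27, 29, 34, 35, 36, 37. This is 80% of the cases.
(2) The predictions obtained by removing sample 4, 15, or 20 differ from the result in (1) by only one sample. This is 15% of the cases.
(3) The prediction obtained by removing sample 17 differs from the result in (1) by two samples. This is 5% of the cases.

The first and second kinds of result are very close and together account for 95% of the total. Only the single result of the third kind differs more, and it accounts for 5% of the total.

From these checks we conclude that the classification model classifies well.

(IV) Predicting the unknown samples

We now use the model built above to predict the 20 artificial sequences and 182 natural sequences of unknown class given in the problem. (The program is in Appendix III.) The results are:

Classes of the 20 artificial sequences
Class A: 22, 23, 25, 27, 29, 34, 35, 36, 37
Class B: 21, 24, 26, 28, 30, 31, 32, 33, 38, 39, 40

Classes of the 182 natural sequences
Class A (142 in total): 2, 3, 5, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 42, 44, 45, 46, 47, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 71, 73, 74, 77, 78, 79, 80, 82, 83, 84, 85, 87, 88, 89, 91, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 107, 108, 109, 111, 112, 113, 114, 115, 117, 118, 120, 121, 122, 123, 124, 125, 127, 128, 129, 130, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 145, 146, 147, 148, 149, 151, 152, 153, 154, 155, 156, 158, 167, 168, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181
Class B (40 in total): 1, 4, 8, 10, 27, 29, 32, 41, 43, 48, 54, 63, 70, 72, 75, 76, 81, 86, 90, 92, 102, 110, 116, 119, 126, 131, 144, 150, 157, 159, 160, 161, 162, 163, 164, 165, 166, 169, 170, 182

IV. Strengths and Weaknesses of the Model

Strengths:
1. For the supervised classification problem, a mathematical model that solves this kind of problem has been built successfully and can be put to practical use directly.
2. Only 4 feature parameters are needed to solve a fairly complex classification problem, and the model makes few assumptions, so it reflects the actual situation accurately and is reliable.
3. The analysis is modular and proceeds step by step, which improves accuracy.
4. The model emphasizes features and rests on reasonable assumptions, avoiding entanglement in details.

Weakness: because only the frequencies of 1-, 2-, and 3-character strings in the DNA sample sequences are used as features, the classification of DNA sequences may not agree completely with the actual situation. (Scientists could supplement it with measurements by physical or chemical methods.)

V. Directions for Improvement and Extension

Improvement: the model does not consider the actual properties of DNA sequences, so when the sequences become numerous, long, and complex, the classification accuracy will fall until the model is unusable. Biological properties of DNA sequences should therefore be taken into account.

Extension:
1. The model is of value for solving general supervised classification problems.
2. It provides an effective classification model for studying the regularities and structure of DNA sequences.
3. It has practical significance for research on the human genome and can help speed up that research.

VI. References

[1] Brian Hayes. The Invention of the Genetic Code. American Scientist, Computing Science, Jan.-Feb. 1998.
[2] Xiao Shutie (ed.). Mathematical Experiments. Beijing: Higher Education Press, 1999.
[3] Fudan University. Probability Theory, Volume 2: Mathematical Statistics. Beijing: Higher Education Press, 1985.
[4] William F. Lucas (ed.). Life Science Models. Changsha: National University of Defense Technology Press, 1996.
[5] Xu Guanghui (ed.). Handbook of the Foundations of Operations Research. Beijing: Science Press, 1999.
[6] Jiang Qiyuan (ed.). Mathematical Models. Beijing: Higher Education Press, 1993.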
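The feature formation step of section III can be sketched in a few lines. The following is a minimal Python sketch, not the authors' program (their listings are in Fortran and MATLAB). The function names `kmer_frequencies`, `diamond_class`, and `feature_vector` are hypothetical, and the grouping rule in `diamond_class` (a 3-character string is equivalent to its reverse and to the same string with the middle base complemented) is an assumption inferred from the pattern of the classes b1 to b20. The 2-character features come out in alphabetical order here rather than in the column order of Table 2.

```python
from collections import Counter
from itertools import product

BASES = "acgt"
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def kmer_frequencies(seq, k):
    """Percentage frequency of every k-character string, counted with a sliding window."""
    seq = seq.lower()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing anything other than a, c, g, t are skipped.
    counts = Counter(w for w in windows if set(w) <= set(BASES))
    total = sum(counts.values())
    keys = ("".join(p) for p in product(BASES, repeat=k))
    return {key: (100.0 * counts[key] / total if total else 0.0) for key in keys}

def diamond_class(codon):
    """Representative of the class of a 3-character string (assumed grouping rule)."""
    x, y, z = codon
    return min(x + y + z, z + y + x, x + COMPLEMENT[y] + z, z + COMPLEMENT[y] + x)

def feature_vector(seq):
    """41 features: A, C, T, G, A+T, then 16 pair frequencies, then 20 class frequencies."""
    f1, f2, f3 = (kmer_frequencies(seq, k) for k in (1, 2, 3))
    grouped = Counter()
    for codon, freq in f3.items():
        grouped[diamond_class(codon)] += freq
    single = [f1["a"], f1["c"], f1["t"], f1["g"], f1["a"] + f1["t"]]
    pairs = [f2[key] for key in sorted(f2)]
    triples = [grouped[key] for key in sorted(grouped)]
    return single + pairs + triples

CODONS = ["".join(p) for p in product(BASES, repeat=3)]
```

For the example string ATTCG used in the text, `kmer_frequencies("ATTCG", 2)` assigns 25% each to at, tt, tc, and cg, and applying `diamond_class` to all 64 strings in `CODONS` yields 20 distinct classes.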
VII. Appendices

Appendix I: Program for computing single-character frequencies

      CHARACTER*121 LINE(40)
      INTEGER A, C, T, G, AT
!     Read the 40 sequences, one character string per sequence
      READ *, LINE
!     Open the output file once, before the loop
      OPEN(10, FILE='t1.dat', STATUS='UNKNOWN')
      DO 20 II = 1, 40
         A = 0
         C = 0
         T = 0
         G = 0
!        Count each base in sequence II
         DO 10 I = 1, 121
            IF (LINE(II)(I:I) .EQ. 'a') THEN
               A = A + 1
            ELSE IF (LINE(II)(I:I) .EQ. 'c') THEN
               C = C + 1
            ELSE IF (LINE(II)(I:I) .EQ. 't') THEN
               T = T + 1
            ELSE IF (LINE(II)(I:I) .EQ. 'g') THEN
               G = G + 1
            END IF
   10    CONTINUE
         AT = A + T
!        ACTG, AA, CC, TT, GG, AATT are implicitly REAL
         ACTG = A + C + T + G
         AA = A / ACTG * 100.
         CC = C / ACTG * 100.
         TT = T / ACTG * 100.
         GG = G / ACTG * 100.
         AATT = AT / ACTG * 100.
!        Write A, C, T, G and A+T percentages (the columns of Table 1)
         WRITE(10, 1) AA, CC, TT, GG, AATT
    1    FORMAT(1X, 5F7.2)
   20 CONTINUE
      END

Appendix II: Program for extracting the basic features, with results

% d, dd, ddd: 1-, 2-, and 3-character-string frequencies of the samples to be classified
d = [27.43 19.47 36.28 16.81 63.72
     28.85 24.04 22.12 25.00 50.96
     17.65 25.49 18.63 38.24 36.27
     20.87 19.13 40.87 19.13 61.74
     24.76 22.86 21.90 30.48 46.67
     21.93 21.05 38.60 18.42 60.53
     23.08 20.19 23.08 33.65 46.15
     25.64 14.53 44.44 15.38 70.09
     14.85 21.78 18.81 44.55 33.66
     28.97 24.30 25.23 21.50 54.21
     24.11 17.86 35.71 22.32 59.82
     17.43 22.94 33.03 26.61 50.46
     27.03 18.92 33.33 20.72 60.36
     23.53 23.53 16.67 36.27 40.20
     24.27 21.36 20.39 33.98 44.66
     22.86 30.48 20.95 25.71 43.81
     21.36 25.24 20.39 33.01 41.75
     22.22 17.09 43.59 17.09 65.81
     27.36 28.30 23.58 20.75 50.94
     19.83 19.83 43.10 17.24 62.93];
dd = [5.31 4.42 7.96 8.85 9.73 6.19 1.77 18.58 6.19 4.42 4.42 4.42 6.19 4.42 4.42 1.77
      7.69 9.62 3.85 7.69 9.62 3.85 .96 6.73 2.88 1.92 7.69 11.54 7.69 8.65 2.88 4.81
      2.94 3.92 5.88 4.90 3.92 2.94 1.96 9.80 .00 1.96 12.75 9.80 10.78 .98 4.90 21.57
      1.74 4.35 3.48 11.30 13.04 1.74 2.61 22.61 2.61 9.57 4.35 2.61 3.48 4.35 8.70 2.61
      6.67 3.81 3.81 9.52 5.71 1.90 4.76 9.52 7.62 4.76 7.62 2.86 4.76 3.81 9.52 12.38
      3.51 3.51 5.26 9.65 7.89 4.39 1.75 24.56 7.89 6.14 1.75 4.39 2.63 2.63 11.40 1.75
      5.77 4.81 4.81 7.69 6.73 2.88 2.88 10.58 2.88 2.88 7.69 6.73 7.69 4.81 4.81 15.38
      3.42 5.13 9.40 6.84 11.97 5.13 3.42 23.93 2.56 6.84 2.56 2.56 7.69 3.42 1.71 2.56
      1.98 1.98 3.96 6.93 3.96 2.97 2.97 8.91 1.98 .99 8.91 8.91 6.93 4.95 7.92 24.75
      9.35 5.61 2.80 10.28 7.48 5.61 5.61 6.54 8.41 7.48 2.80 5.61 3.74 8.41 9.35 .00
      2.68 5.36 4.46 11.61 15.18 1.79 .89 16.96 3.57 6.25 3.57 4.46 2.68 7.14 7.14 5.36
      5.50 2.75 2.75 6.42 6.42 7.34 4.59 13.76 4.59 5.50 6.42 6.42 .92 10.09 6.42 8.26
      5.41 7.21 7.21 7.21 10.81 1.80 5.41 15.32 3.60 4.50 2.70 7.21 7.21 6.31 6.31 .90
      7.84 4.90 .98 8.82 4.90 .98 2.94 7.84 2.94 3.92 9.80 6.86 7.84 3.92 6.86 17.65
      5.83 4.85 3.88 9.71 7.77 3.88 1.94 6.80 3.88 2.91 3.88 9.71 6.80 6.80 8.74 11.65
      4.76 3.81 1.90 12.38 8.57 5.71 .00 6.67 5.71 3.81 10.48 10.48 3.81 8.57 9.52 2.86
      3.88 2.91 2.91 10.68 5.83 .97 6.80 5.83 5.83 5.83 9.71 3.88 4.85 5.83 11.65 10.68
      3.42 9.40 5.98 3.42 10.26 1.71 4.27 27.35 5.13 3.42 4.27 3.42 2.56 6.84 1.71 5.98
      8.49 5.66 4.72 8.49 4.72 8.49 2.83 6.60 11.32 1.89 9.43 5.66 2.83 9.43 4.72 3.77
      3.45 7.76 4.31 4.31 10.34 .86 3.45 27.59 1.72 6.03 8.62 3.45 4.31 5.17 1.72 6.03];
ddd = [1.77 3.54 2.65 .88 .00 .00 7.96 .88 4.42 2.65 17.70 10.62 3.54 4.42 4.42 7.08 1.77 3.54 13.27 7.08
       1.92 1.92 .96 .96 .00 .96 1.92 .96 4.81 12.50 7.69 11.54 8.65 3.85 3.85 6.73 9.62 6.73 7.69 2.88
       .98 .00 .00 5.88 .98 8.82 2.94 .00 .00 2.94 10.78 5.88 13.73 .00 4.90 3.92 19.61 1.96 8.82 5.88
       .00 .00 .00 .87 .00 .87 13.04 1.74 6.09 2.61 11.30 13.04 3.48 5.22 3.48 8.70 3.48 1.74 14.78 7.83
       2.86 .00 .00 3.81 .95 3.81 3.81 .00 3.81 3.81 9.52 9.52 12.38 2.86 9.52 3.81 7.62 2.86 7.62 9.52
       .00 .00 .88 2.63 .00 1.75 13.16 .88 4.39 1.75 14.04 9.65 7.02 5.26 4.39 11.40 2.63 1.75 10.53 6.14
       1.92 .00 .00 2.88 .96 4.81 2.88 .00 1.92 4.81 12.50 6.73 13.46 1.92 6.73 4.81 10.58 3.85 9.62 7.69
       2.56 3.42 .00 .85 .85 .85 12.82 .85 1.71 .85 20.51 2.56 3.42 9.40 5.98 11.11 .85 4.27 11.97 3.42
       .00 .00 .00 2.97 2.97 9.90 2.97 .00 .99 3.96 6.93 1.98 13.86 1.98 2.97 3.96 23.76 2.97 8.91 6.93
       1.87 .93 3.74 2.80 .00 .00 2.80 .00 7.48 8.41 9.35 7.48 3.74 14.95 12.15 .00 2.80 4.67 7.48 7.48
       .00 .89 .00 .00 .00 1.79 8.04 .00 5.36 4.46 15.18 8.04 8.93 4.46 3.57 8.04 4.46 6.25 13.39 5.36
       2.75 .00 .92 2.75 .92 3.67 4.59 3.67 3.67 1.83 9.17 5.50 3.67 5.50 6.42 7.34 8.26 5.50 11.01 9.17
       1.80 .90 .90 .90 .00 .90 9.01 .00 3.60 7.21 14.41 8.11 7.21 6.31 7.21 4.50 1.80 7.21 11.71 4.50
       2.94 .00 .00 5.88 .00 6.86 1.96 .00 3.92 6.86 3.92 9.80 13.73 .98 5.88 2.94 10.78 .98 10.78 9.80
       2.91 1.94 2.91 1.94 .00 5.83 1.94 .00 1.94 9.71 5.83 8.74 10.68 1.94 3.88 3.88 8.74 2.91 11.65 10.68
       2.86 .95 .00 11.43 1.90 1.90 2.86 .00 4.76 3.81 5.71 8.57 8.57 6.67 9.52 4.76 5.71 2.86 7.62 7.62
       1.94 .97 1.94 4.85 1.94 3.88 1.94 .97 .97 6.80 4.85 8.74 10.68 2.91 6.80 2.91 9.71 6.80 8.74 7.77
       1.71 .85 1.71 .85 .85 2.56 16.24 .85 1.71 .85 16.24 5.13 6.84 5.98 3.42 11.11 1.71 5.13 11.11 3.42
       .94 .94 1.89 .94 .94 .94 1.89 .94 10.38 7.55 5.66 9.43 8.49 8.49 7.55 5.66 6.60 11.32 6.60 .94
       .86 .86 .00 1.72 .86 .86 17.24 .86 2.59 1.72 15.52 7.76 5.17 3.45 4.31 9.48 5.17 5.17 9.48 5.17];
% x, xx, xxx: 1-, 2-, and 3-character-string frequencies of the 20 learning samples
x = [29.73 17.12 13.51 39.64 43.24
     27.03 16.22 15.32 41.44 42.34
     27.03 21.62 6.31 45.05 33.33
     42.34 10.81 28.83 18.02 71.17
     23.42 23.42 10.81 42.34 34.23
     35.14 12.61 12.61 39.64 47.75
     35.14 9.91 18.92 36.04 54.05
     27.93 16.22 18.92 36.94 46.85
     20.72 20.72 15.32 43.24 36.04
     18.18 27.27 13.64 40.91 31.82
     35.45 4.55 50.00 10.00 85.45
     32.73 2.73 50.00 14.55 82.73
     25.45 10.00 51.82 12.73 77.27
     30.00 8.18 50.00 11.82 80.00
     29.09 .00 64.55 6.36 93.64
     36.36 8.18 46.36 9.09 82.73
     35.45 24.55 26.36 13.64 61.82
     29.09 11.82 50.00 9.09 79.09
     21.82 14.55 56.36 7.27 78.18
     20.00 17.27 56.36 6.36 76.36];
xx = [9.01 9.01 3.60 8.11 4.50 .90 4.50 3.60 3.60 3.60 1.80 8.11 11.71 2.70 5.41 18.92
      9.91 7.21 3.60 5.41 2.70 1.80 5.41 5.41 4.50 1.80 .90 9.01 9.91 4.50 5.41 21.62
      5.41 11.71 3.60 5.41 2.70 1.80 .90 .90 5.41 .90 .90 14.41 13.51 .90 7.21 23.42
      18.92 5.41 11.71 5.41 10.81 1.80 5.41 10.81 5.41 1.80 .90 2.70 6.31 4.50 2.70 4.50
      6.31 8.11 1.80 7.21 1.80 2.70 2.70 3.60 5.41 4.50 2.70 10.81 9.91 .90 9.01 21.62
      15.32 2.70 6.31 9.91 3.60 1.80 1.80 5.41 4.50 .00 .00 8.11 10.81 .90 8.11 19.82
      15.32 1.80 10.81 7.21 4.50 2.70 6.31 5.41 .90 1.80 .90 6.31 13.51 .90 4.50 16.22
      8.11 3.60 6.31 9.91 5.41 3.60 2.70 7.21 2.70 3.60 1.80 8.11 10.81 1.80 7.21 16.22
      9.01 .90 4.50 6.31 .00 3.60 7.21 4.50 3.60 2.70 2.70 11.71 7.21 3.60 13.51 18.02
      6.36 3.64 1.82 6.36 1.82 5.45 2.73 3.64 5.45 3.64 4.55 13.64 4.55 3.64 13.64 18.18
      15.45 2.73 14.55 2.73 16.36 .91 1.82 30.00 .91 .91 .91 1.82 2.73 4.55 .00 2.73
      13.64 .91 10.91 6.36 15.45 1.82 1.82 30.91 .91 .91 .00 .91 2.73 7.27 .00 4.55
      6.36 4.55 10.00 4.55 12.73 1.82 2.73 34.55 2.73 2.73 1.82 1.82 3.64 4.55 1.82 2.73
      8.18 .91 12.73 7.27 13.64 6.36 1.82 28.18 2.73 4.55 .00 .91 5.45 4.55 .91 .91
      13.64 .00 12.73 1.82 13.64 .00 2.73 48.18 .00 .00 .00 .00 1.82 3.64 .00 .91
      16.36 3.64 15.45 .91 13.64 4.55 4.55 22.73 1.82 5.45 .00 .91 4.55 2.73 .00 1.82
      17.27 5.45 10.91 1.82 10.00 6.36 4.55 5.45 4.55 7.27 9.09 2.73 3.64 2.73 3.64 3.64
      8.18 7.27 11.82 1.82 15.45 1.82 .91 30.91 3.64 3.64 1.82 2.73 1.82 3.64 .91 2.73
      2.73 2.73 13.64 1.82 14.55 9.09 .91 31.82 1.82 8.18 1.82 2.73 2.73 2.73 .91 .91
      6.36 6.36 6.36 .91 9.09 10.00 3.64 32.73 2.73 13.64 .91 .00 1.82 3.64 .00 .91];
xxx = [5.41 .90 2.70 .90 5.41 3.60 .90 1.80 2.70 8.11 4.50 1.80 25.23 3.60 3.60 5.41 13.51 .00 3.60 4.50
       2.70 2.70 .00 .00 3.60 6.31 2.70 .90 7.21 7.21 6.31 1.80 18.92 .90 6.31 1.80 14.41 .00 3.60 10.81
       2.70 2.70 2.70 .00 3.60 6.31 .00 .90 4.50 5.41 1.80 .90 29.73 .00 5.41 4.50 22.52 .00 1.80 2.70
       15.32 6.31 .00 .00 .00 .90 9.01 1.80 6.31 10.81 12.61 3.60 4.50 1.80 2.70 5.41 1.80 1.80 7.21 6.31
       3.60 1.80 2.70 .00 5.41 7.21 .90 .00 4.50 1.80 2.70 3.60 20.72 1.80 6.31 4.50 19.82 1.80 1.80 7.21
       9.01 .90 .90 .00 2.70 5.41 4.50 .00 2.70 13.51 6.31 .00 25.23 .90 1.80 1.80 16.22 .00 2.70 3.60
       9.01 1.80 .00 .00 1.80 4.50 4.50 .90 3.60 16.22 8.11 .00 17.12 2.70 1.80 1.80 10.81 .90 6.31 6.31
       2.70 1.80 .90 .90 2.70 3.60 2.70 .90 4.50 9.91 8.11 3.60 18.92 .90 2.70 4.50 12.61 .90 7.21 8.11
       5.41 .00 .90 1.80 5.41 9.01 1.80 .90 3.60 6.31 1.80 3.60 11.71 2.70 2.70 2.70 20.72 1.80 4.50 10.81
       3.64 .91 2.73 6.36 3.64 10.91 .91 1.82 3.64 2.73 2.73 .91 17.27 .00 4.55 4.55 17.27 4.55 1.82 7.27
       9.09 .91 .00 .00 .00 .00 24.55 .00 3.64 6.36 33.64 .91 4.55 1.82 .00 1.82 .00 2.73 5.45 2.73
       2.73 .91 .00 .00 .00 .00 19.09 .00 1.82 8.18 37.27 .00 4.55 4.55 .00 2.73 .00 .91 10.00 5.45
       .91 2.73 .00 .00 .00 .00 27.27 1.82 1.82 5.45 26.36 2.73 4.55 2.73 4.55 5.45 1.82 2.73 5.45 1.82
       6.36 5.45 .00 .00 1.82 .00 20.00 5.45 2.73 2.73 24.55 .00 1.82 3.64 3.64 8.18 .91 .91 9.09 .91
       11.82 .91 .00 .00 1.82 .00 47.27 1.82 .00 3.64 25.45 .00 .91 .91 .00 .00 .00 .00 2.73 .91
       10.00 2.73 .91 .00 .00 .00 14.55 4.55 5.45 3.64 31.82 .91 .91 3.64 1.82 6.36 .00 .00 7.27 3.64
       10.91 .91 3.64 3.64 .00 .91 8.18 2.73 12.73 9.09 11.82 3.64 3.64 6.36 1.82 1.82 6.36 6.36 1.82 1.82
       4.55 4.55 .00 .00 .91 .91 21.82 .91 4.55 .91 29.09 .00 3.64 1.82 .91 10.91 2.73 4.55 4.55 .91
       3.64 .91 1.82 .91 .91 .00 25.45 5.45 3.64 .00 21.82 1.82 1.82 3.64 .91 13.64 .91 2.73 5.45 2.73
       2.73 .91 5.45 .00 .00 .00 23.64 10.00 6.36 1.82 13.64 .00 1.82 8.18 1.82 13.64 .00 1.82 6.36 .00];

ffx = [x xx xxx];            % 20 x 41 feature matrix of the learning samples
ffd = [d dd ddd];            % 20 x 41 feature matrix of the samples to classify
cx = cov(ffx);
[vx, ex] = eig(cx);
ex1 = eig(cx);               % eigenvalues in ascending order
e1 = mean(ex1)*41;           % sum of all 41 eigenvalues
ex2 = ex1(38:41,:);          % the 4 largest eigenvalues
e2 = mean(ex2)*7;            % as in the source; this equals sum(ex2) only if ex2 has 7 entries
e2/e1
vx1 = [vx(:,38:41)];         % eigenvectors of the 4 largest eigenvalues
s = ffx*vx1;
ss = ffd*vx1;
x = s(1:10,:);               % class A learning samples in the reduced space
y = s(11:20,:);              % class B learning samples in the reduced space
u1 = mean(x);
u2 = mean(y);
u1-u2;
z = 8/9*(cov(x)+cov(y));
ux = 0.5*(u1-u2)*inv(z);     % Fisher discriminant direction
u12 = 0.5*u1+0.5*u2;
u0 = ux*u12.';               % classification threshold
la = 0;
for i = 1:10
    p(i) = ux*ss(i,:).';
    tx(i) = ux*x(i,:).';
    fy(i) = ux*y(i,:).';
    if p(i) > u0
        pbd(i) = 1; la = la+1;
    else
        pbd(i) = 2;
    end
    if tx(i) > u0
        lbx(i) = 1;
    else
        lbx(i) = 2;
    end
    if fy(i) > u0
        lby(i) = 1;
    else
        lby(i) = 2;
    end
end
for n = 11:20
    p(n) = ux*ss(n,:)';
    if p(n) > u0
        pbd(n) = 1; la = la+1;
    else
        pbd(n) = 2;
    end
end
tx, fy, p
pbd, lbx, lby

Output:

ans = 0.9847
u0 = -2.4812
tx =
  Columns 1 through 7:    8.2471   9.7074  10.8780   3.8672   9.3837   9.7612   9.2023
  Columns 8 through 10:   6.2700  11.6489   5.4181
fy =
  Columns 1 through 7:   -15.2467 -15.2121 -14.2828  -8.0112 -13.4839 -11.1970 -11.2608
  Columns 8 through 10:  -15.0827 -14.9635 -15.2662
p =
  Columns 1 through 7:    -6.5147  -3.6869   0.7514  -6.0838   0.3758  -6.7805   0.1074
  Columns 8 through 14:   -8.1194   5.0825  -6.1039  -7.0908  -2.7297  -6.0715   4.1447
  Columns 15 through 20:   4.5919  -4.2199   0.9096  -9.2269  -8.1303 -10.7112
pbd =
  Columns 1 through 12:   (values missing in the source)
  Columns 13 through 20:  2 1 1 2 1 2 2 2
lbx = (values missing in the source)
lby = (values missing in the source)

Appendix III: Program for classifying the unknown sequences

The listing begins with the same data matrices d, dd, ddd, x, and xx as in Appendix II, and the source breaks off partway through xx.
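The Fisher discriminant rule and the leave-one-out check described in section III can be illustrated on small data. This is a minimal Python sketch using only the standard library, not the authors' MATLAB program. The helper names (`fit_fisher`, `classify`, `leave_one_out`) and the two-dimensional points in `CLASS_A` and `CLASS_B` are made up for illustration and are not taken from the paper's data. The sketch uses the direction $(\Sigma_1 + \Sigma_2)^{-1}(\mu_1 - \mu_2)$ and the midpoint threshold ($\alpha = 1/2$).

```python
def mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows, mu):
    """Sample covariance matrix of rows about their mean mu."""
    d, n = len(mu), len(rows)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [a - b for a, b in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j] / (n - 1)
    return s

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def fit_fisher(class1, class2):
    """Return the discriminant direction w and the midpoint threshold u0."""
    mu1, mu2 = mean(class1), mean(class2)
    s1, s2 = covariance(class1, mu1), covariance(class2, mu2)
    pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
    w = solve(pooled, [a - b for a, b in zip(mu1, mu2)])
    u0 = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu1, mu2))
    return w, u0

def classify(w, u0, x):
    """Class 1 if U(x) = w . x exceeds the threshold, otherwise class 2."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > u0 else 2

def leave_one_out(class1, class2):
    """Fraction of samples predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(class1)):
        w, u0 = fit_fisher(class1[:i] + class1[i + 1:], class2)
        correct += classify(w, u0, class1[i]) == 1
    for i in range(len(class2)):
        w, u0 = fit_fisher(class1, class2[:i] + class2[i + 1:])
        correct += classify(w, u0, class2[i]) == 2
    return correct / (len(class1) + len(class2))

# Hypothetical, well-separated toy data (not from the paper)
CLASS_A = [[9.0, 1.0], [10.0, 2.0], [9.5, 0.5], [10.5, 1.5]]
CLASS_B = [[-9.0, 0.0], [-10.0, 1.0], [-9.5, -1.0], [-10.5, 0.5]]
```

Calling `leave_one_out(CLASS_A, CLASS_B)` mirrors the procedure behind Table 4: each sample is removed in turn, the rule is refitted on the rest, and the removed sample is predicted. In the paper the same rule is applied to the 4 principal-component scores rather than to raw two-dimensional points.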
