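Principal Component Analysis (PCA)

Why dimensionality reduction is needed

1. Multicollinearity: the predictor variables are correlated with one another. Multicollinearity makes the solution space unstable, which can lead to incoherent results.
2. High-dimensional space is inherently sparse. In one dimension, 68% of normally distributed values fall within one standard deviation of the mean; in 10 dimensions, only 0.02% fall within the corresponding hypersphere.
3. Too many variables get in the way of finding patterns.
4. Analysis at the level of individual variables can overlook latent relationships among them. For example, several predictor variables may fall into a single group that reflects only one aspect of the data.

The goals of dimensionality reduction:

1. Reduce the number of predictor variables.
2. Ensure that these variables are independent of one another.
3. Provide a framework for interpreting the results.

Dimensionality reduction methods include principal component analysis, factor analysis, and user-defined composites.

PCA (Principal Component Analysis) does more than reduce the dimensionality of high-dimensional data: more importantly, the reduction removes noise and reveals patterns in the data. PCA replaces the original n features with a smaller number m of new features. Each new feature is a linear combination of the old features, chosen to maximize the sample variance while keeping the m new features as uncorrelated as possible. The mapping from old features to new features captures the inherent variability in the data.

Prerequisites

The covariance of samples X and Y:

$$\mathrm{Cov}(X,Y)=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}$$

The eigenvector of the covariance matrix with the largest eigenvalue always points in the direction of greatest variance in the data, and the variance along that direction equals the corresponding eigenvalue. The eigenvector with the second-largest eigenvalue is always orthogonal to the first and points in the direction of the second-largest spread of the data.

The MATLAB formula for the covariance is:

    cov(i,j) = (all elements of column i - mean of column i) * (all elements of column j - mean of column j) / (number of samples - 1)

    a =
        -1     1     2
        -2     3     1
         4     0     3

    for i=1:size(a,2)
        for j=1:size(a,2)
            c(i,j)=sum((a(:,i)-mean(a(:,i))).*(a(:,j)-mean(a(:,j))))/(size(a,1)-1);
        end
    end

    c =
       10.3333   -4.1667    3.0000
       -4.1667    2.3333   -1.5000
        3.0000   -1.5000    1.0000

A positive covariance means X and Y are positively correlated, and a negative covariance means they are negatively correlated. A covariance of 0 means X and Y are uncorrelated (which does not by itself imply independence). Cov(X,X) is the variance of X.

When the samples are n-dimensional, their covariance is really a covariance matrix: a symmetric square matrix with side length n. For 3-dimensional data with coordinates x, y and z, for example, the covariance matrix is:

$$C=\begin{pmatrix}\mathrm{Cov}(x,x)&\mathrm{Cov}(x,y)&\mathrm{Cov}(x,z)\\ \mathrm{Cov}(y,x)&\mathrm{Cov}(y,y)&\mathrm{Cov}(y,z)\\ \mathrm{Cov}(z,x)&\mathrm{Cov}(z,y)&\mathrm{Cov}(z,z)\end{pmatrix}$$

If $AX=\lambda X$ for a nonzero vector X, then $\lambda$ is called an eigenvalue of A and X is the corresponding eigenvector. One way to read this: when the matrix A acts on its eigenvector X, it only changes the length of X, and the scaling factor is the corresponding eigenvalue $\lambda$.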
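As a cross-check of the covariance computation, here is a minimal Python sketch that applies the same n - 1 formula to the matrix a = [-1 1 2; -2 3 1; 4 0 3]. The function name `covariance_matrix` is a hypothetical helper introduced for illustration; it is not part of the original MATLAB listing.

```python
def covariance_matrix(rows):
    """Sample covariance of the columns of `rows`, dividing by n - 1."""
    n = len(rows)
    columns = list(zip(*rows))                 # one tuple per feature
    means = [sum(col) / n for col in columns]  # mean of each column
    return [
        [
            sum((x - mi) * (y - mj) for x, y in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]


a = [[-1, 1, 2], [-2, 3, 1], [4, 0, 3]]
c = covariance_matrix(a)
for row in c:
    print(["%.4f" % value for value in row])
```

The diagonal of `c` holds the variance of each column, and the matrix is symmetric because Cov(X,Y) = Cov(Y,X).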
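For any invertible n×n matrix P, A is similar to $P^{-1}AP$, and similar matrices have the same eigenvalues. In particular, when A is a symmetric matrix, the singular values of A are the absolute values of its eigenvalues (the two coincide when A is positive semidefinite, as a covariance matrix is), and there exists an orthogonal matrix Q (with $Q^{-1}=Q^{T}$) such that:

$$Q^{T}AQ=\Lambda$$

A singular value decomposition of A therefore yields all of the eigenvalues and the matrix Q. $\Lambda$ is the diagonal matrix made up of the eigenvalues, and by the definition of eigenvalues and eigenvectors, the columns of Q are the eigenvectors of A.

The Jama package

Jama is a Java package for basic linear algebra. It provides the Cholesky, LU, QR and singular value decompositions of a matrix, as well as the eigenvalue decomposition that PCA needs. It can also multiply and divide matrices, compute matrix norms and condition numbers, and solve systems of linear equations.

The PCA procedure

1. Center the features. Subtract from each dimension of the data the mean of that dimension. A "dimension" here means one feature (or attribute); after the transformation, every dimension has a mean of 0.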
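The centering step can be sketched in a few lines of Python. The three input rows below are the first three samples of the Iris data, used only as a toy input; `center_columns` is a hypothetical helper name, and with only three rows the column means differ from those of the full 150-sample data set.

```python
def center_columns(rows):
    """Subtract each column's mean from every entry of that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
centered, means = center_columns(rows)

# After centering, every column should average to (numerically) zero.
column_means_after = [sum(col) / len(centered) for col in zip(*centered)]
print(means)
print(column_means_after)
```

Centering only shifts the data; it does not change the covariance matrix, which is why the covariance can be computed from either the raw or the centered matrix.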
Many data mining textbooks discuss the Iris example, so this article uses it for the calculation. The raw data is the 150×4 matrix A (one row per sample, one column per feature):

    5.1   3.5   1.4   0.2
    4.9   3.0   1.4   0.2
    4.7   3.2   1.3   0.2
    ...   (150 rows in total)

Subtracting from each column its mean gives the matrix B:

    -0.743333    0.446   -2.35867   -0.998667
    -0.943333   -0.054   -2.35867   -0.998667
    -1.14333     0.146   -2.45867   -0.998667
    ...   (150 rows in total)

2. Compute the covariance matrix C of B. See the MATLAB help; cov(A) is all that is needed:

     0.685694    -0.0392685    1.27368     0.516904
    -0.0392685    0.188004    -0.321713   -0.117981
     1.27368     -0.321713     3.11318     1.29639
     0.516904    -0.117981     1.29639     0.582414

3. Compute the eigenvalues and eigenvectors of the covariance matrix. According to the MATLAB help, the eig function solves for the eigenvalues and eigenvectors of a matrix quickly.

Syntax: [V,D] = eig(A)

Here D is the diagonal matrix of eigenvalues, and each eigenvalue corresponds to a column vector of V, which is its eigenvector. With a single return value, eig returns a column vector of the eigenvalues.
    C = V*S*V^(-1)

    S =
        4.22484140   0            0             0
        0            0.24224437   0             0
        0            0            0.078524387   0
        0            0            0             0.023681839

    V =
         0.36158919     0.65654382    -0.58100304     0.3172364
        -0.082268924    0.72970845     0.59642922    -0.3240827
         0.8566        -0.1757697      0.072535217   -0.47971643
         0.35884438    -0.074704743    0.54904125     0.75113489
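The leading eigenpair and the variance shares can be checked without MATLAB. The sketch below uses power iteration, a technique swapped in here for illustration (it is not what eig does internally), on the covariance matrix C from step 2, then computes the cumulative share of variance from the four eigenvalues listed in S. The helper names `mat_vec` and `power_iteration` are hypothetical.

```python
import math

# Covariance matrix C of the Iris data, as given in step 2.
C = [
    [0.685694, -0.0392685, 1.27368, 0.516904],
    [-0.0392685, 0.188004, -0.321713, -0.117981],
    [1.27368, -0.321713, 3.11318, 1.29639],
    [0.516904, -0.117981, 1.29639, 0.582414],
]


def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def power_iteration(m, iters=200):
    """Approximate the largest eigenvalue of m and a unit eigenvector."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T m v (v has unit length).
    lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return lam, v


lam, v = power_iteration(C)
print(lam, v)

# Cumulative share of total variance, from the eigenvalues listed in S.
eigenvalues = [4.22484140, 0.24224437, 0.078524387, 0.023681839]
total = sum(eigenvalues)
cumulative = [sum(eigenvalues[: k + 1]) / total for k in range(len(eigenvalues))]
print(cumulative)
```

The second entry of `cumulative` is the share of variance carried by the first two components, which is the quantity used to decide how many components to keep.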
4. Select the eigenvectors that belong to the largest eigenvalues to obtain the new data set. The eigenvalues are sorted from largest to smallest, and the first two already account for more than 97% of the sum of all eigenvalues. Taking the eigenvectors of the first two eigenvalues gives a 4×2 matrix M. Let A' = AM; this maps the 150×4 data set A to the 150×2 data set A', reducing the number of features from 4 to 2.

    A' =
        2.8271335   5.6413345
        2.7959501   5.1451715
        2.6215213   5.1773814
        ...   (150 rows in total)

Each sample is now exactly 2-dimensional and can be plotted in the plane:

[Figure: the 150 projected samples plotted in the plane.]

The Iris data set contains 3 classes of flowers (the first 50 samples are one class, the middle 50 a second, and the last 50 a third). The figure shows that classification becomes easier once the data set is mapped to 2 dimensions: by eye, the data already look linearly separable. Next we cluster the projected data. We already know there are 3 classes, so when designing the SOFM network I set the number of competitive-layer nodes to match. With that setting, the first […] samples were grouped into one cluster and the last […] samples into another. When the number of competitive-layer nodes was changed to […], only […] samples of one class were wrongly assigned to another class, for an overall accuracy of 98%.
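The projection A' = AM from step 4 can be sketched for a single sample as follows. The entries of M are the first two columns of V; the signs and the third entry of the first column (0.85657) are assumptions, reconstructed from the requirement that the columns of V be orthonormal, and `project` is a hypothetical helper name.

```python
# First two eigenvectors as the columns of the 4x2 matrix M.
# Assumption: signs and the 0.85657 entry are reconstructed, not quoted.
M = [
    [0.36158919, 0.65654382],
    [-0.082268924, 0.72970845],
    [0.85657, -0.1757697],
    [0.35884438, -0.074704743],
]


def project(sample, basis):
    """Row vector times matrix: map one 4-feature sample to 2 features."""
    width = len(basis[0])
    return [sum(x * row[k] for x, row in zip(sample, basis)) for k in range(width)]


first_sample = [5.1, 3.5, 1.4, 0.2]
projected = project(first_sample, M)
print(projected)
```

Under these assumptions the result agrees with the first row of A' to about three decimal places. Note that the text projects the raw matrix A rather than the centered matrix B; the two results differ only by a constant shift of every sample.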
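    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <string>
    #include <cmath>
    #include <cstdlib>
    #include <ctime>
    using namespace std;

    const int sample_num = 150;   // number of Iris samples
    // Number of clusters. The value is missing in the source;
    // 3 is an assumption that matches the three Iris classes.
    const int class_num = 3;
    int iteration_ceil;           // upper limit on iterations
    vector<pair<double, double> > flowers(sample_num);  // sample data
    vector<vector<double> > weight(class_num);          // weight vectors
    const double prime_eta = 0.7; // initial learning rate

    /* Normalize a vector to unit length */
    void normalize(vector<double> &vec) {
        double sum = 0.0;
        for (size_t i = 0; i < vec.size(); ++i)
            sum += pow(vec[i], 2);
        sum = sqrt(sum);
        for (size_t i = 0; i < vec.size(); ++i)
            vec[i] /= sum;
    }

    /* Read the Iris sample data from a file */
    void init_sample(string filename) {
        ifstream ifs(filename.c_str());
        if (!ifs) {
            cerr << "open data file failed." << endl;
            exit(1);
        }
        for (int i = 0; i < sample_num; ++i) {
            vector<double> X(2);
            ifs >> X[0] >> X[1];
            normalize(X);  // normalize the input vector to unit length
            flowers[i] = make_pair(X[0], X[1]);
        }
        ifs.close();
    }

    /* Initialize the weights */
    void init_weight() {
        srand(time(0));
        for (size_t i = 0; i < weight.size(); ++i) {
            vector<double> ele(2);
            ele[0] = rand() / (double)RAND_MAX;
            ele[1] = rand() / (double)RAND_MAX;
            normalize(ele);  // normalize the weight vector to unit length
            weight[i] = ele;
        }
    }

    /* Pick the winning node for an input */
    int pick_winner(double x1, double x2) {
        int rect = -1;
        double max = -2.0;  // dot products of unit vectors lie in [-1, 1]
        for (size_t i = 0; i < weight.size(); ++i) {
            double product = x1 * weight[i][0] + x2 * weight[i][1];
            if (product > max) {
                max = product;
                rect = i;
            }
        }
        return rect;
    }

    int main(int argc, char *argv[]) {
        cout << "input iteration count" << endl;
        int count;
        cin >> count;
        cout << "input data file name" << endl;
        string filename;
        cin >> filename;
        // The source listing is cut off here, in the middle of the
        // statement "iteration_ceil=count*s"; the training loop is not shown.
        return 0;
    }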
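Because the C++ listing breaks off before the training loop, the following Python sketch illustrates one competitive-learning step in the same spirit. The winner selection mirrors pick_winner (largest dot product with unit-length weights); the update rule w += eta * (x - w) followed by renormalization is an assumption, a common winner-take-all rule, and is not confirmed by the source. The starting weights are chosen by hand for the example rather than drawn at random.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pick_winner(weights, x):
    """Index of the weight vector with the largest dot product with x."""
    best, best_dot = -1, -math.inf
    for i, w in enumerate(weights):
        d = dot(w, x)
        if d > best_dot:
            best, best_dot = i, d
    return best


def update(weights, x, eta):
    """Move the winning weight toward x (assumed rule), then renormalize."""
    i = pick_winner(weights, x)
    weights[i] = normalize([w + eta * (xi - w) for w, xi in zip(weights[i], x)])
    return i


# Three hand-picked unit weight vectors and one projected sample.
weights = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]
x = normalize([2.8271335, 5.6413345])
winner = update(weights, x, 0.7)
print(winner, weights[winner])
```

Only the winning node is updated here; a full SOFM would typically also update the winner's neighbors and decay the learning rate over the iterations.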