![MVA-basics Multivariate Statistical Analysis, page 1](http://file4.renrendoc.com/view14/M01/0B/31/wKhkGWdB8AiAZmokAABwlAg3IN0608.jpg)
Multivariate Statistical Analysis

SOME THOUGHTS ON “OMICS” STUDIES

If we obtain analytical data on two groups of samples which we suspect may be different, can we determine the following information?

- Are the groups different?
- Detect those compounds which have increased or decreased in concentration in each group.
- Detect those compounds which are missing from each group and those which are unique to each group.
- These are the compounds which contribute to the variance between the groups.

This is an imposing PROBLEM if we try to employ traditional methods of spectral comparison. Thus we need to employ a data-mining technique to aid us.
[Figure: mass spectra (TOF MS ES+) of Day 2 rat samples; x-axis m/z, y-axis relative intensity (%).]
A Manual Approach

Think about doing a rigorous comparison of just these 4 spectra (~2,000 masses or metabolites in each).
[Figure: two further mass spectra (TOF MS ES+) of Day 2 rat samples; x-axis m/z, y-axis relative intensity (%).]
A SOLUTION TO THE PROBLEM

- It is possible to mine the data using multivariate statistics. Using this approach we analyse the groups using GC or LC/MS and tabulate all the observed masses and their chromatographic retention times with advanced computational methods. These mass/retention time pairs become the variables used for statistical analysis.
- Multivariate statistics then allows us to reduce thousands of variables (mass/retention time pairs) down to a simple two- or three-dimensional map, which shows whether the groups are different and provides us with a list of the variables which contribute to the difference.

WHAT HAVE WE JUST DONE?

- Latitude 35° 38′ 31.5″ North
- Longitude 139° 45′ 7.3″ East
- Altitude
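As a rough illustration of the tabulation step described above, the Python sketch below turns per-sample peak lists into an N × K matrix whose columns are m/z_RT variables. The input format (a dict of `(mz, rt, intensity)` tuples per sample) and the function `build_data_matrix` are hypothetical. Peaks are matched across samples by simple rounding, whereas real processing software typically matches them within mass and retention-time tolerances. Setting undetected variables to 0 is also an assumption.

```python
import numpy as np

def build_data_matrix(peak_lists, mz_decimals=4, rt_decimals=2):
    """Tabulate per-sample peak lists into an N x K intensity matrix.

    peak_lists: dict mapping sample name -> list of (mz, rt, intensity)
    tuples (a hypothetical input format). Peaks are matched across samples
    by rounding m/z and RT; a variable not detected in a sample gets 0.
    """
    samples = list(peak_lists)
    tables = []
    for name in samples:
        table = {}
        for mz, rt, intensity in peak_lists[name]:
            key = (round(mz, mz_decimals), round(rt, rt_decimals))
            # peaks that round to the same m/z_RT pair are summed
            table[key] = table.get(key, 0.0) + intensity
        tables.append(table)
    keys = sorted({key for table in tables for key in table})  # numeric order
    X = np.array([[table.get(key, 0.0) for key in keys] for table in tables])
    names = [f"{mz:.{mz_decimals}f}_{rt:.{rt_decimals}f}" for mz, rt in keys]
    return samples, names, X

# Hypothetical peak lists for two samples
peaks = {
    "rat_A": [(185.0493, 2.13, 1500.0), (213.043, 3.50, 800.0)],
    "rat_B": [(185.0493, 2.13, 400.0)],
}
samples, variables, X = build_data_matrix(peaks)
print(variables)  # ['185.0493_2.13', '213.0430_3.50']
print(X)          # rows = samples (observations), columns = m/z_RT variables
```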
MULTIVARIATE STATISTICAL ANALYSIS

- Each spectrum (observation) becomes a point in the PCA Scores plot.
- The variables (m/z_RT) are shown in the PCA Loadings plot.

Using the plots together allows trends in the sample spectra to be interpreted in terms of m/z.

[Figure: PCA scores plot and loadings plot; loadings labelled by m/z_RT variables.]

WHY CHOOSE TO USE MULTIVARIATE STATISTICS?

Why not hierarchical clustering, heat maps, ANOVA or t-tests?

- With MarkerLynx it is possible to export your data to any statistical program you like.

WHY USE MULTIVARIATE ANALYSIS?
- Short and wide data sets
  - Few observations (N)
  - Many variables (K)
  - Noisy data
  - Missing data/excluded regions
  - Multiple objectives
- Implications
  - High degree of correlation (many variables are related)
  - Difficult to analyse with conventional methods
- Require methods for simplification and visualisation

[Figure: data table with N observations (rows) and K variables (columns).]

PRINCIPAL COMPONENT ANALYSIS (PCA)

A multivariate statistical approach that facilitates the identification of differences or similarities between groups.

DATA PREPARATION

- Data table → variable space: a single object (one row of the table) is a point in variable space (axes var.1, var.2, var.3).
- The whole table yields a swarm of points in variable space.

PRE-PROCESSING (CENTRING)

- Centring: move the centre of the point swarm (the mean of each variable) to the variable origin.

CENTRING & SCALING

- Scaling: put each variable on an equal footing, e.g. make the standard deviations equal (not the only way).
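A minimal NumPy sketch of the two pre-processing steps: centring subtracts each column mean, and unit-variance scaling then divides by each column's standard deviation. Pareto scaling (dividing by the square root of the standard deviation) is included as one example of the "not the only way" alternatives; the function names are illustrative, not taken from the slides.

```python
import numpy as np

def centre(X):
    """Centring: subtract each variable's mean, moving the point swarm to the origin."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

def unit_variance_scale(X):
    """Centring followed by scaling every variable to standard deviation 1.

    Constant variables (SD = 0) are left at zero instead of dividing by zero.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return centre(X) / sd

def pareto_scale(X):
    """Alternative scaling: divide the centred data by the square root of each SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return centre(X) / np.sqrt(sd)

# Hypothetical data: two variables on very different scales
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
Z = unit_variance_scale(X)
print(Z.mean(axis=0))         # approximately 0 for every column
print(Z.std(axis=0, ddof=1))  # 1 for every column
```

Unit-variance scaling gives every variable equal influence; Pareto scaling keeps part of the original intensity differences, which is why it is sometimes preferred for noisy spectral variables.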
PCA THEORY – STEP BY STEP

[Figure: point i in variable space (var.1, var.2, var.3) projected onto PC1, giving the score t_i1.]

- The first principal component (PC1) is set to describe the largest variation in the data. PC1 ($\mathbf{t}_1\mathbf{p}_1'$) lies along the direction in which the points spread most in the variable space.
- The score value ($t_{i1}$) for point $i$ is the (signed) distance from the origin to the projection of the point onto the first component.
- PC1 is hence the first latent variable in a new coordinate system that describes the variation in the data.
[Figure: the same point i projected onto PC1 and PC2, giving the scores t_i1 and t_i2.]

- The second principal component (PC2) is set to describe the largest remaining variation in the data, perpendicular (orthogonal) to the first component.
- A corresponding loading plot describes the relationships between the variables and allows interpretation of the scores plot by showing which variables are responsible for similarities and differences between samples.
- Two PCs make a plane (window) in the K-dimensional variable space. The points are projected down onto the plane, which is lifted out and viewed as a two-dimensional plot.
- The perpendicular distance from the object to its projection on the plane is the residual of the two PCs.

[Figure: data points (•) in the variable space (x1, x2, x3) and their projections onto the PC1–PC2 plane.]

This is the scores plot: similarities or differences between samples can now be seen.
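The step-by-step construction corresponds to a few lines of linear algebra. This sketch assumes X has already been centred (and scaled if desired) and uses NumPy's singular value decomposition: the right singular vectors give the loadings P, the projections T = XP give the scores, and E = X − TP' holds the residuals, whose row norms are the perpendicular distances of the objects from the PC plane. The function name `pca` and the simulated data are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a centred (optionally scaled) N x K matrix X via the SVD.

    Returns scores T (N x A), loadings P (K x A) and residuals E (N x K),
    so that X = T P' + E.
    """
    X = np.asarray(X, dtype=float)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T   # columns: unit-length PC directions, largest variance first
    T = X @ P                 # scores: coordinates of each projected point
    E = X - T @ P.T           # residuals: what the A-dimensional plane does not capture
    return T, P, E

# Hypothetical data: 20 objects, 5 variables with decreasing spread
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)) * [5.0, 3.0, 1.0, 0.5, 0.1]
X = X - X.mean(axis=0)        # centre first, as in the data-preparation step
T, P, E = pca(X, n_components=2)

# Perpendicular distance of each object from the PC1-PC2 plane
residual_distance = np.linalg.norm(E, axis=1)
```

Plotting `T[:, 0]` against `T[:, 1]` gives the scores plot; the columns of `P` give the loadings plot.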
SCORES PLOT EXAMPLE

[Figure: example scores plot.]

Shockcor et al., 2001, Magnetic Resonance in Chemistry, 39:559-565.
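THE LOADINGS PLOTS

- The loading (p) is described as the cosine of the angle between the original variable and the PC.
- The loading (p) describes the variation in the variable direction, i.e. the similarity/dissimilarity between variables, and also explains the variation in the scores.
- The loading (p) describes the original variable's importance for the respective PC. This is the same as the similarity in direction between the original variable and the PC.

[Figure: sample i with intensities I(rt_1, m/z_1) and I(rt_2, m/z_2); the axis of variable (rt_x, m/z_x) is projected onto PC1 and PC2, giving the loadings p_x,1 and p_x,2.]

$$p_{x,1} = \cos\theta_{x,1}, \qquad p_{x,2} = \cos\theta_{x,2}$$

where $\theta_{x,1}$ is the angle between the axis $(rt_x, m/z_x)$ and PC1, and $\theta_{x,2}$ is the angle between the axis $(rt_x, m/z_x)$ and PC2. Because each PC is a unit-length direction, its loadings over all $K$ variables satisfy $\sum_{k=1}^{K} p_{k,a}^2 = 1$.

LOADINGS

If the largest variation coincides with var.1, the first principal component will be in the direction of var.1 ($\theta_1 = 0^\circ$ between var.1 and PC1, $\theta_2 = 90^\circ$ between var.2 and PC1):

- PC1 scores = values of var.1
- PC1 loadings, $\mathbf{p}_1 = (1, 0)$

Credit: Henrik Antti / Umeå University

EXAMPLE LOADINGS PLOT

[Figure: example loadings plot.]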
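To check the p1 = (1, 0) example numerically, the sketch below simulates hypothetical two-variable data that spread mainly along var.1. The PC1 loading vector comes out close to (±1, 0), the implied angles are close to 0° and 90°, and the squared loadings sum to 1. The sign of a PC returned by the SVD is arbitrary, so absolute values are compared.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: large spread along var.1, very little along var.2
X = np.column_stack([rng.normal(0.0, 10.0, 200), rng.normal(0.0, 0.1, 200)])
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
p1 = Vt[0]                       # PC1 loadings (unit length; overall sign is arbitrary)
t1 = X @ p1                      # PC1 scores, essentially the centred values of var.1

# Each loading is the cosine of the angle between a variable axis and PC1
theta_deg = np.degrees(np.arccos(np.clip(np.abs(p1), 0.0, 1.0)))

print(np.round(np.abs(p1), 2))   # close to (1, 0)
print(np.round(theta_deg))       # close to (0, 90) degrees
print(np.sum(p1 ** 2))           # squared loadings sum to 1
```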
INTERPRETATION OF PCA

- Scores: observations (spectra); show trends, patterns and groups.
- Loadings: variables (m/z); show correlation and influence.

[Figure: scores plot and loadings plot, PC1 vs PC2.]

Credit: Henrik Antti / Umeå University

STRONG OUTLIERS

[Figure: SIMCA-P 7.0 (Umetri AB) scores plots, t[1] vs t[2], each with a Hotelling T2 (0.05) ellipse; models THICKNES.M1 and FOODS.M1 (European countries); panels labelled "Strong outliers" and "No strong outliers".]

STRONG OUTLIER DETECTION – HOTELLING'S T2

- Hotelling's T2 is a multivariate generalisation of Student's t.
- The ellipse of constant T2 defines the confidence region (Hotelling's ellipse); observations outside it are strong outliers.

Credit: Henrik Antti / Umeå University
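Hotelling's T2 for each observation can be computed from its scores as the sum over components of t_ia²/s_a², where s_a² is the variance of score vector a. The sketch below assumes this common definition; the function name is hypothetical, and the critical value that draws the T2 (0.05) ellipse, which comes from an F distribution (for example via `scipy.stats.f.ppf`), is left out. The usage lines show that four points lying on one ellipse share the same T2.

```python
import numpy as np

def hotelling_t2(T):
    """Hotelling's T2 for each observation from its PCA scores T (N x A).

    T2_i = sum over components a of t_ia**2 / s_a**2, where s_a**2 is the
    variance of score column a. Points with equal T2 lie on the same ellipse.
    """
    T = np.asarray(T, dtype=float)
    s2 = T.var(axis=0, ddof=1)
    return np.sum(T ** 2 / s2, axis=1)

# Hypothetical scores: four points at the ends of the axes of one ellipse
T = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
print(hotelling_t2(T))  # all four values are equal (1.5)
```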
MODERATE OUTLIERS

- Moderate outliers are detected on the residuals (DModX) plot.
- The distribution of the residuals is approximately normal.
- An F test specifies the cut-off, e.g. at 99% confidence.

Credit: Henrik Antti / Umeå University

MODERATE OUTLIER DETECTION – DMODX TOOL

- The critical distance is derived from the distribution of the residuals.

[Figure: SIMCA-P 7.0 (Umetri AB) DModX plots with critical distance DCrit (0.05), panels labelled "No moderate outliers" and "Some moderate outliers": FOODS.M1, 3 components, DCrit = 1.1598 (absolute distances); THICKNES.M1, 2 components, DCrit = 1.5130 (normalized distances).]

Credit: Henrik Antti / Umeå University
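A simplified DModX sketch, assuming the usual construction: the residual standard deviation of each observation, optionally divided by the pooled residual standard deviation of the whole data set. The degrees-of-freedom corrections used here are an assumption and may differ from those in SIMCA-P, and the critical distance DCrit (from an F test) is not computed. The function name is hypothetical.

```python
import numpy as np

def dmodx(E, n_components, centred=True, normalise=True):
    """Distance to the model in X space for each observation (row of E).

    E: N x K residual matrix left after fitting n_components components.
    Absolute DModX_i = sqrt(sum_k e_ik**2 / (K - A)).
    Normalised DModX divides by the pooled residual SD s0 of the whole matrix.
    The degrees of freedom used here are a simplifying assumption.
    """
    E = np.asarray(E, dtype=float)
    N, K = E.shape
    A = n_components
    A0 = 1 if centred else 0            # one degree of freedom lost to centring
    s_i = np.sqrt(np.sum(E ** 2, axis=1) / (K - A))
    if not normalise:
        return s_i
    s0 = np.sqrt(np.sum(E ** 2) / ((N - A - A0) * (K - A)))
    return s_i / s0

E = np.full((5, 4), 0.1)   # hypothetical residuals
E[4] = 0.3                 # observation 4 fits the model poorly
d = dmodx(E, n_components=1)
print(np.argmax(d))        # 4
```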
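PLS-DA

[Figure: PLS-DA scheme: X space (x1, x2, x3) with score t and Y space (y1, y2) with score u; t(pca) compared with t(pls-da).]

- Metabonomic space X: describes the variation in the NMR data.
- Discriminant space Y: defines the known classes.
- ‘Inner relation’: maximising the covariance between t and u.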
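For two classes coded as a single dummy y variable, the first PLS-DA component can be sketched without iteration: the unit weight vector w proportional to X'y maximises the covariance between t = Xw and y (subject to ||w|| = 1), and with only one y column the Y-score u is y itself. This is a PLS1 simplification under that assumption; several Y columns (one per class) need the iterative NIPALS algorithm, which is not shown. The function name and the simulated data are hypothetical.

```python
import numpy as np

def pls_da_first_component(X, labels):
    """First PLS-DA component for two classes with a single dummy y (PLS1).

    X: N x K mean-centred data; labels: N class labels coded 0/1.
    The unit weight vector w maximises cov(X w, y), where y is the centred
    dummy response; with one y column the Y-score u is simply y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(labels, dtype=float)
    y = y - y.mean()
    w = X.T @ y
    w = w / np.linalg.norm(w)   # unit-length X weights
    t = X @ w                   # X-scores, the t of the inner relation
    b = (t @ y) / (t @ t)       # inner relation: u (= y) regressed on t
    return w, t, b

rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 6))
X[labels == 1, 0] += 3.0        # hypothetical: class 1 is raised in variable 0
X = X - X.mean(axis=0)
w, t, b = pls_da_first_component(X, labels)
print(np.argmax(np.abs(w)))     # 0: the discriminating variable dominates w
```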
MODEL VALIDATION (2) – TRAIN/TEST

- Split the data:
  - Training set: build the model
  - Test set: validate the model
- Typically require >1/3 of the data in the test set.
- All model parameters are optimised on the training set, e.g. components, variables selected, etc.
- A goodness-of-fit statistic on the test data indicates the predictive quality of the model.

[Figure: samples 1–8 split into a training set (build model) and a test set (predict test set).]
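A minimal sketch of the train/test split, assuming a random assignment with at least one third of the samples in the test set. The key point it illustrates is that pre-processing parameters, like every other model parameter, are estimated from the training rows only and then reused unchanged on the test rows. The function name and data are hypothetical.

```python
import numpy as np

def train_test_indices(n_samples, test_fraction=1/3, seed=0):
    """Randomly assign sample indices to a training set and a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie between 0 and 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(np.ceil(test_fraction * n_samples))  # at least 1/3 by default
    return np.sort(order[n_test:]), np.sort(order[:n_test])

X = np.random.default_rng(4).normal(size=(8, 3))  # hypothetical 8 samples x 3 variables
train, test = train_test_indices(len(X))

# Pre-processing parameters come from the training set only...
mean = X[train].mean(axis=0)
sd = X[train].std(axis=0, ddof=1)
X_train = (X[train] - mean) / sd
# ...and are then applied unchanged to the test set before prediction
X_test = (X[test] - mean) / sd
```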
MODEL VALIDATION (3) – CROSS-VALIDATION

- General principle:
  - Remove some data
  - Build a model on the remaining data
  - Predict the removed data
  - Repeat until all samples have been removed once
- Compute predictions and residuals ($e_{ik}$) for each sample when left out.
- Calculate PRESS and Q2 from all residuals.
- This can be done for X or Y.

[Figure: '3-fold' cross-validation of samples 1–8 over several rounds; in each round the model is built on the training set and the left-out test set is predicted.]
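The cross-validation loop, PRESS and Q2 can be sketched as below, assuming a one-component PLS1 regression model as the model being validated and a round-robin assignment of samples to folds, so that every sample is left out exactly once. PRESS is the sum of squared prediction errors for the left-out samples, and Q2 = 1 − PRESS/SS, where SS is the sum of squares of y about its mean. Function names and data are hypothetical, and SIMCA-P's component-by-component Q2 bookkeeping is not reproduced.

```python
import numpy as np

def pls1_fit_predict(X_train, y_train, X_test):
    """Fit a one-component PLS1 model on the training rows and predict the test rows."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)             # unit X weights
    t = Xc @ w                            # training scores
    b = (t @ yc) / (t @ t)                # regression of y on the score
    return y_mean + b * ((X_test - x_mean) @ w)

def cross_validated_q2(X, y, n_folds=3):
    """k-fold cross-validation in which every sample is left out exactly once.

    PRESS is the sum of squared residuals of the left-out predictions;
    Q2 = 1 - PRESS / SS, where SS is the sum of squares of y about its mean.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = np.arange(len(y)) % n_folds   # round-robin fold assignment
    press = 0.0
    for k in range(n_folds):
        left_out = folds == k
        y_hat = pls1_fit_predict(X[~left_out], y[~left_out], X[left_out])
        press += float(np.sum((y[left_out] - y_hat) ** 2))
    ss = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / ss

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)  # hypothetical response
press, q2 = cross_validated_q2(X, y, n_folds=3)
print(round(q2, 3))
```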
- R2 & Q2 plot from SIMCA-P software
- R2 rises with each component
- Q2
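That R2 rises with each component follows from how it is computed on the fitted data: each extra component explains an additional, non-negative share of the sum of squares. A sketch for the PCA case (R2X), using the fact that component a explains s_a² of the centred sum of squares, where s_a is the a-th singular value; the function name and data are hypothetical.

```python
import numpy as np

def cumulative_r2x(X, max_components):
    """Cumulative fraction of the centred X sum of squares explained by 1..A PCs."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    return np.cumsum(s ** 2 / np.sum(s ** 2))[:max_components]

X = np.random.default_rng(5).normal(size=(15, 6))  # hypothetical data
print(cumulative_r2x(X, 6))  # non-decreasing, ending at (approximately) 1
```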