R语言命令Tutorial-更新后,CCA ,RDA,PCA, heatmap.docx_第1页
R语言命令Tutorial-更新后,CCA ,RDA,PCA, heatmap.docx_第2页
R语言命令Tutorial-更新后,CCA ,RDA,PCA, heatmap.docx_第3页
R语言命令Tutorial-更新后,CCA ,RDA,PCA, heatmap.docx_第4页
R语言命令Tutorial-更新后,CCA ,RDA,PCA, heatmap.docx_第5页
已阅读5页,还剩14页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

一、GeoChip 数据处理1 准备数据 登录数据库,用户名iegjianqiang,PW:ieg123? 选择GeoChip4数据,再次输入用户密码; 点击Prepare microarray data,点击选择要分析的数据,点击submit,勾选“Remove the spots SNR less than 2”,此即为SNR数据;若勾选“Adjust SNR according to Thermophile probes less than 5%”,此即为Thermo数据; 勾选“by dividing the mean of each sample”,即为DBM数据,若不勾选,则为relative abundance(RA)数据; 这样就有四套数据分别为SNR DBM、SNR RA、Thermo DBM、Thermo RA,选择好要下载的数据后,输入“Experimental Name:”,点击submit,点击“set it as default”点击go,随后点击main回到主界面,点击“Analyze microarray data”在跳出的对话框中点cancel,点击如下图所示项目后,点击submit;在show information框中选择除了“All targets”外的所有项,点击submit;点击download后,待所有数据出来后另存为文本文件,这样就准备好一套数据,将所有4套数据都如此下载好。 数据下载后,在excel中去除各样品重复中只有一个重复有检测到基因信号的数据,即为cut 1; Relative abundance数据需要将各基因信号值分别除以该样品中所有基因信号值得和,再乘以各个样品基因信号值和的平均值,即data1/sum1*average。这样即得到Relative abundance数据 Relative abundance数据继续做两种处理,一是将数据+1后取ln,一是将数据除以1000;这样总共是6套数据,将所有数据中0值替换为空白,同时只留下gene ID和genename两项,另存为tab delimited txt文件,即可用于DCA(Detrended Correspondence Analysis)、Dissimilarity Test、cluster(A simple hierarchical clustering analysis)分析;2 数据预分析2.1 DCA分析 在数据分析界面点击以下项后,上传刚刚准备的数据,即可做DCA分析,结果可获得DCA图及DCA数据,可拷贝出数据自行作图;2.2 Dissimilarity Test点击后,上传数据,选择需要比较的样品,即可做MRPP、anosim、adonis比较,记录distance和sig值;2.3 cluster分析 点击,将数据按各样品取平均值后上传分析,即可得cluster图。 比较以上6套数据分析结果,选择最好的一套进行数据处理,目前一般SNR RA ln的数据最好,可只比较其与Thermo RA ln之间的差异。3 数据处理3.1 Diversity and summary3.1.1 alpha-diversity 点击,上传用于做DCA的数据,即可得到alpha-diversity;3.1.2 beta-diversity 点击,上传用于做DCA的数据,即可得到beta-diversity;3.1.3 Summary gene categories点击,上传数据,此时只用relative abundance的数据,不要取ln,若保留genename则只统计gene水平,保留各个不同级别的category,则统计为各个category的加和;数据上传后选择show intensity,将所得的结果保存;求出各个样品的平均值,取ln去掉0值后保存为txt文件,即可用于genecluster分析;3.1.4 Summary phylogenetic点击,同上但只保留Organism或Lineage即可对phylogenetic基因分布进行统计;3.2 Gene cluster 将3.1.3得到文件进行GeneCluster分析,打开GeneCluster软件,点击“load files”载入文件,如下图勾选计算后可达到3个文件cdt、atr、gtr,atr为每列信号和,gtr为每行信号和; 用GeneTreeView打开cdt文件即可看到图,save tree image保存树图,save zoomed image保存信号图,其中红色方块深浅代表信号值,即该基因丰度,若采用category数据,则表示某类基因丰度,将树图和该图用photoshop拼合即可。3.3 Response ratio点击,上传与3.1.3相同数据,gene水平和category水平数据均可3.4 环境因子拟合采用3.1.3中基因水平或category水平的相对丰度,计算标准差,或采用total intensity,直接与环境因子拟合,correlation test相关性分析即可,excel即可做。3.5 CCA using R 打开R软件,输入All=read.table(D:/桌面/SNR RA ln for cca.csv,header=T, sep=,),输入准确的路径名,以及要读入的数据文件名,采用csv格式数据,Relative abundance取ln的数据,空白值替换为0,不要genename等,格式如下:dim(All)T.All-t(All)dim(T.All),这三个命令是将数据转置为样品在列; All.group=read.table(D:/桌面/Chemical property.csv,header=T, sep=,)Y=T.AllX=All.group,读入理化参数数据,格式如下行为参数值,列为样品名,参数不要用符号空格。 library(vegan),调用vegan程序包; 输入,A=cca(Y pH+TOC+TN+TC+TS+CEC+NH4+AK, data=X),即所有的参数名字,即开始计算CCA. 输入summary.cca(A),看计算结果,可导出自行作图,用“Site scores (weighted averages of species scores)”数据样品点,“Biplot scores for constraining variables”做环境因子; 输入vif.cca(A),看各参数的vif值,10表示该参数为共同影响,一般选择20的参数; 输入plot(A,dis=c(wa,cn)输出CCA排序图; 输入anova.cca(A, by = term, step = 200)计算方差,各环境因子显著性检验;输入anova(A,by=axis,perm=1000),看各轴解释量;再次输入A=cca(Y pH+TOC+TN+TC+TS+CEC+NH4+AK, data=X),可得以下结果:解释量即为某轴值除以Total值,如高亮部分,其解释量为12.81% Inertia Proportion RankTotal 0.3537 1.0000 Constrained 0.1336 0.3778 8Unconstrained 0.2201 0.6222 30Inertia is mean squared contingency coefficient 414 species (variables) deleted due to missingnessEigenvalues for constrained axes: CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7 CCA8 0.045309 0.034580 0.013917 0.009750 0.009280 0.008328 0.007542 0.004894 Eigenvalues for unconstrained axes: CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 0.036623 0.013366 0.011339 0.010286 0.009655 0.009085 0.008615 0.007996 (Showed only 8 of all 30 unconstrained eigenvalues) 输入bioenv(Y pH+TOC+TN+TC+TS+CEC+NH4+AK, data=X),该项也可在网站上进行分析,可得出矩阵,各个因子,上传DCA数据和环境因子数据。Best Subset of Environmental Variables with Maximum (Rank) Correlation with Community Dissimilarities3.6 Partial CCA输入cca(Y CEC+Condition(pH+TOC+TN+TC+TS+NH4+AK), data=X),可得CEC的部分解释量cca(Y pH+ TOC+Condition(CEC+ TN+TC+TS+NH4+AK), data=X),则得到pH和TOC的解释量,最终做出Variation partitioning analysis图,即VPA分析。3.7 Partial Mantel TestTo calculate the correlation between microarray data and environmental factor(s)点击,上传做DCA的数据,再上传做CCA的环境因子数据,即可针对不同基因和不同基因category做mantel test,若只选择几个环境因子数据,则为Partial Mantel Test。结果如下:r为相关系数,significance3.8 MRTMultivariate Regression Tree读入数据与CCA分析相同,先读基因数据All=read.table(D:/桌面/SNR RA ln for cca.csv,header=T, sep=,)dim(All)T.All-t(All)dim(T.All),这三个命令是将数据转置为样品在列;再读treatment数据All.group=read.table(D:/桌面/treatment.csv,header=T, sep=,)Y=T.AllX=All.groupMRT=rpart(Y age+plant+landusing,data=X),加亮为处理名字plot(MRT)text(MRT)treatment数据如下图排列所得结果中字母a b即按照处理中字母顺序排列,如Ancient 和 Modern,前者为a,后者为b。画图前应将绘图窗口放大。二、Vegan tutorCommon commandswrite.table(shan,quote=FALSE,sep=t,file=shan.txt)x11(1),打开一个绘图窗口dev.list(),看有几个设备dev.set(2), 改变当前设备为21 DCA using R选用做DCA的数据,只保留geneID即可。library(vegan)x-read.csv(file.choose(), s=1)xis.na(x)-0dim(x)X-t(x)dim(X)X.dca-decorana(X)summary(X.dca)X.cca = cca(X)total.eigen = X.cca$tot.chidca1.percent = round(X.dca$evals1/total.eigen,digits=3)*100dca2.percent = round(X.dca$evals2/total.eigen,digits=3)*100x.label =paste(DCA1 ,dca1.percent, %,sep=)y.label =paste(DCA2 ,dca2.percent,%,sep=)jpeg(file=1304744832.jpeg)plot(X.dca,dis=site,type=text,xlab=x.label,ylab=y.label)points(X.dca, pch=21, col=red, bg=red, cex=0.5) dev.off()2 Diversity using R2.1 a-diversity数据表格同3.9 DCA分析library(vegan)X-read.csv(file.choose(), s=1)Xis.na(X)-0dim(X)spe-t(X)dim(spe) 转置,数据量大,excel无法转置X-read.csv(Chem property.csv,header=T,s=1,sep=,)dim(env)Shannon指数:diversity(spe, index = shannon, MARGIN = 1, base = exp(1) 或者:shan-diversity(spe)Simpson指数:simp - diversity(spe, simpson)invsimp - diversity(spe, inv)pairs(cbind(shan, simp, invsimp), pch=+, col=blue, type=text)Species richness:S - specnumber(spe)Pielous evenness:J - shan/log(S)2.2 b-diversitylibrary(MASS)Non-metric Multidimensional scalingBrayCurtis:bray-vegdist(spe),vegdist的默认index是braycurtiswrite.matrix(bray, file = , sep = ) 即可写出数据,拷贝出来vare.mds0-isoMDS(bray)stressplot(vare.mds0,bray)ordiplot(vare.mds0,type=t)Community dissimilaritiesrankindex(scale(env), spe, c(euc, man, bray, jac, kul) euclidean indice:euclidean - vegdist(decostand(A, norm), euclidean)euclidean - vegdist(decostand(spe, hell), euclidean)chao - vegdist(decostand(spe, norm), chao)vare.mds0-isoMDS(euclidean)stressplot(vare.mds0,euclidean)ordiplot(vare.mds0,type=t)3 PCA using R数据表格同DCA分析(上接DCA)vare.pca - rda(X)vare.pcasummary(vare.pca)x.label =paste(PCA1 ,10.61, %,sep=)y.label =paste(PCA2 ,3.69,%,sep=)plot(vare.pca, dis=site,type=text, xlab=x.label,ylab=y.label)points(vare.pca, pch=21, col=red, bg=red, cex=0.5)spe.h=decostand(spe,hellinger)spe.pca=rda(spe.h)plot(spe.pca, display = sites, scaling = 1, type = text)plot(spe.pca, display = sites, scaling = 1, type = text, choices = c(1,3)plot(spe.pca, display = sites, scaling = 1, type = n, choices = c(1, 2)sc - scores(spe.pca, display = sites, scaling = 1, choices = c(1, 2)my.colors - as.character(eger(group$group) + 1)points(sc, pch = 21, col = my.colors, bg = my.colors)Correspondence analysis (CA):vare.ca - cca(spe)vare.casummary(vare.ca)Correspondence analysis is based on Chi-squared distancechisq.test(spe/sum(spe)env.s=decostand(env,standardize)标准化环境因子4 Environmental interpretation using R4.1 Vector fitting using Rvare.mds - metaMDS(spe, trace = FALSE)ef - envfit(vare.mds, env, permu = 999)efplot(vare.mds, display = sites, type=text)plot(ef, p.max = 0.1) 该值为自己设定的p最大值,环境样品可为0.14.2 Surface fittingef - envfit(vare.mds NO3N + AP, env)plot(vare.mds, display = sites, type=text)plot(ef)tmp - with(env, ordisurf(vare.mds, NO3N, add = TRUE)with(env, ordisurf(vare.mds, AP, add = TRUE,+ col = green4)4.3 Factorsspe.ca - cca(spe)ef - envfit(spe.ca, env, permutations = 999)efplot(spe.ca, display = sites, type=text)plot(ef)4.4 Non-metric Multidimensional scalinglibrary(vegan)library(MASS)pyrospe=read.csv(file.choose(),s=1)pyrospe=t(pyrospe)pyrospe.dis - vegdist(pyrospe)pyrospe.mds0 - isoMDS(pyrospe.dis)Nmds maps observed community dissimilarities nonlinearly onto ordination space and it can handle nonlinear species responses of any shapestressplot(pyrospe.mds0, pyrospe.dis)ordiplot(pyrospe.mds0, type = t)pyrospe.mds - metaMDS(pyrospe, trace = FALSE)plot(pyrospe.mds, type = t)plot(pyrospe.mds, dis=site, type = t)5 Constrained ordination5.1 Model specification用符号,左边是群落数据,右边是限制方程vare.cca - cca(spe Nit+AMM+AP+pH+AK, env)vare.ccax.label =paste(CCA1 ,30, %,sep=)y.label =paste(CCA2 ,20,%,sep=)plot(vare.cca) or plot(vare.cca, dis=c(wa,cn) points(vare.cca, pch=21, col=red, bg=red, cex=0.5)library(scatterplot3d)ordiplot3d(vare.cca, type = h)pl - ordiplot3d(vare.cca, angle=15, type=n)points(pl, points, pch=16, col=red, cex = 0.7)text(pl, arrows, col=blue, pos=3)text(pl, points, col=blue, pos=1, cex = 1.2)5.2 Permutation testsanova(vare.cca)anova(vare.cca, by=term, step = 200) 计算方差,各环境因子显著性检验;anova(vare.cca, by = margin, perm = 500)anova(vare.cca,by=axis,perm=1000) 看各轴解释量6 Corelation test using RY-read.csv(correlation test.csv,s=1)or Y- read.csv(file.choose(), s=1)可选择文件attach(E3)将每列都设为一个变量,名字为每列的名字Y.cor-cor(nosZ,nirS, method = method)得出两个变量之间的相关系数,默认是pearson,还有kendall or spearmanE1.cor.test-cor.test(nosZ,nirS, method = method)显著性检验E3.cor.mat-cor(E3)计算各个变量之间的相关系数,生成相关系数矩阵symnum(Y.cor.mat)符号化相关程度plot(x=nosZ, y=nirS, xlab=nosZ, ylab=nirS, pch=21)画出散点图abline(lm(nirSnosZ)加一条拟合曲线7 Classification7.1 Cluster analysisdis - vegdist(A)或者dis - dist(X)clus - hclust(dis, single)plot(clus)cluc - hclust(dis, complete)plot(cluc)clua - hclust(dis, average)plot(clua)range(dis)cor(dis, cophenetic(clus)cor(dis, cophenetic(cluc)cor(dis, cophenetic(clua),计算各方法的异化性7.2 Display and interpretation of classesplot(cluc)rect.hclust(cluc, 3)grp - cutree(cluc, 3) 分为不同级别的clusterboxplot(Nit grp, data = env, notch = TRUE)ord - cca(dune)plot(ord, display = sites)ordihull(ord, grp, lty = 2, col = red) ordispider and ordiellipse 都可使用7.3 Classfied community tableswa - scores(ord, display = sites, choices = 1)den - as.dendrogram(clua)oden - reorder(den, wa, mean)op - par(mfrow = c(2, 1), mar = c(3, 5, 1, 2) +0.1)plot(den)plot(oden)par(op)8 Dissimilarities and environment8.1 adonis: Multivariate ANOVA based on dissimilaritiesbetad - betadiver(spe, z)adonis(betad Nit, env, perm = 200)adonis(betad Nit+AMM+AP+pH+TC+TN, env, perm = 200)8.2 Homogeneity of groups and beta diversitymod - with(env, betadisper(betad, Nit+AMM+AP+pH+TC+TN)plot(mod)boxplot(mod)anova(mod)permutest(mod)8.3 Mantel testdata(varespec)data(varechem)veg.dist - vegdist(varespec) # Bray-Curtisenv.dist - vegdist(scale(varechem), euclid)mantel(veg.dist, env.dist)mantel(veg.dist, env.dist, method=spear)8.4 Protest: Procrustes testpc - scores(pc, choices = 1:2)pro - protest(vare.mds, pc)plot(pro)pro8.5 anosimspe.dist - vegdist(spe) or spe.dist - vegdist(spe, method=euclidean)group=read.csv(file.choose(),s=1)如下图attach(group)spe.ano - anosim(spe.dist, usage)summary(spe.ano)plot(spe.ano)adonis(spe usage, data=group, permutations=99,method=bray)spe.mrpp - mrpp(spe, group$usage, distance=bray)9 Multivariate homogeneity of groups dispersions (variances) 用于检验重复中的outlier,此处spe用非相对丰度数据spe.h=decostand(spe,hellinger)spe.h.dist=vegdist(spe.h, euclidean)var.test - betadisper(spe.h.dist, group = group$group, type = median, bias.adjust = TRUE)boxplot(var.test)permutest(var.test)permutest(var.test, pairwise = TRUE)10 PcoA Principal Coordinates Analysislibrary(labdsv)data(spe)dis.bc - dsvdis(spe,bray/curtis)veg.pco - pco(dis.bc,k=4)plot(veg.pco, dis=site,type=text)三、Other applic

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

最新文档

评论

0/150

提交评论