edgeR-DESeq2分析RNA-seq差异表达

上传人：d*** IP属地：天津上传时间：2022-01-05 格式：DOCX 页数：10 大小：23.89KB 积分：18 举报 版权申诉

免费预览已结束，剩余5页可下载查看

 下载本文档

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

1、edgeR包的安装edgeR包是基于Bioconductor 平台发布的，所以安装不能直接用()命令从CRAN上来下载安装：# try if URLs are not supported>source ("")>biocLite ("edgeR")数据导入由于edgeR对测序结果的下游分析是依赖 count计数来进行基因差异表达分析的，在这里使用的是featureCounts 来进行统计 '.bam' 文件中Map的结果count结果如下：>library (edgeR)>mydata <- ("

2、;" , header = TRUE quote = 't', skip =1)>sampleNames <- c( "CA_1", "CA_2", "CA_3", "CC_1", "CC_2", "CC_3")>names(mydata) 7: 12 <- sampleNames>head( mydata)GeneidChr StartEnd StrandLengthCA_1CA_2CA_3 CC_1 CC_2 CC_

3、31gene131412571745+4890000002gene131521153452ene131638564680+8250000004gene131748665435-5700000005gene131860666836-7710000006gene131972949483+2190000000在这里我们只是需要Geneid和后6列的样本的count信息来组成矩阵，所以要处理下>countMatrix <- ( mydata 7: 12)>rownames( countMatrix ) <- mydata $Geneid>hea

4、d( countMatrix )CA_1 CA_2 CA_3 CC_1 CC_2 CC_3gene1314000000gene1315000000gene1316000000gene1317000000gene1318000000gene1319000000*要导入的矩阵由3v3样本组成（三组生物学重复）创建 DEGlist>group <- factor (c( "CA", "CA", "CA", "CC", "CC", "CC")>y <- DGE

5、List( counts = countMatrix , group = group)>yAn object of class "DGEList" $countsgene13140000gene13150000gene13160000gene13170000gene13180000CA_1 CA_2 CA_3 CC_114212 more rowsCC_2 CC_30000000000$samplesgroupCA_1CA_11788537CA_2CA_21825546CA_3CA_31903017CC_1CC_11826042CC_2CC_22124468CC3CC

6、32025063111111过滤过滤掉那些count结果都为0的数据，这些没有表达的基因对结果的分析没有用，过滤又两点好处：1可以减少内存的压力2可以减少计算的压力>keep <- rowSums( cpm(y) >1) >= 2>y <- y keep,object of class "DGEList"$countsgene1321gene1322CA_11612CA_2 (1383工_31291CC_12181CC_21943CC_32203gene1323202733475146gene132460877986100132gene

7、13253229215875563877 morerows.$samplesgroupCA_1 CA_1 17883621CA_2 CA_2 18253081CA_3 CA_3 19027961CC_1 CC_1 18258891CC_2 CC_2 21241551CC_3 CC_3 20247861标准化处理,只有标准化处理后的数据才edgeR采用的是TMM方法进行标准化处理又可比性>y <- calcNormFactors (y)>yAn object of class "DGEList"$countsCA_1 CA_2 CA_3 CC_1 CC_2

8、 CC_3gene1321161138129218194220gene1322 231133gene1323202733475146gene132460877986100132gene13253229215875563877 more rows .$samplesgroupCA_1 CA_1 1788362CA_2 CA_2 1825308CA_3 CA_3 1902796CC_1 CC_1 1825889CC_2 CC_2 2124155CC_3 CC_3 2024786设计矩阵为什么要一个设计矩阵呢，道理很简单，有了一个设计矩阵才能够更好的分组分析>subGroup <- f

9、actor (substring (colnames (countMatrix ), 4,4)>design <- ( subGroup+group)>rownames( design ) <- colnames (y)>design(Intercept)subGroup2subGroup3groupCCCA_11000CA_21100CA_31010CC_11001CC_21101CC31011attr(,"assign")101 1 2attr(,"contrasts")attr(,"contrasts&quo

10、t;)$subGroup1""attr(,"contrasts")$group1""评估离散度>y <- estimateDisp (y, design , robust =TRUE>y$1#plot邛lotBCV (y)差异表达基因> fit <- glmQLFit (y, design , robust =TRUE> qlf <- glmQLFTest (fit )> topTags (qlf )Coefficient : groupCClogFC logCPMF PValueFDR

11、gene7024gene6612gene2743gene12032gene491gene8941gene2611gene6242gene7252gene6125查看差异表达基因原始的CMP> top <- rownames( topTags ( qlf )> cpm( y) top ,CA_1CA_2CA_3 CC_1 CC_2 CC_3gene7024gene6612gene2743gene12032gene491gene8941gene2611gene6242gene7252gene6125查看上调和下调基因的数目> summary( dt <- decide

12、TestsDGE (qlf ),1> 15360 27931553挑选出差异表达基因的名字> isDE <- (dt)> DEnames<- rownames(y) isDE> head( DEnameS1 "gene1325" "gene1326" "gene1327" "gene1331" "gene1340" "gene1343"差异表达基因画图>plotSmear (qlf , =DEname$>abline ( h=c

13、( -1 , 1), col ="blue")DESeq2包的安装安装：# try if URLs are not supported>source ("")>biocLite ("DESeq2")数据导入count 矩阵condition导入count矩阵，导入数据的方式很多这里直接导入 count结果如下：library (DESeq2sampleNames <- c( "CA_1", "CA_2", "CA_3", "CC_1",

14、"CC_2", "CC_3")mydata <- ("" , header = TRUE, quote = 't', skip =1)names( mydata) 7: 12 <- sampleNames countMatrix <- ( mydata 7: 12) rownames( countMatrix ) <- mydata $Geneid table2 <- (name = c( "CA_1" , "CA_2", "CA_3&qu

15、ot;, "CC_1", "CC_2", "CC_3"), ("CA", "CA", "CA", "CC", "CC", "CC") rownames(table2 ) <- sampleNames head( countMatrix )CA_1CA_2 CA_3 CC_1CC_2 CC_3gene1314000000gene1315000000gene1316000000gene1317000000gene1

16、318000000gene1319000000把count矩阵转化为DESeq2的数据格式>dds <- DESeqDataSetFromMatrix (countMatrix , colData =table2 , design = condition )> ddsclass : DESeqDataSet dim: 14217 6 metadata (0): assays (1): counts rownames( 14217): gene1314 gene1315 . gene6710 gene6709 rowRanges metadata column names( 0

17、): colnames (6): CA_1 CA_2 CC_2 CC_3 colData names( 2) : name condition过滤过滤掉那些count结果都为0的数据，这些没有表达的基因对结果的分析没有用dds <- dds rowSums(counts (dds) >1,ddsclass : DESeqDataSetdim: 4190 6metadata (0):assays (1): countsrownames(4190): gene1321 gene1322 . gene6712 gene6710rowRanges metadata column name

18、s( 0):colnames (6): CA_1 CA_2 . CC_2 CC_3colData names( 2) : name conditionPC矽析rld <- rlog (dds)plotPCA (rld , intgroup =c("name", "condition")当然也可以使用ggplot2 来画 PCA图 library(ggplot2) rld <- rlog(dds) data <- plotPCA(rld, intgroup=c("condition", "name"

19、), returnData=TRUE) percentVar <- round(100 * attr(data, "percentVar")p<- ggplot(data, aes(PC1, PC2, color=condition, shape=name) +geom_point(size=3) +xlab(paste0("PC1: ",percentVar1,"% variance") +ylab(paste0("PC2: ",percentVar2,"% variance")

20、p注意在进行PCA分析前不要library(DESeq)否则无法进行PCA分析差异表达基因分析分析结果输出library (DESeq dds <- DESec(dds) res <- results (dds)(res ,"" , sep = "," ,= TRUE)head (res )log2 fold change (MAP： condition CCvs CA Wald test p- value : condition CC vs CA DataFrame with 6 rows and 6 columnsbaseMean log

21、2FoldChange lfcSEstatpvalue<numeric > <numeric > <numeric > <numeric > <numeric >gene1321 0.gene1322gene13230.gene13240.gene13250.gene1326padj<numeric >gene1321gene1322gene1324gene1325gene1326注：(1)rownames:基因ID (2)baseMean:所有样本矫正后的平均reads 数(3)log2FoldChange:取log2 后的表达量差异(4)pvalue:统计学差异显著性检验指标(5)padj:校正后的pvalue, padj越小，表示基因表达差异越显著summary 查看整体分析结果summary(res)out of 4190 with nonzero total read countadjusted

人人文库> 全部分类> 教育资料 > 备课教案

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

edgeR-DESeq2分析RNA-seq差异表达

文档简介

温馨提示

最新文档

评论

edgeR-DESeq2分析RNA-seq差异表达

文档简介

温馨提示

最新文档

评论

相关文档