hg19 (GRCh37) vs. hg38 (GRCh38): Data Differences Comparison Slides

hg19 (GRCh37) vs. hg38 (GRCh38) Human Genome Reference Comparison
Zuotian Tatum, Department of Human Genetics, Leiden University Medical Center
6/23/2020

Timeline
- GRCh37: first release Feb 27, 2009; latest patch Jun 28, 2013 (p13)
- GRCh38: first release Dec 24, 2013; latest patch Oct 14, 2014 (p1)
Source: …/projects/genome/assembly/grc/human/data/

Content
- GRCh37.p13: total bases 3.23 billion (2.99 billion without N); N50: 46 million; number of alternative loci: 9; non-nuclear genome: no
- GRCh38.p2: total bases 3.21 billion (3.05 billion without N); N50: 67 million; number of alternative loci: 261; non-nuclear genome: yes
Source: …/projects/genome/assembly/grc/human/data/

UCSC tracks for GRCh38
- UCSC RefSeq available since April 2014.
- Ensembl regulatory build available since September 2014.
- dbSNP 141 available since October 2014.
- ENCODE and FANTOM5 track hubs are still not available (Nov 2014).

New in GRCh38 release
Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
The analysis set files were created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz
All the top-level objects in the full assembly:
- chromosomes
- unlocalized scaffolds
- unplaced scaffolds
- alternate locus scaffolds
- mitochondrial genome
The sequence identifiers are International Sequence Database Collaboration (INSDC) accession.versions, and the definition lines are GenBank style. No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- Chromosomes from the GRCh38 Primary Assembly unit. Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence, but it is not identical to the GenBank sequence. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 Primary Assembly unit.
- Unplaced scaffolds from the GRCh38 Primary Assembly unit.
- Epstein-Barr virus (EBV) sequence. Note: the EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz + alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNASeq quantification

Ideogram of GRCh38.p2 [figure]

RNASeq quantification
- Fragments (reads) per kilobase per million mapped reads (FPKM/RPKM) values to quantify gene expression
- Unique mapping only: analysis tools do not distinguish allelic duplication from paralogous duplication
- Non-overlapping gene regions

To understand the effect of alt-loci on RNASeq quantification
Compare alignment of the chromosome 6 MHC region between:
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci
Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNASeq
- hg19: with alt loci
- hg38: without alt loci

Effect of alt loci in RNASeq alignments [figure: Gene RPKM (hg38)]

Distribution of RPKM difference [figure]

Major histocompatibility complex region on chromosome 6 [figure]

HLA-A [figure: hg19 full set chr6; D1]

HLA-A [figure: hg19 full set chr6 vs. hg38 analysis set]

HLA-C [figure: hg19 full set; D1, D2, D3]

HLA-DRA [figure: hg19 full set; D1, D2, D3]

Major histocompatibility complex region on chromosome 6 [figure]

MHC Class III
- 700 kb stretch, 60 genes
- The most gene-dense region of the human genome
- 14% coding, 72% transcribed
- Highly conserved
- Only a few have clearly defined and proven functions

TNF [figure: hg19 full set chr6; D1.control, D1.treated]

Highly variant immune regions retiled

LILRA3 moved to alt-loci in hg38
- hg19: LILRB2, LILRA3, LILRA5
- hg38: LILRB2, LILRA5
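The Content slide lists total bases, bases without N, and N50 for GRCh37.p13 and GRCh38.p2. Below is a minimal standard-library sketch of how those three numbers can be computed from a FASTA file. The function names are hypothetical. N50 is taken as the length L such that sequences of length at least L together cover at least half of all bases. The result covers whatever sequences the file contains, so it need not match the slide's figures. The slide does not say whether GRC computes N50 over contigs or scaffolds, or over the primary assembly only.

```python
import gzip
import sys
from typing import Dict, Iterable, Tuple


def sequence_stats(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each FASTA record name to (total_length, length_without_N)."""
    stats: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Record name = first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            stats[name] = [0, 0]
        elif line and name is not None:
            stats[name][0] += len(line)
            # Only N/n counts as a gap; other ambiguity codes count as bases here.
            stats[name][1] += len(line) - line.upper().count("N")
    return {k: (v[0], v[1]) for k, v in stats.items()}


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA (such as the .fna.gz files) as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fasta(sys.argv[1]) as fh:
        stats = sequence_stats(fh)
    lengths = [total for total, _ in stats.values()]
    print(f"Total bases: {sum(lengths):,}")
    print(f"Without N:   {sum(non_n for _, non_n in stats.values()):,}")
    print(f"N50:         {n50(lengths):,}")
```

Run it with a FASTA path as the only argument. The whole file is streamed line by line, so only per-record counters are held in memory, never the chromosome sequences themselves.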

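The RNASeq quantification slide notes two things. Only uniquely mapped reads are counted, and analysis tools do not distinguish allelic from paralogous duplication. The toy sketch below shows what that implies when a gene is present both on the primary chromosome and on an alt-locus scaffold. Reads from sequence shared by both copies get two equally good hits, are treated as multi-mappers, and drop out of the count. All read, gene, and contig names are hypothetical, and real pipelines usually read the hit count from the aligner's output rather than from a dictionary like this.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One alignment hit = (contig, gene overlapping the hit).
Hit = Tuple[str, str]


def count_unique_reads(alignments: Dict[str, List[Hit]]) -> Counter:
    """Count reads per gene, discarding every read with more than one hit."""
    counts: Counter = Counter()
    for hits in alignments.values():
        if len(hits) == 1:  # "unique mapping only"
            _contig, gene = hits[0]
            counts[gene] += 1
    return counts


# Hypothetical toy data. Reference without alt loci: each read has one hit.
no_alt = {f"read{i}": [("chr6", "GENE_X")] for i in range(1, 5)}

# The same reads against a reference that also carries an alt-locus copy of
# GENE_X (the contig name "chr6_alt_copy" is made up). read1-read3 fall in
# sequence shared by both copies; read4 overlaps a stretch where the copies
# differ, so it aligns only to chr6.
with_alt = {
    "read1": [("chr6", "GENE_X"), ("chr6_alt_copy", "GENE_X")],
    "read2": [("chr6", "GENE_X"), ("chr6_alt_copy", "GENE_X")],
    "read3": [("chr6", "GENE_X"), ("chr6_alt_copy", "GENE_X")],
    "read4": [("chr6", "GENE_X")],
}

print(count_unique_reads(no_alt)["GENE_X"])    # 4
print(count_unique_reads(with_alt)["GENE_X"])  # 1
```

In this toy case the same sample yields 4 counted reads without alt loci but only 1 with them. That is the kind of difference the MHC comparison between the hg19 full set and the hg38 analysis set was designed to probe.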
Phantom LILRA3 [figure]

LILRA3 in hg19 [figure: intergenic; LILRB3, LILRA4, LILRB5]

Gene length calculation
We need gene length to calculate RPKM.
- If the alignment uses alt loci, RPKM would be artificially lowered for alt-loci genes.
- If the alignment does not use alt loci, remove alt-loci annotations from the official set.

Need a more comprehensive approach to genome variation
- The assembly model is neither haploid nor diploid.
- Analysis tools penalize reads mapping to more than one location and do not distinguish allelic duplication from paralogous duplication.
- A graph structure is a natural way to represent a population-based genome assembly.

Conclusions
- RPKM values are highly correlated between hg19 and hg38.
- Analysis set is preferred for…
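The Gene length calculation slide ties RPKM to a per-gene length, and the RNASeq quantification slide assumes non-overlapping gene regions. The sketch below uses one common convention, which is an assumption because the slides do not say which definition was used. A gene's length is the total length of the union of its exons, and RPKM = 10^9 × C / (N × L), where C is the number of reads assigned to the gene, N is the total number of mapped reads, and L is the gene length in bases. It also drops annotations on alt-locus contigs, assuming contig names that end in `_alt` (an assumption about the annotation's naming). That follows the slide's advice for alignments made without alt loci. All gene and contig names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One exon annotation: (contig, gene_id, start, end), 1-based inclusive coordinates.
Exon = Tuple[str, str, int, int]


def drop_alt_contigs(exons: Iterable[Exon]) -> List[Exon]:
    """Remove annotations on alt-locus scaffolds (assumes names ending in '_alt')."""
    return [exon for exon in exons if not exon[0].endswith("_alt")]


def gene_lengths(exons: Iterable[Exon]) -> Dict[str, int]:
    """Gene length = total length of the union of its exons on each contig.

    If one gene_id is annotated on several contigs (e.g. a primary copy and an
    alt-locus copy), the per-contig lengths are added together.
    """
    spans: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for contig, gene, start, end in exons:
        spans[(gene, contig)].append((start, end))
    lengths: Dict[str, int] = defaultdict(int)
    for (gene, _contig), intervals in spans.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent exon: extend block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                lengths[gene] += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        lengths[gene] += cur_end - cur_start + 1
    return dict(lengths)


def rpkm(reads: int, gene_length: int, total_mapped: int) -> float:
    """Reads per kilobase of gene per million mapped reads: 1e9 * C / (N * L)."""
    return reads * 1e9 / (total_mapped * gene_length)


exons = [
    ("chr6", "GENE_A", 100, 199),
    ("chr6", "GENE_A", 150, 299),  # overlaps the exon above
    ("chr6", "GENE_A", 400, 499),
    ("chr6_copy_alt", "GENE_A", 1, 300),  # hypothetical alt-locus copy
]
print(gene_lengths(exons))                    # {'GENE_A': 600}
print(gene_lengths(drop_alt_contigs(exons)))  # {'GENE_A': 300}
print(rpkm(30, 300, 1_000_000))               # 100.0
```

Suppose the alt-locus annotation is left in while the reads were aligned to a reference without alt loci. In this sketch the gene length doubles, so the same 30 reads give an RPKM of 50 instead of 100. This is one concrete way a mismatch between annotation and reference skews RPKM.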

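The full analysis set is described as the no-alt analysis set plus the alt scaffolds from the ALT_REF_LOCI_* units. Per that description, the full set already contains the no-alt content, including its hard-masking and the EBV sink. Dropping the alt records from it should therefore leave the no-alt content. The sketch below does this as a streaming filter. It assumes that alt scaffolds are the records whose names end in `_alt`, which is an assumption; check it against the headers of the file actually used. The same filter would not turn the top-level file into an analysis set. That file uses INSDC accession.versions as identifiers, is not hard-masked, and does not include EBV.

```python
import gzip
import sys
from typing import Iterable, Iterator


def drop_alt_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name ends in '_alt'."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Assumption: alt-scaffold names carry an '_alt' suffix.
            keep = not name.endswith("_alt")
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python drop_alt.py INPUT.fna[.gz] OUTPUT.fna
    src, dst = sys.argv[1], sys.argv[2]
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(drop_alt_records(fin))
```

Downloading the published no_alt file directly is simpler and avoids relying on the naming assumption. The filter mainly makes the relationship between the two analysis sets concrete.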