Library Construction and Sequencing Principles of Next-Generation Sequencing
何有裕
上海生物信息技术研究中心, 上海众信生物技术有限公司, 苏州众信生物技术有限公司

Contents
- Overview of sample processing and sequencing principles
  - Roche 454
  - Illumina Solexa
- Quality control of raw data

TruSeq RNA and DNA Sample Preparation

Cluster Generation Overview
- 1000-6000 molecules per cluster
- Steps: template hybridization, 1st cycle denaturation, bridge PCR
- (Figure labels: OH, diol)

Template preparation: bridge PCR
- Adaptor ligation
- Surface attachment
- Bridge amplification
- Denaturation
- Source: Trends in Genet 24:133 (2008)

Sequencing by Synthesis Overview
- First base incorporated
- Cycle 1: add sequencing reagents, detect signal, cleave terminator and dye
- Cycle 2-n: add sequencing reagents and repeat

Cyclic reversible termination
- All four labeled reversible terminators are added per cycle
- Remove unincorporated bases and detect signal
- Remove the terminating group and the fluorescent dye
- (Figure labels: terminating group, fluorophore cleavage)
- Sources: Trends in Genet 24:133 (2008); Nat Rev Genet 11:31 (2010)

Base calling

Flowcell layout on GAII
- A flow cell contains 8 lanes (Lane 1, Lane 2, ..., Lane 8)
- Each lane contains 2 columns (Column 1, Column 2)
- Each column contains 60 tiles
- Each tile is imaged 4 times per cycle

Primary Data Analysis by Firecrest and Bustard in RTA/OLB
- TIFF image file -> Firecrest -> intensity file -> Bustard -> sequence file

Cluster Generation: Sequencing Primer Hybridization (processing steps for single-read sequencing)
- (Figure labels: OH, diol)

Multiplexing: sequencing multiple samples in the same lane
- Reads: Read 1, Index Read, Read 2
- (Figure labels: DNA insert, Index, Rd1 SP, Index SP, Rd2 SP)

Advantages of paired-end sequencing

Mate-pair library construction and sequencing
- Source: Molecular Ecology Resources (2011)

Template preparation: emulsion PCR
- Source: Trends in Genet 24:133 (2008)

Pyrosequencing
- A single dNTP type flows per cycle
- Inorganic pyrophosphate (PPi) drives visible light through a series of reactions
- Remove unincorporated nucleotides
- Source: Trends in Genet 24:133 (2008)

Base calling: homopolymer error
- (Figure label: GV6330, 20)

Flexible multi-sample tagging technology

Sequencing modes of 454 and Solexa

Semiconductor sequencing
- Detects the H+ released as a voltage change: fast
- Common microchip design standards: low-cost manufacturing
- Sequencing volume is increasing

FASTA sequence format

FASTQ format
A FASTQ file records each sequence in 4 lines:
- Line 1 starts with the @ character, followed by the sequence identifier and description
- Line 2 is the sequence
- Line 3 starts with the + character, followed either by nothing or by the same text as line 1
- Line 4 encodes the quality of the sequence in line 2 and must have the same length as line 2

Example:
    @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
    CGACAATTTTTTTTGATATTAATAAAGATAGAACTTTCTTCCTATGAGTTTTCTCTC
    +
    CCCFFDFFHHHHGJJGHIIJGIIJJJJIIJJHJJJJJIJJIIIGIIIJGGIHJDIJIGAHEHFFGHGHE

Illumina sequence identifiers
- Before Casava 1.8: HWI-EAS364_0004:4:1:995:9044#0/1
- Casava 1.8: HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG

Sequence quality
- Note: before Solexa 1.3, the quality score was calculated as Q_solexa = -10 * log10(p / (1 - p)), where p is the estimated probability that the base call is wrong.

Q values and their ASCII codes
(Diagram: the printable ASCII characters aligned against the quality range of each encoding, with tick marks at ASCII codes 33, 59, 64, 73, and 104.)
- S: Sanger, Phred+33, raw reads typically (0, 40)
- X: Solexa, Solexa+64, raw reads typically (-5, 40)
- I: Illumina 1.3+, Phred+64, raw reads typically (0, 40)
- J: Illumina 1.5+, Phred+64, raw reads typically (3, 40), with 0 = unused, 1 = unused, 2 = Read Segment Quality Control Indicator
- L: Illumina 1.8+, Phred+33, raw reads typically (0, 41)

454 raw data: images, SFF format, FASTA format (with qual)

Example (FASTA record followed by its qual record):
    >HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
    ACGTGTTCTGAGCCATATTGCGGTACTGGAAGGTGCGCCTGCACTGTCTGAGCACTGGTCACTGCTCGATACCAATGAAGCCTTATTTGATGAGGCGCGCACCACGCAGGCGGCGACTATTATCTTCTCGTTTGATCCAGAATAACCAAATCGAAAACGCTGGCAAGGCACACAGGGGATA
    >HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
    40 40 40 40 40 40 40 39 37 38 36 34 24 23 19 19 19 24 20 19 18 18 26 26 18 18 19 18 20 20 20 25 25 26 19 20 20 22 22 22 25 28 26 24 22 22 22 25 24 28 28 28 29 29 28 30 30 30 26 26
    26 27 27 27 31 31 30 28 28 28 30 30 30 30 26 21 21 20 20 26 27 28 24 25 20 20 20 20 19 19 19 27 28 28 30 30 31 30 28 28 30 31 31 32 32 31 31 30 30 30 31 27 24 24 22 20 20 20 22 26
    26 22 22 23 16 16 16 19 22 16 13 13 13 16 22 23 23 23 26 26 24 24 26 13 13 11 11 12 12 19 22 18 18 11 11 13 13 18 24 24 24 24 26 26 26 27 29 29 31 33 32 31 31 27 27 27 29 29 28 26
    22

454 raw read length distribution (unchanged after quality control)

Basic concepts used in sequencing
- Yield: the amount of data produced by the sequencer
- Reads: the sequenced fragments
- Read length and quality
- Coverage fold: the number of times a nucleotide is represented
- Depth: the average coverage fold
- Coverage rate: the ratio of the sequenced region to the whole genome
- Homopolymer: e.g. AAAAA

Key lab of systems biology SIBS, Chinese Academy of Sciences

Typical deep-sequencing data processing workflow

Sequence quality assessment
- FastQC: A quality control tool for high throughput sequence data (Java)
- http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

QC pipeline

Quality-control filtering of raw data
- Sequence level
  - Short sequences
  - Adaptor/primer
  - polyA/T region
  - Overall low-complexity sequence (Dust)
  - Contamination/unwanted sequences
  - Ns (low quality ends)
- Quality level
  - Low quality base or region
- Goal: everything retained is high-quality data that actually enters the bioinformatics analysis.

Clean reads
- Remove reads containing adaptor sequences
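
The 4-line FASTQ layout, the Phred+33 quality encoding (Illumina 1.8+) and the sequence-level and quality-level filters listed above can be tied together in a short Python sketch. This is only an illustration under stated assumptions, not the pipeline used in the slides and not how FastQC works: the function names (`read_fastq`, `decode_quality`, `error_probability`, `passes_qc`), the thresholds (minimum length 30, at most 10% Ns, mean quality of at least 20) and the two demo reads are all hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (this sketch also stops at a blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # Line 1 must start with '@' and line 3 with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        # Line 4 must be exactly as long as line 2.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to scores; offset is 33 or 64."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_qc(seq: str, quals: List[int], min_len: int = 30,
              max_n_frac: float = 0.1, min_mean_q: float = 20) -> bool:
    """Apply three hypothetical filters: length, fraction of Ns, mean quality."""
    if len(seq) < max(min_len, 1):
        return False  # too short (also guards against empty reads)
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # too many uncalled bases
    return sum(quals) / len(quals) >= min_mean_q


# Two made-up reads: one clean, one full of Ns with very low quality.
records = [
    ("read1", "ACGT" * 10, "I" * 40),  # 'I' is Q40 under Phred+33
    ("read2", "ACNN" * 10, "#" * 40),  # '#' is Q2 under Phred+33
]
demo = io.StringIO("".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records))

kept = [name for name, seq, qual in read_fastq(demo)
        if passes_qc(seq, decode_quality(qual))]
print(kept)  # ['read1']
```

For data in the older Phred+64 encodings (Illumina 1.3+ and 1.5+), `decode_quality(qual, offset=64)` would be the corresponding call. Solexa+64 scores use a different formula and would need converting before `error_probability` applies.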