Genome Assembly Technology (Lecture Slides)

Genome Assembly, 2019.10.29

I. Genome survey

K-mer: a contiguous nucleic-acid sequence of length K bp. Suppose every K-mer in the genome is unique. Then a genome of size G yields (approximately) G distinct K-mers. A read of length L contains L - K + 1 K-mers, so the probability that a particular K-mer is covered by one read is (L - K + 1)/G. This probability is very small (about L/G), while the number of reads n_r is very large, so the depth of a K-mer follows a Poisson distribution. The expected K-mer depth d_k and the total number of K-mers n_k are therefore

$d_k = \dfrac{(L-K+1)\,n_r}{G}, \qquad n_k = (L-K+1)\,n_r,$

so the genome size is

$G = \dfrac{n_k}{d_k}.$

Quality control and filtering

The following reads are filtered out:
(1) Reads in which N bases make up more than 10% of the read length.
(2) Reads from short-insert libraries with more than 65% of bases at quality ≤ 7, and reads from large-insert libraries with more than 80% of bases at quality ≤ 7.
(3) Paired-end reads whose read 1 and read 2 are completely identical to those of another pair (and are therefore considered products of PCR duplication).

Error correction before assembly

II. SOAPdenovo algorithm

SOAPdenovo was developed to assemble large genomes such as the human genome, and it also works well for small genomes such as bacterial ones. It consists of five major steps:
(1) De Bruijn graph construction
(2) Graph simplification to obtain contigs
(3) Mapping of paired-end reads to contigs
(4) Scaffold construction
(5) Gap filling with paired-end reads

Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.

[Figure: Overlap → contig (Ge + en + no + om + mi + ic + cs → Genomics); Pair-end → scaffold (Genome*assembly)]

1. De Bruijn graph construction

Example read: AGATCTTGTTATTGTTATTGATCTCC

Slide along each read to take its K-mers, and store the links between neighboring K-mers. If a K-mer already exists, merge its links into those of the existing K-mer.

[Figure: the 5-mers of the example read (AGATC, GATCT, ATCTT, ...) and the links between them]
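The sliding-window construction described in step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SOAPdenovo's implementation. Reads are treated as single-stranded (real assemblers typically also merge each K-mer with its reverse complement), and the helper name `build_de_bruijn` is hypothetical.

```python
def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a simple de Bruijn graph: nodes are K-mers, edges join neighboring K-mers.

    Hypothetical helper; reads are treated as single-stranded (no reverse complements).
    """
    graph: dict[str, set[str]] = {}
    for read in reads:
        # Slide a window of width k along the read to take its K-mers.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            # A K-mer that already exists keeps its node, so new links merge into it.
            graph.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)  # link each K-mer to the one that follows it
    return graph


if __name__ == "__main__":
    g = build_de_bruijn(["AGATCTTGTTATTGTTATTGATCTCC"], k=5)
    print(len(g), "distinct 5-mers")
    # Nodes with more than one successor are branch points caused by repeats.
    for kmer, successors in g.items():
        if len(successors) > 1:
            print(kmer, "->", sorted(successors))
```

For the example read this gives 16 distinct 5-mers. GATCT and TATTG each occur twice in the read, and each becomes a branch point with two successors.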

2. Graph simplification

Example reads: Read1: AGATCTTGTTATT; Read2: GTTATTGATCTCCAGATC
Contigs: GATCTTGTTATTGATCT, GATCTCCAGATCT
Contigs with the -R parameter set: AGATCTTGTTATTGATCTCC

[Figure: simplified graph (nodes numbered 1 to 4) built from the 5-mers AGATC, GATCT, ..., TGATC]

3. Mapping paired-end reads to contigs

4. Scaffold construction

A reliable link is built between two contigs when the number of supporting paired-end/mate-pair reads exceeds a set threshold. The gap size is estimated from the insert size of each read pair.

Note: for mate-pair libraries (≥ 2 kb), the order of the two reads is reversed.

5. Gap closure

Reads located in the gap are collected and assembled locally:
(1) Close the gap using paired-end information (one end is mapped on a contig, and the other end falls in the gap).
(2) Locally assemble the reads that fall in the gap to obtain a sequence that connects the edges of the two contigs.

Note: gap closure here also includes extending contigs.

[Figure: Schematic overview]

III. Evaluation of assembly results

Length: contig (scaffold) N50 size, N90 size, total length, and the coverage ratio of the genome.

Accuracy: coverage of gene sequences, assessed by comparison with EST or transcriptome sequences; and comparison with a gold standard (such as BAC/fosmid sequences).

[Figure: Evaluation of gene region coverage]
[Figure: Comparison with gold standard]
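The length statistics listed under "Length" can be computed directly from the contig or scaffold lengths. The sketch below uses the common definition of Nx: sort the lengths from longest to shortest and report the length at which the running total first reaches x% of the total assembly length. The function name `nx` and the toy lengths are illustrative only.

```python
def nx(lengths: list[int], x: float = 50) -> int:
    """Return the Nx size of an assembly (x=50 gives N50, x=90 gives N90)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig (or scaffold) to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:  # running total has reached x% of the total
            return length
    return 0  # empty assembly


contigs = [100, 200, 300, 400, 500]  # toy contig lengths in bp
print("total length:", sum(contigs))
print("N50:", nx(contigs, 50))
print("N90:", nx(contigs, 90))
```

For these toy lengths (total 1,500 bp), N50 is 400 and N90 is 200. The two longest contigs already reach 900 bp, which is at least 50% of the total. The 90% mark (1,350 bp) is first reached once the 200 bp contig is added.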

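Step 4 of the SOAPdenovo section (scaffold construction) says that a link needs enough supporting pairs and that the gap size comes from the insert size. Below is a minimal sketch of that arithmetic. It assumes each supporting pair has already been mapped and oriented so that one read lies on contig A and its mate on the next contig B. The part of the fragment lying on the two contigs is subtracted from the insert size, and the per-pair estimates are averaged. The function `estimate_gap`, its (a_start, b_end) coordinate convention, and the `support_cutoff` parameter are hypothetical and do not describe SOAPdenovo's actual procedure.

```python
from statistics import mean
from typing import Optional


def estimate_gap(
    links: list[tuple[int, int]],
    contig_a_len: int,
    insert_size: int,
    support_cutoff: int,
) -> Optional[int]:
    """Estimate the gap between contig A and the following contig B (hypothetical helper).

    Each link is (a_start, b_end): a_start is the 0-based start of one read on
    contig A, b_end is the end offset (exclusive) of its mate on contig B.
    The fragment covers (contig_a_len - a_start) bases of A, then the gap,
    then b_end bases of B, so gap = insert_size - (contig_a_len - a_start) - b_end.
    """
    if len(links) <= support_cutoff:
        return None  # support does not exceed the cutoff: no reliable link
    gaps = [insert_size - (contig_a_len - a_start) - b_end for a_start, b_end in links]
    return round(mean(gaps))  # a negative estimate would suggest the contigs overlap


links = [(800, 200), (850, 240), (900, 310)]  # toy mapping positions
print(estimate_gap(links, contig_a_len=1000, insert_size=500, support_cutoff=2))
```

The three toy pairs give per-pair gaps of 100, 110 and 90 bp, so the sketch prints 100. Averaging is the simplest possible choice, and it ignores the spread of insert sizes within a library.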

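The genome-survey relation G = n_k / d_k from section I can be applied to sequencing reads by counting K-mers. This is a minimal sketch built on two assumptions that the slides do not state. First, d_k is read off as the peak of the K-mer depth histogram. Second, depths below a hypothetical `min_depth` cutoff mostly come from sequencing errors and are skipped when locating that peak. Like the formula itself, the sketch assumes unique K-mers and ignores reverse complements. The function names are hypothetical.

```python
from collections import Counter


def kmer_depth_histogram(reads: list[str], k: int) -> Counter:
    """Map each depth to the number of distinct K-mers observed at that depth."""
    depth = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return Counter(depth.values())


def estimate_genome_size(histogram: Counter, min_depth: int = 2) -> float:
    """Estimate G = n_k / d_k from a K-mer depth histogram.

    n_k: total number of K-mers in all reads (sum of depth * count), which
         equals (L - K + 1) * n_r for n_r reads of length L.
    d_k: expected K-mer depth, taken here as the histogram peak at or above
         min_depth (a hypothetical cutoff for error K-mers).
    """
    n_k = sum(d * count for d, count in histogram.items())
    candidates = [d for d in histogram if d >= min_depth]
    if not candidates:
        raise ValueError("no K-mer depths at or above min_depth")
    d_k = max(candidates, key=lambda d: histogram[d])
    return n_k / d_k


hist = Counter({1: 500, 9: 100, 10: 1000, 11: 100})  # toy histogram: depth -> distinct K-mers
print(estimate_genome_size(hist))
```

In the toy histogram, n_k = 500 + 900 + 10,000 + 1,100 = 12,500 and the peak depth is d_k = 10, so the sketch prints 1250.0.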

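The three read filters under "Quality control and filtering" can be combined into one pass over the read pairs. This sketch makes three assumptions beyond what the slides state. Base qualities are already integer Phred scores. A pair is discarded when either of its reads fails a filter. The first copy of a duplicated pair is kept. `filter_pairs` and the helper names are hypothetical.

```python
ReadPair = tuple[str, str, list[int], list[int]]  # (read1, read2, qual1, qual2)


def too_many_n(seq: str, max_fraction: float = 0.10) -> bool:
    """True if undetermined bases (N) exceed max_fraction of the read length."""
    return seq.upper().count("N") > max_fraction * len(seq)


def too_low_quality(quals: list[int], max_fraction: float, cutoff: int = 7) -> bool:
    """True if more than max_fraction of bases have quality <= cutoff."""
    low = sum(1 for q in quals if q <= cutoff)
    return low > max_fraction * len(quals)


def filter_pairs(pairs: list[ReadPair], large_insert: bool = False) -> list[ReadPair]:
    """Apply the N, low-quality and PCR-duplicate filters to paired-end reads."""
    max_low = 0.80 if large_insert else 0.65  # large- vs short-insert libraries
    seen: set[tuple[str, str]] = set()
    kept: list[ReadPair] = []
    for read1, read2, qual1, qual2 in pairs:
        if too_many_n(read1) or too_many_n(read2):
            continue
        if too_low_quality(qual1, max_low) or too_low_quality(qual2, max_low):
            continue
        if (read1, read2) in seen:  # identical read 1 and read 2: PCR duplicate
            continue
        seen.add((read1, read2))
        kept.append((read1, read2, qual1, qual2))
    return kept


pairs = [
    ("ACGTACGTAC", "TTGGCCAATT", [30] * 10, [30] * 10),           # passes
    ("ACGTNNACGT", "TTGGCCAATT", [30] * 10, [30] * 10),           # 20% N: removed
    ("ACGTACGTAC", "TTGGCCAATT", [35] * 10, [35] * 10),           # duplicate of the first pair
    ("CCCCGGGGTT", "AAAATTTTCC", [5] * 7 + [30] * 3, [30] * 10),  # 70% low-quality bases in read 1
]
print(len(filter_pairs(pairs)), len(filter_pairs(pairs, large_insert=True)))
```

With the short-insert threshold (65%), only the first pair survives. With the large-insert threshold (80%), the fourth pair is kept as well.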