Genome Assembly
2019.10.29

I. Genome survey

K-mer: a contiguous nucleotide sequence of length K bp.

Suppose every K-mer in the genome is unique, so that a genome of size G contains about G different K-mers. When a read of length L is generated, the probability that a given K-mer is sequenced is (L-K+1)/G. Because L/G is very small and the number of reads n_r is very large, the number of times a given K-mer is sequenced follows a Poisson distribution. So the expected K-mer depth d_k and the total number of sequenced K-mers n_k are:

d_k = (L-K+1)/G * n_r
n_k = (L-K+1) * n_r

Then:

G = n_k / d_k

Quality control and filtering

The following reads are filtered out:
- Reads in which N makes up more than 10% of the length.
- Reads from short insert-size libraries with more than 65% of bases at quality ≤ 7, and reads from large insert-size libraries with more than 80% of bases at quality ≤ 7.
- Paired-end reads whose read 1 and read 2 are completely identical to those of another pair (and are thus considered to be products of PCR duplication).

Error correction before assembly

[Figure: Error correction before assembly]

II. SOAPdenovo algorithm

SOAPdenovo was developed to assemble large genomes such as the human genome; it also works well for small genomes such as bacteria. It includes five major steps:
1. De Bruijn graph construction
2. Graph simplification to obtain contigs
3. Pair-end read mapping to contigs
4. Scaffold construction
5. Gap filling with pair-end reads

Sequence assembly refers to aligning and merging fragments into a much longer DNA sequence in order to reconstruct the original sequence.
- Overlap: contig (Ge+en+no+om+mi+ic+cs gives Genomics)
- Pair-end: scaffold (Genome*assembly)

1. De Bruijn graph construction

Reads: AGATCTTGTTATT GTTATTGATCTCC

Slide along each read to take K-mers, storing the links between neighboring K-mers. If a K-mer already exists, merge its links with those of the first occurrence.

[Figure: De Bruijn graph built from the 5-mers of the reads]

2. Graph simplification

Contigs: GATCTTGTTATTGATCT GATCTCCAGATCT

Set -R parameter:
Contigs: AGATCTTGTTATTGATCTCC
Read1: AGATCTTGTTATT
Read2: GTTATTGATCTCCAGATC

[Figure: Graph simplification]

3. Pair-end mapping to contig

[Figure: Pair-end mapping to contig]

4. Construct scaffolds

Note: For mate-pair (=2Kb), the order is just the opposite.

A reliable link is built between two contigs when the number of supporting pair-end/mate-pair reads is larger than the number that was set. The gap size is estimated from the insert size of each read pair.

5. Gap closure

Get the reads located in the gap and then do a local assembly.
(1) Close the gap using pair-end information (one end mapped on the contig, the other end falling in the gap).
(2) Do a local assembly using the reads that fall in the gap to get a sequence that connects with the edges of both contigs.

Note: Gap closure here also means extending contigs.

Schematic overview

[Figure: Schematic overview]

III. Evaluation of assembly result

Length: contig (scaffold) N50 size, N90 size, total length, coverage ratio of the genome.

Accuracy: coverage of gene sequences, compared to EST or transcriptome sequences; comparison with a gold standard (such as BAC/fosmid).

[Figure: Evaluation of gene region coverage]

[Figure: Comparison with gold standard]
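
Example: genome size from K-mers. The genome survey relation G = n_k / d_k can be sketched in a few lines of Python. This is only an illustration under the same idealised assumptions as the derivation (unique K-mers, no sequencing errors). The function names are hypothetical and not part of any tool. In practice d_k is usually read off the main peak of the K-mer depth histogram rather than computed from a known G, which is an assumption beyond what the slides state.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer seen while sliding along each read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def expected_kmer_depth(genome_size, read_length, k, n_reads):
    """d_k = (L - K + 1) / G * n_r."""
    return (read_length - k + 1) / genome_size * n_reads


def total_kmers(read_length, k, n_reads):
    """n_k = (L - K + 1) * n_r."""
    return (read_length - k + 1) * n_reads


def estimate_genome_size(n_k, d_k):
    """G = n_k / d_k."""
    return n_k / d_k


# Illustrative numbers only (not from the slides):
# 500,000 reads of 100 bp, K = 17, observed K-mer depth of 42.
n_k = total_kmers(read_length=100, k=17, n_reads=500_000)
print(estimate_genome_size(n_k, 42.0))
```

With these made-up inputs the estimate is a genome of one million bp, which is the size that would produce a K-mer depth of 42 under the formula for d_k.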
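
Example: De Bruijn graph construction. The construction step (slide along each read, store links between neighboring K-mers, merge links when a K-mer already exists) can be sketched as below, using the two example reads and K = 5, the K-mer length suggested by the node labels of the slide figure. This is a simplified sketch, not SOAPdenovo's implementation: it ignores reverse complements, K-mer frequencies and sequencing errors, and `build_de_bruijn` is a hypothetical name.

```python
def build_de_bruijn(reads, k):
    """Return {kmer: set of K-mers that directly follow it in some read}."""
    graph = {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            # A K-mer that already exists keeps its node; new links are merged in.
            graph.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return graph


reads = ["AGATCTTGTTATT", "GTTATTGATCTCC"]
graph = build_de_bruijn(reads, 5)

for kmer in sorted(graph):
    print(kmer, "->", sorted(graph[kmer]))
```

In this small graph the node GATCT has two outgoing links (to ATCTT and ATCTC), which is the kind of branch that the graph simplification step has to resolve.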
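
Example: scaffold links and gap size. The scaffolding rule (keep a link between two contigs only when enough read pairs support it, and estimate the gap from the insert size) can be illustrated as follows. The geometry is an assumption made for this sketch: read 1 maps forward on the first contig starting at `start_a`, read 2 maps on the second contig with its outer end at `end_b`, and a single library insert size is used for every pair. The function name, the tuple layout and the strict `>` comparison are hypothetical choices, not SOAPdenovo's actual interface.

```python
from collections import defaultdict


def build_links(pair_hits, contig_lengths, insert_size, min_support):
    """Return {(contig_a, contig_b): mean estimated gap} for reliable links.

    pair_hits: iterable of (contig_a, start_a, contig_b, end_b), one per
    read pair whose two ends map to different contigs.
    """
    gap_estimates = defaultdict(list)
    for contig_a, start_a, contig_b, end_b in pair_hits:
        # Part of the fragment lying on contig_a, from read 1 to the contig end.
        on_a = contig_lengths[contig_a] - start_a
        # insert_size = on_a + gap + end_b, so solve for the gap.
        gap_estimates[(contig_a, contig_b)].append(insert_size - on_a - end_b)

    links = {}
    for contigs, estimates in gap_estimates.items():
        # Keep the link only if the support is larger than the set number.
        if len(estimates) > min_support:
            links[contigs] = sum(estimates) / len(estimates)
    return links


contig_lengths = {"c1": 1000, "c2": 800, "c3": 500}
pair_hits = [
    ("c1", 800, "c2", 150),
    ("c1", 850, "c2", 180),
    ("c1", 700, "c2", 70),
    ("c1", 900, "c3", 100),  # a single pair: not enough support
]
print(build_links(pair_hits, contig_lengths, insert_size=500, min_support=2))
```

Here only the c1 to c2 link survives, because it is supported by three pairs while the c1 to c3 link has one.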
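
Example: N50 and N90. The length metrics used in the evaluation can be computed with the usual definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the total assembly length. The sketch below assumes that definition; `nxx` is a hypothetical helper name and the lengths are made up.

```python
def nxx(lengths, percent):
    """Length at which the cumulative sum of sorted lengths reaches percent% of the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return min(lengths)


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # illustrative values, total 300
print("N50:", nxx(contig_lengths, 50))
print("N90:", nxx(contig_lengths, 90))
print("Total length:", sum(contig_lengths))
```

For these values the running totals are 80, 150, 200, 240, 270, so N50 is 70 and N90 is 30.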