版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、动态规划法双序列比对第1页,共55页,2022年,5月20日,15点18分,星期一2 /55习题4,求两条序列的最长共同子序列。【作业】v = TACGGGTATw = GGACGTACG第2页,共55页,2022年,5月20日,15点18分,星期一3 /550123456789000000000001020304050607080905GGACGTACGTACGGGTAT第3页,共55页,2022年,5月20日,15点18分,星期一4Sequence Alignment第4页,共55页,2022年,5月20日,15点18分,星期一5 /55OutlineGlobal Alignment Sc
2、oring MatricesLocal AlignmentAlignment with Affine Gap Penalties第5页,共55页,2022年,5月20日,15点18分,星期一6 /55From LCS to Alignment: Change up the ScoringThe Longest Common Subsequence (LCS) problemthe simplest form of sequence alignment allows only insertions and deletions (no mismatches). In the LCS Problem
3、, we scored 1 for matches and 0 for indelsConsider penalizing indels and mismatches with negative scoresSimplest scoring schema: +1 : match premium - : mismatch penalty - : indel penalty- T G C A T - A - CA T - C - T G A T CAKRANRKAAANK-1 + (-1) + (-2) + 5 + 7 + 3 = 11第6页,共55页,2022年,5月20日,15点18分,星期一
4、7 /55Simple ScoringWhen mismatches are penalized by , indels are penalized by , and matches are rewarded with +1, the resulting score is: #matches (#mismatches) (#indels)第7页,共55页,2022年,5月20日,15点18分,星期一8 /55The Global Alignment ProblemFind the best alignment between two strings under a given scoring
5、schemaInput : Strings v and w and a scoring schemaOutput : Alignment of maximum scorem : mismatch penalty : indel penalty第8页,共55页,2022年,5月20日,15点18分,星期一9 /55Scoring MatricesTo generalize scoring, consider a (4+1) x(4+1) scoring matrix . In the case of an amino acid sequence alignment, the scoring ma
6、trix would be a (20+1)x(20+1) size. The addition of 1 is to include the score for comparison of a gap character “-”.This will simplify the algorithm as follows:第9页,共55页,2022年,5月20日,15点18分,星期一10 /55The Blosum62 Scoring Matrix第10页,共55页,2022年,5月20日,15点18分,星期一11 /55Measuring SimilarityMeasuring the exte
7、nt of similarity between two sequencesBased on percent sequence identityBased on conservation第11页,共55页,2022年,5月20日,15点18分,星期一12 /55Percent Sequence IdentityThe extent to which two nucleotide or amino acid sequences are invariantA C C T G A G A G A C G T G G C A G70% identicalmismatchindel第12页,共55页,2
8、022年,5月20日,15点18分,星期一13 /55Making a Scoring MatrixScoring matrices are created based on biological evidence. Alignments can be thought of as two sequences that differ due to mutations. Some of these mutations have little effect on the proteins function, therefore some penalties, (vi , wj), will be l
9、ess harsh than others.第13页,共55页,2022年,5月20日,15点18分,星期一14 /55Scoring Matrix: ExampleAKRANRKAAANK-1 + (-1) + (-2) + 5 + 7 + 3 = 11ARNKA5-2-1-1R-7-13N-70K-6 Notice that although R and K are different amino acids, they have a positive score. Why? They are both positively charged amino acids will not gre
10、atly change function of protein.第14页,共55页,2022年,5月20日,15点18分,星期一15 /55ConservationAmino acid changes that tend to preserve the physico-chemical properties of the original residuePolar to polaraspartate glutamateNonpolar to nonpolaralanine valineSimilarly behaving residuesleucine to isoleucine第15页,共5
11、5页,2022年,5月20日,15点18分,星期一16 /55Scoring matricesAmino acid substitution matricesPAMBLOSUMDNA substitution matricesDNA is less conserved than protein sequencesLess effective to compare coding regions at nucleotide level第16页,共55页,2022年,5月20日,15点18分,星期一17 /55PAMPoint Accepted Mutation (Dayhoff et al.)1
12、PAM = PAM1 = 1% average change of all amino acid positionsAfter 100 PAMs of evolution, not every residue will have changedsome residues may have mutated several timessome residues may have returned to their original statesome residues may not changed at all第17页,共55页,2022年,5月20日,15点18分,星期一18 /55PAMXP
13、AMx = PAM1xPAM250 = PAM1250PAM250 is a widely used scoring matrix: Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys . A R N D C Q E G H I L K .Ala A 13 6 9 9 5 8 9 12 6 8 6 7 .Arg R 3 17 4 3 2 5 3 2 6 3 2 9Asn N 4 4 6 7 2 5 6 4 6 3 2 5Asp D 5 4 8 11 1 7 10 5 6 3 2 5Cys C 2 1 1 1 52 1 1 2 2 2 1 1Gln Q
14、 3 5 5 6 1 10 7 3 7 2 3 5.Trp W 0 2 0 0 0 0 0 0 1 0 1 0Tyr Y 1 1 2 1 3 1 1 1 3 2 2 1Val V 7 4 4 4 4 4 4 4 5 4 15 10第18页,共55页,2022年,5月20日,15点18分,星期一19 /55BLOSUMBlocks Substitution Matrix Scores derived from observations of the frequencies of substitutions in blocks of local alignments in related prot
15、einsMatrix name indicates evolutionary distanceBLOSUM62 was created using sequences sharing no more than 62% identity第19页,共55页,2022年,5月20日,15点18分,星期一20 /55The Blosum62 Scoring MatrixBLOSUM 62第20页,共55页,2022年,5月20日,15点18分,星期一21 /55BLOSUM90PAM30低趋异度小鼠和大鼠RBPBLOSUM45PAM240高趋异度小鼠和细菌的lipocalinBLOSUM80PAM12
16、0BLOSUM62PAM180相似度越低的序列,在比对的时候,采用PAM矩阵时,后面的数字越大,采用BLOSUM矩阵时,后面的数字越小。第21页,共55页,2022年,5月20日,15点18分,星期一22 /55Local vs. Global AlignmentThe Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph.The Local Alignment Problem tries to find the longest pat
17、h among paths between arbitrary vertices (i,j) and (i, j) in the edit graph.第22页,共55页,2022年,5月20日,15点18分,星期一23 /55Local vs. Global AlignmentThe Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph.The Local Alignment Problem tries to find the lon
18、gest path among paths between arbitrary vertices (i,j) and (i, j) in the edit graph.In the edit graph with negatively-scored edges, Local Alignmet may score higher than Global Alignment第23页,共55页,2022年,5月20日,15点18分,星期一24 /55Local vs. Global Alignment (contd)Global AlignmentLocal Alignmentbetter align
19、ment to find conserved segment -T-CC-C-AGT-TATGT-CAGGGGACACGA-GCATGCAGA-GAC | | | | | | | | | | | | | | | AATTGCCGCC-GTCGT-T-TTCAG-CA-GTTATGT-CAGAT-C tccCAGTTATGTCAGgggacacgagcatgcagagac |aattgccgccgtcgttttcagCAGTTATGTCAGatc第24页,共55页,2022年,5月20日,15点18分,星期一25 /55Local Alignment: ExampleGlobal alignme
20、ntLocal alignmentCompute a “mini” Global Alignment to get Local第25页,共55页,2022年,5月20日,15点18分,星期一26 /55Local Alignments: Why?Two genes in different species may be similar over short conserved regions and dissimilar over remaining regions.Example:Homeobox genes have a short region called the homeodomai
21、n that is highly conserved between species. A global alignment would not find the homeodomain because it would try to align the ENTIRE sequence第26页,共55页,2022年,5月20日,15点18分,星期一27 /55The Local Alignment ProblemGoal: Find the best local alignment between two stringsInput : Strings v, w and scoring matr
22、ix Output : Alignment of substrings of v and w whose alignment score is maximum among all possible alignment of all possible substrings第27页,共55页,2022年,5月20日,15点18分,星期一28 /55The Problem with this ProblemLong run time O(n4): - In the grid of size n x n there are n2 vertices (i,j) that may serve as a s
23、ource. - For each such vertex computing alignments from (i,j) to (i,j) takes O(n2) time.This can be remedied by giving free rides第28页,共55页,2022年,5月20日,15点18分,星期一29 /55Local Alignment: ExampleGlobal alignmentLocal alignmentCompute a “mini” Global Alignment to get Local第29页,共55页,2022年,5月20日,15点18分,星期一
24、30 /55Local Alignment: Example第30页,共55页,2022年,5月20日,15点18分,星期一31 /55Local Alignment: Example第31页,共55页,2022年,5月20日,15点18分,星期一32 /55Local Alignment: Example第32页,共55页,2022年,5月20日,15点18分,星期一33 /55Local Alignment: Example第33页,共55页,2022年,5月20日,15点18分,星期一34 /55Local Alignment: Example第34页,共55页,2022年,5月20日,
25、15点18分,星期一35 /55Local Alignment: Running TimeLong run time O(n4): - In the grid of size n x n there are n2 vertices (i,j) that may serve as a source. - For each such vertex computing alignments from (i,j) to (i,j) takes O(n2) time.This can be remedied by giving free rides第35页,共55页,2022年,5月20日,15点18分
26、,星期一36 /55Local Alignment: Free RidesVertex (0,0)The dashed edges represent the free rides from (0,0) to every other node.Yeah, a free ride!第36页,共55页,2022年,5月20日,15点18分,星期一37 /55The Local Alignment RecurrenceThe largest value of si,j over the whole edit graph is the score of the best local alignment
27、.The recurrence: 0 si,j = max si-1,j-1 + (vi, wj) s i-1,j + (vi, -) s i,j-1 + (-, wj)Notice there is only this change from the original recurrence of a Global Alignment第37页,共55页,2022年,5月20日,15点18分,星期一38 /55The Local Alignment RecurrenceThe largest value of si,j over the whole edit graph is the score
28、 of the best local alignment.The recurrence: 0 si,j = max si-1,j-1 + (vi, wj) s i-1,j + (vi, -) s i,j-1 + (-, wj)Power of ZERO: there is only this change from the original recurrence of a Global Alignment - since there is only one “free ride” edge entering into every vertex第38页,共55页,2022年,5月20日,15点1
29、8分,星期一39 /55 NP_006735NP_000945第39页,共55页,2022年,5月20日,15点18分,星期一40 /55第40页,共55页,2022年,5月20日,15点18分,星期一41 /55习题考虑序列v=TACGGGTAT和w= GGACGTACG。假设匹配奖励+1,错配和插缺罚分均为-1. 【作业】填写序列v和w之间的全局联配的动态规划表(编辑图或相似度矩阵)。在各单元画出箭头以存储返回信息。全局最优联配的得分是多少?这个得分对应的联配又是什么?填写序列v和w之间的局部联配的动态规划表。在各单元画出箭头以存储返回信息。在这种情形下,局部最优联配的得分是多少?这个得分
30、对应的联配又是什么?第41页,共55页,2022年,5月20日,15点18分,星期一42 /55-012345678900-1-2-3-4-5-6-7-8-91-12-23-34-45-56-67-78-899GGACGTACGTACGGGTAT全局比对第42页,共55页,2022年,5月20日,15点18分,星期一43 /55局部比对-012345678900000000000102030405060708090GGACGTACGTACGGGTAT第43页,共55页,2022年,5月20日,15点18分,星期一44 /55Scoring Indels: Naive ApproachA fix
31、ed penalty is given to every indel: - for 1 indel, -2 for 2 consecutive indels-3 for 3 consecutive indels, etc.Can be too severe penalty for a series of 100 consecutive indels第44页,共55页,2022年,5月20日,15点18分,星期一45 /55Affine Gap PenaltiesIn nature, a series of k indels often come as a single event rather
32、 than a series of k single nucleotide events:ATA_GCATATTGCATAG_GCAT_GTGCNormal scoring would give the same score for both alignmentsThis is more likely.This is less likely.第45页,共55页,2022年,5月20日,15点18分,星期一46 /55Accounting for GapsGaps- contiguous sequence of spaces in one of the rowsScore for a gap o
33、f length x is: -( + x) where 0 is the penalty for introducing a gap: gap opening penalty will be large relative to : gap extension penalty because you do not want to add too much of a penalty for extending the gap.第46页,共55页,2022年,5月20日,15点18分,星期一47 /55Affine Gap PenaltiesGap penalties: - when there
34、is 1 indel -2 when there are 2 indels -3 when there are 3 indels, etc. - x (-gap opening - x gap extensions)Somehow reduced penalties (as compared to nave scoring) are given to runs of horizontal and vertical edges第47页,共55页,2022年,5月20日,15点18分,星期一48 /55Affine Gap Penalties and Edit GraphTo reflect af
35、fine gap penalties we have to add “long” horizontal and vertical edges to the edit graph. Each such edge of length x should have weight - - x *第48页,共55页,2022年,5月20日,15点18分,星期一49 /55Adding “Affine Penalty” Edges to the Edit GraphThere are many such edges!Adding them to the graph increases the running
36、 time of the alignment algorithm by a factor of n (where n is the number of vertices)So the complexity increases from O(n2) to O(n3)第49页,共55页,2022年,5月20日,15点18分,星期一50 /55Manhattan in 3 Layers第50页,共55页,2022年,5月20日,15点18分,星期一51 /55Affine Gap Penalties and 3 Layer Manhattan GridThe three recurrences for the scoring algorithm creates a 3-layered graph. The top level creates/extends gaps in the sequen
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 二零二四版版权许可合同:就某游戏作品的网络游戏改编权的授予3篇
- 2024年技术代理合同3篇
- 陆军基地建设工程招标合同三篇
- 2024年建筑安装项目承包协议3篇
- 煤炭运输协议三篇
- 2024年建筑工程施工合同履行保障协议
- 2024年度桶装水销售分成协议
- 2024年度船只需要求合同
- 2024年二手房买卖资金托管协议书3篇
- 2024年国际货运代理协议汇编大全版B版
- 老年病理生理课件
- DB32-T 4353-2022 房屋建筑和市政基础设施工程档案资料管理规程
- 美国动画发展史课件
- 各省市地图形状课件
- 医院放射科质控小组关于DR图像体外伪影专项质控活动
- 苏教版四年级科学上册全册课件
- 安全生产举报和投诉调查处理台账参考模板范本
- 儿童话剧剧本范文(精选18篇)
- Q∕SY 1128-2007 输油管道集肤效应电伴热技术规范
- 2022年政府采购评审专家考试题库含答案
- DBJ50∕T-304-2018 桥梁结构健康监测系统实施和验收标准
评论
0/150
提交评论