




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、An Algorithmic Approach to Peptide Sequencing via Tandem Mass SpectrometryMing-Yang KaoDepartment of Computer ScienceNorthwestern UniversityEvanston, IllinoisU. S. A.1Collaborators of This ProjectUniversity of Southern CaliforniaTing Chen Harvard Medical SchoolGeorge M. ChurchJohn RushMatthew Tepel
2、2PerspectivesA key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.Genome (DNA) Transcriptome (
3、RNA) Proteome (Protein)3PerspectivesA key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.Genom
4、e (DNA) Transcriptome (RNA) Proteome (Protein)this talks focus4ProteomicsProteome: all proteins encoded within a genomehalf millions distinct proteins (temporal, spatial, modifications)30,000 human genesmRNA and protein expressions may not correlateProteomics: study of protein expression by biologic
5、al systemsrelative abundance and stability; post-translational modificationsfluctuations as a response to environment and altered cellular needscorrelations between protein expression and disease stateprotein-protein interactions, protein complexesTechnologies:2D gel electrophoresis mass spectrometr
6、yyeast two-hybrid systemprotein chipsthis talks focus5A Key Step of ProteomicsHow to sequence proteins?How to sequence protein peptides? (this talks focus)6Outline of This TalkProblem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Comp
7、lexity and More Robust AlgorithmsConclusions7Outline of This Talk (1)Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions8Protein Identification: HPLC-MS-MSMass/ChargeTandem Mass Spect
8、rumMass/ChargeProteinsPeptidesOne PeptideB-ions / Y-ions9Protein Identification: HPLC-MS-MSMass/ChargeTandem Mass SpectrumMass/ChargeProteinsPeptidesOne PeptideB-ions / Y-ions10Peptide Fragmentation and IonizationB-ionY-ionComplementary: Mass(B-ion)+Mass(Y-ion) = Mass(peptide)+4H+O11B-ions and Y-ion
9、sFragmentation12Tandem Mass SpectrumMass / ChargeAbundance (100%)2005088.033100400175.113274.112361.121430.213448.22513Raw Tandem Mass Spectrum14Prediction from Raw Tandem Mass Spectrum15Protein Database SearchFind the peptide sequences in a protein database that optimally fit the spectrum.It does n
10、ot work if the target peptide sequence is not in the database.It does not work if there is an unknown modification at some amino acid.It is very slow because it must search the entire database.E.g., SEQUEST, Yates, Univ. of Washington.16De Novo Peptide Sequencing ProblemInput: (1) the mass W of an u
11、nknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P. Mass / ChargeAbundance (100%)50100274.112361.121Peptide Mass 429.212 DaltonsP = SWR,Mass(P) = 429.212,Ion
12、s(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.22517Tandem Mass SpectrumMass / ChargeAbundance (100%)2005088.033100400175.113274.112361.121430.213448.225Peptide Mass 429.212 Daltons18Amino Acid Mass Table19Feature 1All B-ions form a forward mass ladder.Mass / ChargeAbundance (100%)2005088.03
13、3100400175.113274.112361.121430.213448.225SWRPeptide Mass 429.212 Daltonsb1b2b3120Feature 2All Y-ions form a reverse mass ladder.Mass / ChargeAbundance (100%)2005088.033100400175.113274.112361.121430.213448.225SWRRWSPeptide Mass 429.212 Daltonsy1y2y31921Basic Difficulty #1It is unknown whether an io
14、n is a B-ion or an Y-ion.Mass / ChargeAbundance (100%)2005088.033100400175.113274.112361.121430.213448.225Peptide Mass 429.212 Daltons22Basic Difficulty #2There are missing ions.Mass / ChargeAbundance (100%)20050100400274.112361.121Ion 1Ion 2Peptide Mass 429.212 Daltons23Feature 3 (to our Rescue)Com
15、plementary Ion Pairs: b1/y2 and b2/y1Mass / ChargeAbundance (100%)2005088.033100400175.113274.112361.121430.213448.225SWRRWSPeptide Mass 429.212 Daltonsy1y2y3b1b2b324Outline of This Talk (2)Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Comp
16、utational Complexity and More Robust AlgorithmsConclusions25Formulating the Computational ProblemT = an alphabet of 20 characters a1,a2,a20.two special characters: alpha and beta.the mass of alpha = 1, the mass of beta = 19, the mass of ai is mi.A peptide sequence is x1,x2,x3,xn-1,xn, where each xi
17、is from T.A b-ion is x0,x1,x2,xi for some 1 = i = n, where x0 = alpha.A y-ion is xi,xn-2,xn-1,xn,xn+1 for some 1 = i = n, where xn+1 = beta.26De Novo Peptide Sequencing ProblemInput: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the pe
18、ptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P. Mass / ChargeAbundance (100%)50100274.112361.121Peptide Mass 429.212 DaltonsP = SWR,Mass(P) = 429.212,Ions(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.22527Amino Acid Mass Table28Outline of T
19、his Talk (3)Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions29peptide mass Wtandem mass spectrum SNC-spectrum graphFind feasible paths to order the masses in S to identify all the
20、b-ions and y-ions consistent with S.Basic Computing SchemeConvert feasible paths into legal peptide sequences30NC-Spectrum Graph: Nodes (1)0429.22N0C0mass of this peptide31NC-Spectrum Graph: Nodes (2)mass of this peptide0429.22N0C0174.11273.11 mass( ) + mass( ) = mass(P) + 18 Ion # 1 (274.11) Assump
21、tion 1:If Ion 1 is an y-ionC1: a b-ion nodeAssumption 2:If Ion 1 is a b-ionN1: a b-ion nodeC1N132NC-Spectrum Graph: Nodes (3)0429.22N0C0174.11273.11 mass( ) + mass( ) = mass(P) + 18 Ion # 2 (88.10) 87.10360.12C1N1C2N233NC-Spectrum Graph: Edges (1)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Mass(S) =
22、87.08.S34NC-Spectrum Graph: Edges (2)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Mass(S) = 87.08.SMass(W) = 186.21W35NC-Spectrum Graph: Edges (3)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Mass(S) = 87.08.SMass(W) = 186.21WS+WMass(S+W) = 273.2936NC-Spectrum Graph: Edges (4)0429.22N0C0174.11273.11 87.1
23、0360.12C1N1C2N2Mass(S) = 87.08.SMass(W) = 186.21WS+WMass(S+W) = 273.29RMass(R) = 156.1937NC-Spectrum Graph0429.22N0C0174.11273.11 87.10360.12C1N1C2N238NC-Spectrum Graph: Paths = Sequences0429.22N0C0174.11273.11 87.10360.12C1N1C2N2SWRb-ions39NC-Spectrum Graph: A Feasible Path (1)0429.22N0C0174.11273.
24、11 87.10360.12C1N1C2N2Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).a feasible pathSWRb-ions40NC-Spectrum Graph: A Feasible Path (2)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Definition: A feasible path is a path from N0 to C0
25、that goes through exactly one node for each pair (either Nj or Cj).a feasible pathSSGVVb-ionsy-ions41NC-Spectrum Graph: Not A Feasible Path (1)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or
26、 Cj).not a feasible path:miss ion #242NC-Spectrum Graph: Not A Feasible Path (2)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).not a feasible path:(2) repeat ion #143NC-Spectrum Graph:
27、Not A Feasible Path (3)0429.22N0C0174.11273.11 87.10360.12C1N1C2N2Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).not a feasible path:miss ion #2repeat ion #144Reformulating the De Novo Peptide Sequencing ProblemInput: an NC-spec
28、trum graph G.Output: a feasible path from N0 to C0. 45ObservationsA longest path does not always go through exactly one of each pair of nodes.It is an NP-hard problem if the spectrum graph is a general directed graph.46Basic AlgorithmInput: a peptide mass W and a tandem mass spectrum S.Output: a fea
29、sible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.47Basic Algorithm (1)Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of
30、the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.48Compute the Nodes of the NC-Spectrum GraphStep 2. Rename the nodes from left to right as X0, Xk,Yk,Y00429.22X0Y0174.11273.11 87.10360.12X2Y2Y1X10429.22N0C0174.11273.11 87.10360.12C1N1C2
31、N2Step 1. Compute the nodes and place them in the increasing order of masses.Observation: Xi and Yi form a complementary pair of nodes Ni and Ci for ion i.Running Time: O(k), where k = # of masses in the spectrum.49Basic Algorithm (2)Input: a peptide mass W and a tandem mass spectrum S.Output: a fea
32、sible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G. inverse of each otherCompute a feasible path P in G.Convert P into a feasible sequence.50Compute the Edges of the NC-Spectrum Graph0429.22X0Y0174.11273.11 87.10360.12X2Y2Y1X1Basic Question: Given a mass
33、 u, is there a protein sequence with that mass?Solution: dynamic programming via a Boolean array E( ).precision = 0.01.Boolean array length L = peptide mass W / precision.Boolean array E(u/0.01) = 1 if u is the mass of a peptide; otherwise 0. dynamic programming E(j) = 1 if only E(j mi) =1 for some
34、amino acid mass mi.Running Time: (1) Computing E() takes O(L) time; or O(L/log L) via 4-Russian preprocessing. (2) Computing the edges takes O(k2) time.51Basic Algorithm (3)Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spec
35、trum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.52Compute a Feasible Path (1)0429.22X0Y087.10360.12Y1X10429.22X0Y0174.11273.11 87.10360.12X2Y2Y1X1Recursion: Use the feasible paths of X0, Xi,Yj,Y0 to computethe feasible paths of X0, Xi, Xi+1,Yj+1,
36、Yj,Y0.Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk
37、and M(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and M(k,j) = 153Compute a Feasible Path (2)Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,
38、j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk and E(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and E(k,j) = 1X0Y0YkXiPLPReX0Y0XkPLPReYj54Compute a Feasible Path (3)Dynamic Programming: M(i,j) = 1 if there exist a
39、 path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Base Case: M(0,0), M(0,1), M(1,0).Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M
40、(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Idea: Extend PL and PR by one edge at a time.55Compute a Feasible Path (4)Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR
41、 together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j)
42、= 1.X0Y0Yj-1XiPLPReXjYjX0Y0Yj-1XiPLPReXjYj56Compute a Feasible Path (5)Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj
43、) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Computational Complexity: O(k2).57Algorithmic Result #1: Finding a Feasible PathInput: an NC-Spectrum Graph
44、 G=(V,E)Output: a feasible path in G.Computational Complexity: O(|V|2) time & O(|V|2) space.58Outline of This Talk (4)Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions59Algorithmic
45、Result #2: Finding a Feasible Path (Improved)Input: an NC-spectrum graph G=(V,E).Output: A feasible path can be found in O(|V|+|E|) time.Idea: Speed up via pre-processing.60Amino Acid ModificationsA modification is an amino acid with slightly different atoms (and thus a different mass) from the typi
46、cal molecule. Importance of modifications: Amino acid modifications are related to functions. For example, a protein is active when phosphorylated and inactive when de-phosphorylated. 61Modification in the Tandem Mass SpectrumMass / ChargeAbundance (100%)20050100400SW+dRRW+dS62Spectrum Graph: As Before63Spectrum Graph: A Modified Feasible PathIdea: One mass change leads to one missi
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 煤矿辅助运输安全员技能理论考试题库150题(含答案)
- 科技美术教育中的心理策略创新教学方法
- 科技创新在办公设备智能化中的实践
- 美发店员工劳动培训与发展合同(2025年度)
- 二零二五年度专利技术著作权许可与转让协议
- 二零二五年度健康养生产品代理中介费协议
- 2025年度电商平台退货运费退款协议合同
- 二零二五年度文化产业投资有限公司股权转让协议书
- 二零二五年度事业单位专业技术人才引进聘用合同
- 二零二五年度农产品电商平台代理协议
- 监理日志表(标准模版)
- H3C-CAS虚拟化平台详细介绍
- 小学生韵母in、ing常见汉字与区分练习
- 药房品种类别及数量清单
- 机关档案管理工作培训PPT课件
- 初中物理人教版八年级下册 第1节牛顿第一定律 课件
- 网站培训内容trswcm65表单选件用户手册
- 连续平压热压机 三篇 俞敏等
- 打印版-圆与二次函数综合题精练(带答案)
- 各种阀门CAD图
- 工程结算书标准
评论
0/150
提交评论