版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、the 7 bridges in knigsberg and compositional representation of protein sequencesbailin hao (郝柏林)(itp & bgi, cas )huimin xie (谢惠民)(math dept. suzhou u)shuyu zhang (张淑誉)(ip. acad. sinica)compositional approach inprokaryote phylogeny justification of using k-tuples instead of primary protein sequen
2、ces. problem of uniqueness of reconstruction of protein sequence from its constituent k-tuples. picking up a special class of proteins without biological knowledge.7 bridges in knigsbergabdca ab bc cd d euler (1736) 4 odd nodes: no! dnes knig, theory of finite and infinite graphs 1st ed.(1932). birk
3、haser (1990)“ from knigsberg to knigs book, so runs the graphic talebasic notions a graph g=(v, e), where v is a set of nodes (vertices), e is a set of edges (bonds) edges: undirected (u,v)=(v,u), uand v adjacent; directed (u,v) differs from (v,u), u incident to v. a weight may be associated with (u
4、,v): cost, distance, transfer function, reaction rate, etc. eulerian graph: each edge appears once and only once in a path hamiltonian graph: each vertex appears once and only once in a path. hamiltonian cycle of minimal weight - travel salesman problem (tsp)an euler path:an euler loop:euler grahp:
5、loopsemi-euler praph: path, no loopproblem of eulerian loop: simple, known solutionsproblem of hamiltonian paths: much harder 1 12 23 34 46 65 57 71 12 23 34 46 65 57 7hamiltonian loops: much harder10 nodes15 arcs di=3 nodestraveling salesman problemnp-hard problemsyes!no!graph = nodes + arcsdirecte
6、d , labeled arcs and nodessimple graph: no rings at nodes: no repeated arcs:i ij jiijindegree din(i) outdegree dout(i)euler graph: din(i) = dout(i) di isimple euler graphdiagonal matrix: m=diag( d1, d2, dn )adjacent matrix: a=aij aij= aii= 0 kirchhoff matrix: c=m-a cij= cij=0 det(c)=0all minors of c
7、 are equal. denote this common minor by ni,j=010ijnumber of euler loops in simple euler graphn g de bruijnt van aardenrie ehrenfestc a b smithw t tuitebest theoreme (g) = (di-1)! inumber of eulerian loops in general eular g. some aii0 rings some aij1 parallel arcsputting auxiliary nodes on these rin
8、gs and parallel arcs makes the graph simple.i ij jino need to work with bigger a matrix.just let some aii0, aij1 in original a.eliminate redundancy caused by unlebeled arcs.modified best theorem: e(g) = (di-1)! aij!iijanpa_pseam 82aamalslftvgqliflfwtmriteaspdpaakaapaaaaapaaaapdtasdaaaaaaltaanakaaael
9、taanaaaaaaatargmalsalsllslfslftftvgtvgqvgqlakaak=5lftvltaa 7aaaa 5paaa 4apaa 3tann 8aana 9sspa 2aaap 6akaa 1auxiliary arc6 ringsfrom pdb.seq-a special selection of swissprot2821-1=2820 proteins ( may 2000 )rnumber of reconstructed aa sequences from a given protein decompositionr rk k1 12-102-101 11-
10、1001-1001 101-100001-10001 1001-001-1000010000 10 10 4 45 56 67 78 89 91 10 01 11 121642164(76.7%)40440490904545212193932 2651651(94%)7777292910104 4494927322732(96.9%)323216163 32 2444427402740(97.1%)232310103 30 044442762763 3(97.9%)1 13 37 71 10 0363627932793(99%)11117 72 21 16 627982798(99.2%)12
11、122 21 11 16 6compositional representation of proteinsthe collection w or w ,n j may be used as an equivalent representation of the original protein sequence.a seemingly trivial result upon further reflection: random aa sequences have unique reconstruction as well.compositional representation works
12、equally for random aa sequences and most of protein sequences.a given realization of a short random aa sequence is as specific as a real protein sequence. j=1mjki=1l= -k+1iknucleotide correlations in dna/rna much studied k=2 correlation functions 16 9 6 see wentian li, computer chem. 21(1997) 257-27
13、1.amino acid correlations in proteins almost no study hard to comprehend 400 correlation functions at k=2 proteins too short to define correlation functions one should approach the problem from a more deterministic point of viewrepeated aa segments in proteins are strong manifestation of correlation
14、s!on-going study: the other extremequit a few proteins have an enormous number of reconstructions. transmembrane antifreeze fibrous: collagenscoarse-graining: closer to biology by reducing the number of aasproteinproteinaaaar(1r(11)1)k kfor r(k)=1for r(k)=1maga_xenlamaga_xenlaspa2_staauspa2_staaus sr rtx_atrentx_atrenebn1_ebvebn1_ebvi icen_psesycen_psesyxpin_xenlaxpin_xenlacaih_mousecaih_mousec cipa_clotmipa_clotm3 303031 150508 84 45 508088640864028285 54 43 39.9732*109.9732*105 510110164641 15.407*105.407*1016162828120012001.55675*101.5
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
评论
0/150
提交评论