Bioinformatics 4 (生物信息学4), Chen Runsheng (陈润生)

For the past 30-40 years, biology at the molecular and cellular level has been studied by analyzing individual genes and individual proteins. Systems biology, by contrast, analyzes whole systems of genes or proteins. This means using tools that capture information from many different elements of the overall system, and integrating the information obtained at all the different biological levels: DNA, RNA, protein, protein interactions, pathways and so forth. The ultimate objective is to use this information to write mathematical models that can predict something about the structure of the biological system under study, as well as something about its properties under particular kinds of stimuli or perturbations.

Genomes highlight the finiteness of the "parts" in biology

  Year    Organism    Genome size   Genes       Reference
  1995    Bacteria    1.6 Mb        1,600       Science 269: 496
  1997    Eukaryote   13 Mb         6K          Nature 387: 1
  1998    Animal      100 Mb        20K         Science 282: 1945
  2000?   Human       3 Gb          100K ?

[Figure labels: "real thing, Apr '00"; "'98 spoof"]

Analysis of large-scale gene functional expression profiles

As sequencing of the human genome nears completion, a natural question arises: even with a complete human gene map in hand, how much of human life activity can we actually explain? A series of further questions follows that these data cannot answer, for example: whether and when the product of a gene appears; how much of the product is made; whether post-translational modification occurs and, if so, how; what the effect of a gene knock-out or of gene overexpression is; and how the differential expression of multiple genes relates to phenotype. In essence: knowing the nucleic acid sequences and the genes, we still do not know how they function, that is, how genes are expressed at particular times and in particular places, and at what level.

Microarrays
- Affymetrix
- Oligos
- Don't have to know sequence
- Glass slides
- Pat Brown

Functional maps (from Cell, 2001, Vol. 104, 333)

The multiple levels of objects studied in the post-genomic era
- Genome
- Transcriptome
- Proteome
- Interactome
- Localizome
- Foldome
- Metabolome
- Phenome

- Genetic map
- Restriction map
- Physical map
- Functional maps ("snapshots")

Changes in research approach
[Figure: panel labels: sequence, structure, function; interaction network, function. From Cell, 2001, Vol. 104, 333]

(2) Some examples of systems biology research

Genes act through complex multi-feedback networks
Complex systems: the network of interactions between the genes and promoters of a virus determines whether it lies dormant or replicates.
- Trends in Genetics 5(2), 67 (1999)
- Science, 15 Jan 1999, Vol 283
- Mol. Biol. Cell 10, 2703-2734 (1999)

Annotating the Yeast Genome
- Network of yeast Sup35 protein
- Network of yeast SIR protein
(Nature, 4 Nov 1999, Vol 402)
(Science, 4 May 2001, Vol 292)

Fine-grained functional map obtained by integrating transcriptome and proteome experimental data
Led by Al Gilman
E-cell: Science, April 2, 1999, Vol 284

- Trey Ideker, et al., Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network, 4 May 2001, Vol 292
- Michael T. Laub, et al., Global Analysis of the Genetic Network Controlling a Bacterial Cell Cycle, Science, 15 December 2000, Vol 290
- H. Jeong, et al., Lethality and centrality in protein networks, Vol 411, 3 May 2001
- George von Dassow, Eli Meir, The segment polarity network is a robust developmental module, Vol 406, 13 July 2000
- H. Jeong, et al., The large-scale organization of metabolic networks, v407, 2000
- Thomas Simon Shimizu, et al., Molecular model of a lattice of signalling proteins involved in bacterial chemotaxis, Nature Cell Biology, Vol 2, 2000
- Michael B. Elowitz, et al., A synthetic oscillatory network of transcriptional regulators, v403, 2000
- S. Kalir, et al., Ordering Genes in a Flagella Pathway by Analysis of Expression Kinetics from Living Bacteria, Science, v292, 2001
- Matthew Freeman, Feedback control of intercellular signalling in development, v408
- Chunyan Xu, et al., Overlapping Activators and Repressors Delimit Transcriptional Response to Receptor Tyrosine Kinase Signals in the Drosophila Eye, Vol. 103, 2000
- Thomas Surrey, Francois Nedelec, Physical Properties Determining Self-Organization of Motors and Microtubules, Vol 292, 11 May 2001
- Norbert Frey, et al., Decoding calcium signals involved in cardiac growth and function, Volume 6, Number 11, November 2000
- Reka Albert, et al., Error and attack tolerance of complex networks, v406, 2000
- Nature 415, 123-124 (2002)
- Nature 415, 141-147 (2002)
- Modeling the Heart - from Genes to Cells to the Whole Organ, Science, Vol 295, 1 March 2002

The exhilarating progress of the past decade has brought an unprecedented wealth of quantitative information on living systems, from genomic sequences to protein structures and beyond. But although technical advances make data collection ever easier, investigators are increasingly concerned by their inability to gain a bigger picture. How can this growing mountain of facts be assimilated, and where will the new ideas come from that will help us gain a broader perspective?

Networks

Building functional maps by integrating multiple sources of information
From Cell, 2001, Vol. 104, 333

Identifying co-regulated genes by analyzing yeast cell-cycle expression profiles
Nature, 2000, Vol 405, 15

Spectral Analysis of the Protein-protein Interaction Network in Budding Yeast

The topological structure in the protein-protein interaction network
In a clique, proteins connect quite tightly, almost all interacting with each other. In a bipartite, by contrast, the proteins are divided into two parts: proteins seldom connect within the same part but connect tightly with proteins in the counterpart.
[Figure: (a) Clique; (b) Bipartite]

The percentage of function classes in every clique
[Figure: x-axis: number of clique; y-axis: percentage; legend: discordant function, unknown function, main function]
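The clique and bipartite patterns described above can be made concrete with a small sketch. The Python below only measures how densely a group of proteins is connected internally and across two parts; it does not perform the spectral analysis itself, and the protein names, edge lists and helper functions (`make_graph`, `within_density`, `between_density`) are hypothetical, chosen purely for illustration.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def within_density(adj, group):
    """Fraction of possible pairs inside `group` that interact."""
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 0.0
    hits = sum(1 for u, v in pairs if v in adj.get(u, set()))
    return hits / len(pairs)

def between_density(adj, part_a, part_b):
    """Fraction of possible cross pairs (one protein from each part) that interact."""
    total = len(part_a) * len(part_b)
    if total == 0:
        return 0.0
    hits = sum(1 for a in part_a for b in part_b if b in adj.get(a, set()))
    return hits / total

# Toy interaction data (hypothetical protein names, not taken from the lecture).
clique_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]
bipartite_edges = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"),
                   ("A2", "B2"), ("A3", "B1"), ("A3", "B2")]

clique_adj = make_graph(clique_edges)
bip_adj = make_graph(bipartite_edges)

# Clique: nearly every pair inside the group interacts.
clique_score = within_density(clique_adj, {"P1", "P2", "P3", "P4"})

# Bipartite: few links inside each part, many links across the two parts.
part_a, part_b = {"A1", "A2", "A3"}, {"B1", "B2"}
bip_within = max(within_density(bip_adj, part_a), within_density(bip_adj, part_b))
bip_between = between_density(bip_adj, part_a, part_b)

print(clique_score, bip_within, bip_between)
```

On real interaction data the scores would fall between 0 and 1, so a threshold (an assumption to be tuned) would be needed to call a group "clique-like" or "bipartite-like".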

Function prediction for SSU processome

The protein-protein interaction network: before and after spectral analysis

Breakthrough of the Year (2001): Science celebrates nine other areas in which important findings were reported this year, from subatomic to atmospheric and beyond.
First runner-up: RNA ascending. Short RNAs clearly play important biological roles. Dozens of the molecules are now known to exist in the nematode and fruit fly. The coding for these molecules is contained in the DNA sequence. Some 100 of these tiny RNA genes have been found in the gut bacterium Escherichia coli, and some 200 were uncovered in DNA from mouse brain tissue. In the nematode and fruit fly, they seem to be involved in development; in E. coli, they may facilitate rapid responses to environmental change and could serve similar functions in mammals.

What is a genome?
- 1911, gene: elementary unit, responsible for the transmission of hereditary characters
- 1920, genome: set of genes of an organism
- 1944, Avery et al.: DNA is the molecule of heredity
- 1950-70: double helix, genetic code
- Genome = set of DNA molecules present in a cell and transmitted to the offspring

A genome is more than a set of genes
- Genes (transcription units):
  - Protein-coding genes
  - RNA genes: rRNAs, tRNAs, snRNAs, etc.
  - Untranslated RNA genes (e.g. Xist, H19)
- Regulatory elements (promoters, enhancers, etc.)
- Elements required for chromosome replication (replication origins, telomeres, centromeres, etc.)
- Non-functional sequences
- Non-coding sequences
- Repeated sequences
- Pseudogenes

[Figure: genome size and number of protein genes]
Human vs E. coli:
- Genome size: x 1000
- Number of genes: x 10

Proportion of functional elements within genomes

  Organism                 Coding (protein)   RNA     Non-coding
  E. coli                  85%                2%      13%
  Yeast S. cerevisiae      70%                2%      28%
  Nematode C. elegans      28%                0.5%    71%
  Drosophila               17%                0.5%    82%
  Human                    1.5%               0.5%    98%
  Lungfish (dipnoi)        0.5%               0.01%   99.5%

Functional elements in the human genome (3.4 x 10^9 nt; 30,000 - 40,000 protein genes)
- Coding regions (proteins): 1.7%
- tRNA, rRNA: 0.5%
- Introns: 25%
- Intergenic DNA: 60-70%
- Satellite DNA (centromeres, telomeres): 12%
- Transposable elements (LINEs, SINEs, ...): 40-45%
- Non-translated RNA genes: Xist, H19, His-1, bic, microRNAs, etc.
- Regulatory elements: promoters, enhancers, etc.
- No (known) function: 86%

Repeated sequences
- Tandem repeats
  - Satellite
  - Minisatellite
  - Microsatellite
- Interspersed repeats
  - DNA transposons

  - Retroelements

Tandem repeats

  Type             Motif        Block size       % human genome
  satellite        2-2000 nt    up to 10 Mb      10%
  minisatellite    2-64 nt      100-20,000 bp    ?
  microsatellite   1-6 nt       10-100 bp        2%

- Slippage of the DNA polymerase: CACACACACACA
- Unequal crossing-over
- Centromeres, telomeres: satellite DNA

Interspersed repeats
- Transposable elements (autonomous or non-autonomous):
  - DNA transposons (rare in mammals)
  - Retroelements

Retroelements
- LINEs (long interspersed elements): 6-8 kb retroposons
- SINEs (short interspersed elements): 80-300 bp small-RNA-derived retrosequences (tRNA), pol III
- Endogenous retroviruses: 1.5-10 kb

Reverse transcriptase
[Figure: labels: nucleus, cell, RNA, DNA, transcription, reverse transcription, integration; LTR, gag, pol, env, LTR; retrovirus, retrotransposon, retroposon, retrosequence]

Retrosequences: opportunist retroelements
[Figure: labels: nucleus, cell, LINE, LINE RNA, LINE reverse transcriptase (RT protein), reverse transcription, DNA, RNA]

Frequency of transposable elements in the human genome
- Total = 42% (Smit 1999)
- Probably underestimated
The frequency of transposable elements is not uniform along the human genome, e.g. inter-chromosomal variations (Smit 1999).

Pseudogenes
- After a gene duplication:
  - evolution of new function (sub-functionalization or neo-functionalization)
  - or gene inactivation

Retropseudogenes
- 23,000 to 33,000 retropseudogenes in the human genome
- Often derive from housekeeping genes

What is the total number of human genes? 28,000 ± 4,000
Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. One of the largest challenges is identifying the unknown functions that almost certainly exist in much of the "junk" DNA.

  Organism                                     Year   Millions of       Total          Predicted number   Genes per million
                                                      bases sequenced   coverage (%)   of genes           bases sequenced
  Human genome rough draft (public sequence)   2001   2,693             84             31,780             12
  Human genome rough draft (Celera sequence)   2001   2,654             83             39,114             15
  Human chromosome 21                          2000   34                75             225                7
  Human chromosome 22                          1999   34                70             545                16
  Arabidopsis thaliana                         2000   115               92             25,498             221
  Drosophila melanogaster                      2000   116               64             13,601             117
  Caenorhabditis elegans                       1998   97                99             19,099             197
  Saccharomyces cerevisiae                     1996   12                93             5,800              483

Coding DNA: 1-1.5%
Noncoding DNA:
- intron: 24%
- intergenic DNA: 75%
  - promoter
  - telomeres
  - repetitive: 45%
  - noncoding and nonrepetitive: 35%

  Repeat class   % of genome   Copy number
  LINE           21            850,000
  SINE           13            1,500,000
  LTR            8             450,000
  Transposons    3             300,000

LINEs play a crucial role in X inactivation, the process by which one of the two X chromosomes in a female is turned off early in development.

Small RNA

Proportion of repeated sequences in genomes
- Human: 45%
- Arabidopsis: 11%
- C. elegans: 7%
- D. melanogaster: 3%

Mammalian genomes: summary
- Genes, regulatory elements: 2%
- Non-coding sequences: 98%
  - Satellite DNA (centromeres): 10%
  - Microsatellites: 2%
  - Transposable elements: 42%
  - Pseudogenes: 1%
  - Other (ancient transposable elements?): 43%
- Variations in gene and repeat density along chromosomes

[Figure: A simplified biological history of the Earth. Axes: complexity against time (mya), from -4,000 to present. Labels: unicellular world (eubacteria, archaebacteria, single-celled eukaryotes (protista)); multicellular world (plants, fungi, animals)]
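The "genes per million bases sequenced" column of the sequencing table above is the predicted gene count divided by the number of megabases sequenced. A minimal Python sketch recomputing it from the table's own figures follows; the variable names and the `genes_per_mb` helper are illustrative assumptions, and small disagreements (Arabidopsis comes out at 222 rather than the listed 221) presumably reflect rounding of the megabase values.

```python
# (organism, millions of bases sequenced, predicted genes, genes/Mb as listed in the table)
rows = [
    ("Human draft (public)", 2693, 31780, 12),
    ("Human draft (Celera)", 2654, 39114, 15),
    ("Human chromosome 21", 34, 225, 7),
    ("Human chromosome 22", 34, 545, 16),
    ("Arabidopsis thaliana", 115, 25498, 221),
    ("Drosophila melanogaster", 116, 13601, 117),
    ("Caenorhabditis elegans", 97, 19099, 197),
    ("Saccharomyces cerevisiae", 12, 5800, 483),
]

def genes_per_mb(genes, megabases):
    """Gene density: predicted genes per million bases sequenced."""
    return genes / megabases

# Recompute the density column and keep it next to the listed value.
computed = {name: genes_per_mb(genes, mb) for name, mb, genes, _ in rows}
listed = {name: value for name, _, _, value in rows}

for name in computed:
    print(f"{name:26s} computed {computed[name]:6.1f}   listed {listed[name]}")

# Yeast packs roughly 40 times more genes per megabase than the human draft.
ratio = computed["Saccharomyces cerevisiae"] / computed["Human draft (public)"]
print(f"yeast / human density ratio: {ratio:.0f}")
```

The density falls by well over an order of magnitude from yeast to human, which is the same point the composition figures make: most of the additional DNA in large genomes is non-coding.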

Gene numbers do not increase as much as expected with complexity:
- worm and fly gene numbers (12-14,000) are only about twice those of yeast (6,000) and P. aeruginosa (5,500)
- mammalian (human, mouse) gene numbers (30,000) are only about twice those of invertebrates.

The complexity problem
This suggests that:
- animals have a relatively stable core proteome, whose components are multitasked in differentiation and development
- variation in phenotype occurs mainly by variation in the control architecture (unlike prokaryotes)

Phenotypic variation in mammals is primarily associated with noncoding regions:
- only 10,000 out of 3,000,000 polymorphisms between individual humans (0.3%) occur in protein-coding sequences
- only 1% of genes are different between humans and mice.

98% of transcriptional output in humans is noncoding RNA.
Excised introns and other noncoding RNAs appear to be relatively stable (not degraded rapidly as is usually thought).
Some introns / noncoding RNAs are highly conserved, e.g.:
- Drosophila adh gene intron 1, tra gene intron 2, let-7
- Mouse/human T-cell receptor gene
- Human / Xenopus g-actin intron 3
- but most not sequenced.

Spliceosomal RNA (7)
- U1, U2, U4, U5, U6, U7, U12

Biological features
- Primary structure is not conserved
- Secondary structure is conserved
- Transcribed by RNA polymerase II (U6 by RNA polymerase III)
- Few types
- The various snRNAs have been found in every species
[Figure panels: U1, U2, U4, U5, U6, U7, U12]

Packaging RNA (pRNA)
pRNA function
- The pRNA molecules interact intermolecularly via hand-in-hand interaction to form a hexameric complex that is a crucial part of the viral DNA translocation motor.
- The pRNA appears to be directly involved in the DNA translocation process, leaving the procapsid after DNA packaging is completed.
- The sequential action of pRNA ensures the continuous function of the motor.
[Figure: Secondary structure of pRNA]

Signal Recognition Particle RNA (SRP RNA)
SRP function
Signal recognition particle (SRP) is a ribonucleoprotein complex which interacts with signal sequences as they appear on the surface of translating ribosomes. Subsequent to signal peptide recognition, SRP binds to membrane receptors and assures the proper delivery of secretory proteins.
Magnus Alm Rosenblad, Jan Gorodkin, Bjarne Knudsen, Christian Zwieb and Tore Samuelsson. SRPDB: Signal Recognition Particle Database. Nucleic Acids Research, 2003, Vol. 31, No. 1, 363-364.
[Figure: Schematic Representation of SRP]
[Figure: Eubacterial RNAs that contain an Alu domain but that lack the helix 6 (Bacillus RNA shown)]

Guide RNA (gRNA)
Definition: small RNA molecules that hybridise to specific mRNAs and guide the insertion or deletion of uridines (RNA editing) in mRNAs in Kinetoplastida.

RNase P RNA
- The RNA subunit of the cellular holoenzyme ribonuclease P, an endoribonuclease that generates mature 5'-ends of tRNAs by cleaving the 5'-leader elements of precursor tRNAs.

scRNA (small cytoplasmic RNA)
Function
A. Small cytoplasmic RNA, including prosomes, which are believed to be involved in post-transcriptional regulation of gene expression. Examples of scRNAs:
- BC200 scRNA
- BC1 scRNA
- HY1 scRNA
B. Small cytoplasmic RNA (scRNA) is a member of an evolutionarily conserved signal-recognition-particle-like RNA family. Functional analysis showed that C. perfringens scRNA could compensate for vegetative growth and allow the formation of heat-resistant spores in an scRNA-depleted B. subtilis strain, whereas Escherichia coli 4.5S RNA could not maintain sporulation.
C. Several findings suggest an important role of scRNA in protein biosynthesis.

A Small RNA in Testis and Brain: Implications for Male Germ Cell Development
Journal of Cell Science, vol. 115, no. 6, pp. 1243-1250 (March 15, 2002):
BC1 RNA, a small non-coding RNA polymerase III transcript, is selectively targeted to dendritic domains of a subset of neurons in the rodent nervous system. It has been implicated in the regulation of local protein synthesis in postsynaptic microdomains. The gene encoding BC1 RNA has been suggested to be a master gene for repetitive ID elements that are found interspersed throughout rodent genomes.

The infectious agents of prion diseases such as Creutzfeldt-Jakob disease are thought to be composed of protein, with no associated nucleic acids. But that may not be the end of the story. An experiment in a cell-free amplification system shows that unidentified host RNA molecules are required for efficient conversion of normal prion protein into its pathogenic form. Interestingly, these RNA molecules are not present in invertebrate species. This points to a possible involvement of host-coded RNA in the pathogenesis of prion diseases, and also provides a simple way of increasing the sensitivity of diagnostic tests based on the PMCA (protein misfolding cyclic amplification) method.

Genome Res. 2003 June; 13 (6b): 1301-1306
Identification of Putative Noncoding RNAs Among the RIKEN Mouse Full-Length cDNA Collection
Koji Numata, Akio Kanai, Rintaro Saito, Shinji Kondo, Jun Adachi, Laurens G. Wilming, David A. Hume, RIKEN GER Group, GSL Members, Yoshihide Hayashizaki, and Masaru Tomita
1 Graduate School of Media and Governance, Bioinformatics Program, Keio University, Fujisawa, Kanagawa 252-8520, Japan
2 Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa 252-8520, Japan
3 Department of Environmental Information, Keio University, Fujisawa, Kanagawa 252-8520, Japan
4 Laboratory for Genome Exploration Research Group, RIKEN Genomic Science Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
5 Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan
6 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
7 Institute for Molecular Bioscience and School of Molecular and Microbial Sciences, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
8 Takahiro Arakawa, Piero Carninci, and Jun Kawai.
9 Corresponding author.

Nature. 2003 Aug 14; 424(6950): 788-93
Comparative analyses of multi-species sequences from targeted genomic regions.
Thomas JW, Touchman JW, Blakesley RW, Bouffard GG, Beckstrom-Sternberg SM, Margulies EH, Blanchette M, Siepel AC, Thomas PJ, McDowell JC, Maskeri B, Hansen NF, Schwartz MS, Weber RJ, Kent WJ, Karolchik D, Bruen TC, Bevan R, Cutler DJ, Schwartz S, Elnitski L, Idol JR, Prasad AB, Lee-Lin SQ, Maduro VV, Summers TJ, Portnoy ME, Dietrich NL, Akhter N, Ayele K, Benjamin B, Cariaga K, Brinkley CP, Brooks SY, Granite S, Guan X, Gupta J, Haghighi P, Ho SL, Huang MC, Karlins E, Laric PL, Legaspi R, Lim MJ, Maduro QL, Masiello CA, Mastrian SD, McCloskey JC, Pearson R, Stantripop S, Tiongson EE, Tran JT, Tsurgeon C, Vogt JL, Walker MA, Wetherby KD, Wiggins LS, Young AC, Zhang LH, Osoegawa K, Zhu B, Zhao B, Shu CL, De Jong PJ, Lawrence CE, Smit AF, Chakravarti A, Haussler D, Green P, Miller W, Green ED.
Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

12,900 new human ncRNAs
Three papers from Affymetrix:
- Kapranov et al. 2002. Science 296, 916
- Kampa et al. 2004. Genome Research 14, 331
- Cawley et al. 2004. Cell 116, 499

Science, Vol 304, 28 May 2004
Ultraconserved Elements in the Human Genome
Gill Bejerano 1*, Michael Pheasant 2, Igor Makunin 2, Stuart Stephen 2, W. James Kent 1, John S. Mattick 2, David Haussler 3*
1 Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
2 AR
