From Genetics to Bioinformatics

DNA structure and its biological function
The Human Genome
The Human Genome Project

Modern Biology: Molecular Basis of Inheritance

I. DNA structure and its biological function

Mendel: The Father of Genetics
1865: Gregor Mendel discovers the basic rules of heredity in the garden pea
Complex biological traits can be described by mathematical laws.
What are these factors, and where are they located?
Cell, chromosome, nuclein (its function was not yet known)

DNA structure and its biological function

Johann Miescher
Johann Miescher discovered DNA and named it nuclein.

Major events in the history of molecular biology, 1900-1911

1902: Emil Fischer wins the Nobel Prize: showed that amino acids are linked and form proteins.
Emil Fischer introduced formulas depicting the spatial arrangement of groups around chiral carbon atoms.

Fruit fly: finding the genes
1911: Thomas Morgan discovers that genes on chromosomes are the discrete units of heredity.

1910-1925: Development of cytological genetics
Cytogenetics is the study of chromosomes and chromosome abnormalities.

The relationship between genes and proteins was first proposed by Garrod in 1908.
Garrod, a prominent physician at St. Bartholomew's Hospital in London, understood both the new science of biochemistry and the emerging discipline of genetics.

Inborn Errors of Metabolism
Following Mendel's laws, Garrod concluded that alkaptonuria is a congenital disorder, not the result of a bacterial infection as was commonly thought.
He observed that inherited diseases reflect a patient's inability to make a particular enzyme, which he referred to as "inborn errors of metabolism".

Tetranucleotide Hypothesis
Phoebus Levene (Russian-American, 1869-1940) worked with Albrecht Kossel and Emil Fischer, the nucleic acid and protein experts at the turn of the 20th century.
The simplicity of the structure implied that DNA was too uniform to contribute to complex genetic variation.

George Beadle, Edward Tatum
They identified that genes make proteins, but what is a gene?

Major events in the history of molecular biology, 1940-1950

One gene-one enzyme hypothesis
"Transforming principle" identified as DNA: Avery's work

The Hershey-Chase Experiment
The Americans Alfred Hershey (1908-1997) and Martha Chase (1927-2003) published a now classic paper in 1952.

Erwin Chargaff showed (1950s):
The amount of adenine relative to guanine differs among species.
The amount of adenine always equals the amount of thymine, and the amount of guanine always equals the amount of cytosine: %A = %T and %G = %C.

Erwin Chargaff (1905-2002)
Base ratios

X-ray Crystallography Applied to Nucleic Acids
Between the 1940s and 1950s, Maurice Wilkins (1916-2004) and Rosalind Franklin (1920-1958) worked
on X-ray studies of DNA.

James Watson (American, 1928-)
Francis Crick (British, 1916-2004)

Major events in the history of molecular biology, 1952-1960

Principle | Data | Source
X-ray crystallography | Stacked layers of subunits in spirals; long chain; two chains not ruled out; sugar-phosphate on the outside | Wilkins and Franklin (but mostly Franklin)
Organic chemistry | 4 nucleotides | Levene
Biochemistry | α-helix, model building | Pauling
Chromatography | Base ratios | Chargaff
Chemical bonding | Right form of the bases | J. Donohue
Mathematics | Attractive forces between DNA bases | J. Griffith

Enter Watson and Crick
Informational approach (transfer
of information, translation of information)

The central dogma of molecular biology
Diagram: DNA is transcribed into mRNA (messenger), rRNA (ribosomal) and tRNA (transfer); mRNA is translated into protein at the ribosome.

Molecular biology is born

Major events in the history of molecular biology, 1970-1977

1972: Paul Berg and co-workers create the first recombinant DNA molecule.
Proc Natl Acad Sci U S A. 1972, 69(10):2904-2909.

1977: Allan Maxam and Walter Gilbert at Harvard University and Frederick Sanger at the U.K. Medical Research Council (MRC) independently develop methods for sequencing DNA (PNAS, February;
PNAS, December).
Maxam, A.M., Gilbert, W. A new method for sequencing DNA. Proc Natl Acad Sci, 74(2):560-4. 1977.

GenBank Database Formed
1982: GenBank, NIH's publicly accessible genetic sequence database, was formed at Los Alamos National Laboratory. Scientists submit DNA sequence data from a wide range of organisms to GenBank; researchers routinely retrieve and analyze the data in the archive.

Major events in the history of molecular biology, 1980-1995

1983: First Disease Gene Mapped (Huntington's disease)
A genetic marker linked to Huntington disease was found on chromosome 4 in 1983, making Huntington disease
(HD) the first genetic disease mapped using DNA polymorphisms.
A polymorphic DNA marker genetically linked to Huntington's disease. Nature, 306(5940):234-8. 1983.

1985: PCR Invented (1993 Nobel Prize in Chemistry)
In 1985, Kary Mullis and colleagues at Cetus Corp. develop PCR, a technique to replicate vast amounts of DNA.

1986: Leroy Hood and Lloyd Smith of the California Institute of Technology and colleagues announce the first automated DNA sequencing machine.

The Secret to Sanger Sequencing
Figure: Principles of DNA sequencing. A primer is extended along the template to give the sequence G C A T G C. Each of four reactions contains dATP, dCTP, dGTP and dTTP plus one chain-terminating dideoxynucleotide (ddATP, ddCTP, ddTTP or ddGTP). Terminated products: GddC and GCATGddC (ddCTP); GCddA (ddATP); GCAddT (ddTTP); ddG and GCATddG (ddGTP).

Automating Sanger Sequencing

1986: First Time Gene Positionally Cloned

A genetic linkage map of the human genome. Cell. 1987 Oct 23;51(2):319-37.
The first comprehensive genetic map of
human chromosomes was based on 400 restriction fragment length polymorphisms (RFLPs).

1987: YACs Developed
Yeast artificial chromosomes (YACs) can carry large segments of DNA from other species, such as humans. YACs can carry fragments of human DNA a million base pairs long, whereas plasmids and viruses carry pieces only a few thousand base pairs long.

1989: Microsatellites, New Genetic Markers
Weber, J.L., May, P.E. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet, 44:388-96. 1989.
A microsatellite is a stretch of DNA made of a two- to four-base-pair sequence that is repeated in tandem, e.g. a stretch of DNA that looks like this: CAGCAGCAGCAGCAGCAGCAG.

1989: Sequence-tagged Sites, Another Marker
A sequence-tagged site (STS) is a unique stretch of DNA that the polymerase chain reaction (PCR) can easily detect. STSs are very useful for making physical maps of human chromosomes. Creating a physical map is much like putting together a large puzzle, where the pieces of the puzzle are pieces of DNA made by cutting up chromosomes.
A Common Language for Physical Mapping of the Human Genome. Science, 245:1434-5. 1989.

1988: NIH establishes the Office of Human Genome Research and snags Watson as its head. Watson declares that 3% of the genome budget should be devoted to studies of social and ethical issues.

1990: Launch of the Human Genome Project
Watson, J.D., Jordan, E. The Human Genome Program at the National Institutes of Health. Genomics, 5:654-56. 1989.
Beginning in December 1984, the U.S. Department of Energy (DOE), the National Institutes of Health (NIH) and international groups had sponsored meetings to consider the feasibility and usefulness of mapping and sequencing the human genome.

The Human Genome Project
In 1990, DOE and NIH published a plan for the first five years of what was projected to be a 15-year project. The goals of the project included:
Mapping the human genome and eventually determining the sequence of all 3.2 billion letters in it;
Mapping and sequencing the
genomes of other organisms important to the study of biology;
Developing technology for analyzing DNA;
Studying the ethical, legal and social implications of genome research.

Figures: The human chromosomes; Genome; DNA sequencing and work flow

Goals of the Human Genome Project

1. Map and sequence the human genome
Build genetic and physical maps spanning the human genome.
Determine the sequence of the estimated 3 billion letters of human DNA, to 99.99% accuracy.
Chart variations in DNA spelling among human beings.
Map all the human genes.
Begin to label the functions of genes and other parts of the genome.

2. Map and sequence the genomes of model organisms (genome size in base pairs)
The bacterium E. coli (4.6 million)
The yeast S. cerevisiae (12 million)
The roundworm C. elegans (100 million)
The fruit fly D. melanogaster (180 million)
The mouse M. musculus (3 billion)

3. Collect and distribute data
Distribute genomic information and the tools for using it to the research community.
Release all sequence data that spans more than 2000 base pairs within 24 hours.
Create and run databases.
Develop software for large-scale DNA analysis.
Develop tools for comparing and interpreting genome information.
Share information with the wider public.

4. Study the ethical, legal and social implications of genetic research

5. Train researchers

6. Develop technologies
Make large-scale sequencing faster and cheaper.
Develop technology for finding sequence variations.
Develop
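ways to study functions of genes on a genomic scale.

7. Transfer technology to the private sector

Time Line of the Human Genome Project

Technology
Standard molecular biology techniques: running agarose gels
CS-Packard DNA production robotic systems (x 3)
Capillary electrophoresis: separation by electro-osmotic flow

Exploring the nature of the life sciences
Stage 1: established the cellular basis of heredity, the chromosome.
Stage 2: defined the molecular basis of heredity, the DNA double helix.
Stage 3: unlocked the informational basis of heredity, the central dogma. With the discovery of the biological mechanisms by which cells read genetic information, and the invention of recombinant DNA cloning and sequencing, scientists could use these techniques to explore the information contained in genes.
Stage 4: completed a great scientific undertaking, the Human Genome Project: sequencing the 3.2 billion base pairs on the 24 distinct human chromosomes (22 autosomes plus X and Y) and carrying out the corresponding analysis.

1998: Company Announces Sequencing Plan
In May 1998, the company Celera Genomics was formed to sequence much of the human genome in three years. While the company used many HGP-generated resources, its strategy differed: the HGP built detailed maps before sequencing defined regions, whereas Celera used a shotgun sequencing strategy, in which the entire genome is fragmented and random segments are sequenced and then put in order.

1996: HGP start

Figure: Shotgun sequencing workflow: sequence, chromatogram, send to computer, assembled sequence

Draft Sequences, 2001
International Human Genome Sequencing Consortium (public project). Initial Sequencing and Analysis of the Human Genome. Nature 409:860-921, 2001.
Celera Genomics, Venter JC et al. (private project). The Sequence of the Human Genome. Science 291:1304-1351, 2001.

Biochemistry, molecular biology, bioinformatics

Biochemical representation of nucleotide sequences

Biochemical representation of amino acid sequences
Sequence in three-letter amino acid codes: Ser-Gly-Tyr-Ala-Leu
Sequence in one-letter amino acid codes: SGYAL
The left end is the N-terminus of the polypeptide and the right end is the C-terminus.
Figure: chemical structure of a stretch of protein sequence

Representation of nucleic acid sequences in bioinformatics data
Note that DNA and RNA are written the same way: both U and T are written as
T.

IUB/IUPAC nucleotide codes
Code | Meaning
A | Adenosine
C | Cytidine
G | Guanine
T | Thymidine
U | Uridine
R | G or A (purine)
Y | T or C (pyrimidine)
K | G or T (keto)
M | A or C (amino)
S | G or C (strong interaction)
W | A or T (weak interaction)
B | G, T or C (not A)
D | G, A or T (not C)
H | A, C or T (not G)
V | G, C or A (not T)
N | A, G, C or T (any base)
- | Gap of indeterminate length

The primary structure of a protein is its amino acid sequence. In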
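bioinformatics analysis, protein sequence information is usually stored in one-letter codes rather than three-letter codes.

Myoglobin, a polypeptide chain of 154 amino acid residues, is stored in bioinformatics databases as follows:
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG
The first letter M (methionine, Met) is the N-terminus of myoglobin, and the last letter G (glycine, Gly) is its C-terminus.

Representation of protein sequences in bioinformatics data

Sequence data:
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

FASTA format:
>gi|4504345|ref|NP_000508.1| alpha 2 globin [Homo sapiens]
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

The fields of the title line are separated by "|": the GI number of the sequence is 4504345, the accession number is NP_000508.1, the name is alpha 2 globin, and Homo sapiens indicates that the sequence comes from human.

GenBank format

File forms of bioinformatics data

Text file (flat file)
Information is stored sequentially in the file in a specific format.
Each record (entry) is uniquely identified by its accession number (accession #).
Links between pieces of information, within one file or across files, are made through accession numbers.

Relational database (relational DB)
Based on the entity-relationship (E-R) model.
Each record (record/tuple) in a table is uniquely identified by a key.
Tables are linked to one another through foreign keys.

Information representation: relational database
Figure labels: semantic mapping, attributes, relations; query, semantic mapping and processing, results, semantic matching

Problems with bioinformatics data
Information sources are spread across different sites around the world.
Global questions that involve several data sources cannot be answered immediately.
Painfully collecting unstructured information around the sites
Manually putting pieces together
Hopefully getting the right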
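picture.

In summary, information sources are:
autonomous
distributed
heterogeneous

Data Integration
Figure: Site A and Site B exchange data as XML

The most important task of bioinformatics is to extract new knowledge from massive amounts of data.

III. Types of biological databases

Development directions of biological databases

Sequence databases
Main nucleic acid sequence databases: GenBank, EMBL, DDBJ (INSDC)
Main protein sequence databases: Swiss-Prot, PIR (UniProt)

GenBank, the US nucleic acid database (Benson, D.A. et al. (1998) Nucleic Acids Res. 26, 1-7), was under construction from 1979 and went into full operation in 1982; the EMBL database of the European Molecular Biology Laboratory also began service in 1982.
Japan began building its national nucleic acid database, DDBJ, in 1984, and it entered service in 1987.
Since then, DNA sequence data have grown from about a hundred sequences and a few hundred thousand bases in the early 1980s to 11 billion bases today. In other words, in only about 18 years the volume of data has grown nearly 100,000-fold.

Nucleic acid sequence databases
Nucleic acid sequences: a nucleic acid sequence is a string of the one-letter symbols of the four nucleotides (A, T, G, C).

Protein sequence databases
SWISS-PROT and PIR are the two main international ...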
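
The FASTA title line described earlier, with its fields separated by "|", can be read by a program in a few lines. The following Python sketch is only an illustration under stated assumptions: the function names `parse_fasta_record` and `is_protein_sequence` are hypothetical, the leading ">" and the bracketed organism name follow the usual NCBI layout rather than anything the slides spell out, and the sequence in the usage lines is just the first 40 residues of the alpha 2 globin record.

```python
from typing import Dict

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_record(text: str) -> Dict[str, str]:
    """Split one FASTA record into title-line fields and sequence.

    Assumes a title line laid out as
    '>gi|<GI number>|ref|<accession>| <description>' (an assumption
    based on the example record, not a general FASTA rule).
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    # Split on the first four '|' only, so the description stays intact.
    parts = lines[0][1:].split("|", 4)
    if len(parts) != 5 or parts[0] != "gi" or parts[2] != "ref":
        raise ValueError("unexpected title line layout")
    return {
        "gi": parts[1],
        "accession": parts[3],
        "description": parts[4].strip(),
        # Sequence lines are simply concatenated.
        "sequence": "".join(lines[1:]),
    }


def is_protein_sequence(seq: str) -> bool:
    """True if seq is non-empty and uses only the 20 standard codes."""
    return bool(seq) and set(seq.upper()) <= AMINO_ACIDS


# Usage: title line from the record above, truncated sequence.
record = parse_fasta_record(
    ">gi|4504345|ref|NP_000508.1| alpha 2 globin [Homo sapiens]\n"
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTT\n"
)
print(record["gi"], record["accession"], record["description"])
```

Splitting on only the first four "|" characters keeps a description that itself contains "|" in one piece; real database headers vary, so a production parser would need to handle other layouts.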