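Bioinformatics Course: Review Questions

I. Term definitions

Bioinformatics
Dot plot algorithm
Molecular clock
Hidden Markov model (HMM)
Gene Ontology (GO)
Molecular phylogenetic tree
Sequence alignment
Gap penalty
Linear gap penalty, constant gap penalty
Multiple sequence alignment
Relational database
Dayhoff mutation data matrix
BLOSUM matrix (blocks substitution matrix)
SCOP protein structure classification database (structural classification of proteins)
CATH protein structure classification database
Phylogenetic tree
Species tree
Gene tree
Rooted tree, unrooted tree
Maximum likelihood method
Homology modeling
Protein structure prediction
Ab initio protein structure prediction
Protein folding
FASTA-ALL
NCBI
EBI
GenBank
Entrez
SRS system
Homology, identity, similarity
Neutral theory of molecular evolution
Least squares method
Neighbor-joining method
Maximum parsimony
Genome annotation
Genomics
Proteomics
PDB
MEGA software
PHYLIP software
Dynamic programming algorithm
Smith-Waterman algorithm
Needleman-Wunsch algorithm
BLAST, BLASTn, BLASTp

II. Review questions

1. What is bioinformatics? What are its main applications?
2. Briefly describe the major landmark achievements in the history of bioinformatics.
3. Some say that biology will be the hotbed of the next technological revolution. In what ways do you think bioinformatics will contribute to the industrialization of biology?
4. What is a biological database? Give examples.
5. What is the difference between a primary database and a secondary database? Give examples.
6. What search routes does Entrez provide?
7. Why perform sequence alignment? Using pairwise nucleic acid sequence alignment as an example, briefly describe the basic principle of sequence alignment.
8. Given two sequences, catgt and acgctg, use dynamic programming to perform a global alignment (complete the alignment matrix and find the best alignment; scoring: match = 2, mismatch = -1, gap penalty = -1).

            0    1 c   2 a   3 t   4 g   5 t
   0        0    -1    -2    -3    -4    -5
   1 a     -1
   2 c     -2
   3 g     -3
   4 c     -4
   5 t     -5
   6 g     -6

9. Given two sequences, CACGA and CGA, use the Smith-Waterman algorithm to align them (build the alignment matrix and find the best alignment; scoring: match = 1, mismatch = 0, gap penalty = -1).
10. Given two sequences, CACGA and CGA, use the dynamic programming method of S. Needleman and C. Wunsch to align them (build the alignment matrix and find the best alignment; scoring: match = 1, mismatch = 0, gap penalty = -1).
11. Briefly describe the workflow for protein secondary structure prediction.
12. Briefly describe the workflow for predicting protein tertiary structure by homology modeling.
13. Why perform protein structure alignment? Briefly describe its basic principle.
14. Why is it said that the higher-order structure of a protein is determined by its primary structure?
15. Briefly describe the workflow for predicting protein-coding genes.
16. Briefly describe the basic workflow of genome annotation.
17. How can eukaryotic protein-coding genes be predicted ab initio?
18. Briefly describe the basic idea of building a phylogenetic tree with the neighbor-joining method.
19. Briefly describe the basic idea of building a phylogenetic tree with the UPGMA method.
20. Briefly describe the basic idea of building a phylogenetic tree with the maximum parsimony method.
21. Given four sequences, A: TAGG; B: TACG; C: AAGC; D: AGCC, build a phylogenetic tree using the UPGMA method.
22. Given four sequences, A: TAGG; B: TACG; C: AAGC; D: AGCC, build a phylogenetic tree using the neighbor-joining method.
23. What is the neutral theory? What influence has it had on the study of molecular evolution?
24. Which computing fundamentals, biology fundamentals, and mathematics fundamentals do you think are needed to study bioinformatics?
25. What is the molecular clock hypothesis?
26. Briefly describe the steps for building a phylogenetic tree.

III. Literature reading

1. Welcome to the UCSC Genome Browser website. This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects. We encourage you to explore these sequences with our tools. The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide.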
The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns. Genome Graphs allows you to upload and display genome-wide data sets.

2. WebLogo is a web-based application designed to make the generation of sequence logos as easy and painless as possible. Click here to create your own sequence logos. Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.

3. The National Center for Biotechnology Information (NCBI) is one of the world's premier Web sites for biomedical and bioinformatics research. Based within the National Library of Medicine at the National Institutes of Health, USA, the NCBI hosts many databases used by biomedical and research professionals. The services include PubMed, the bibliographic database; GenBank, the nucleotide sequence database; and the BLAST algorithm for sequence comparison, among many others.

4. KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information.
It is a computer representation of the biological system, consisting of molecular building blocks of genes and proteins (genomic information) and chemical substances (chemical information) that are integrated with the knowledge on molecular wiring diagrams of interaction, reaction and relation networks (systems information). The KEGG website at www.kegg.jp has become the primary site of the KEGG database developed by Kanehisa Laboratories. The GenomeNet website at www.genome.jp operated by Kyoto University Bioinformatics Center will continue to mirror the KEGG database and provide additional KEGG-based analysis services.

5. The GenBank sequence database is an annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced at the National Center for Biotechnology Information (NCBI) as part of an international collaboration with the European Molecular Biology Laboratory (EMBL) Data Library from the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. GenBank continues to grow at an exponential rate, doubling every 10 months. Release 134, produced in February 2003, contained over 29.3 billion nucleotide bases in more than 23.0 million sequences. GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.

6. PubMed is a database developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), one of the institutes of the National Institutes of Health (NIH). The database was designed to provide access to citations (with abstracts) from biomedical journals. Subsequently, a linking feature was added to provide access to full-text journal articles at Web sites of participating publishers, as well as to other related Web resources.
PubMed is the bibliographic component of the NCBI's Entrez retrieval system.

7. Bioinformatics consists of a computational approach to biomedical information management and analysis. It is being used increasingly as a component of research within both academic and industrial settings and is becoming integrated into both undergraduate and postgraduate curricula. The new generation of biology graduates is emerging with experience in using bioinformatics resources and, in some cases, programming skills.

8. The resources provided by NCBI for studying the three-dimensional (3D) structures of proteins center around two databases: the Molecular Modeling Database (MMDB), which provides structural information about individual proteins; and the Conserved Domain Database (CDD), which provides a directory of sequence and structure alignments representing conserved functional domains within proteins (CDs). Together, these two databases allow scientists to retrieve and view structures, find structurally similar proteins to a protein of interest, and identify conserved functional sites.

To enable scientists to accomplish these tasks, NCBI has integrated MMDB and CDD into the Entrez retrieval system (Chapter 15). In addition, structures can be found by BLAST, because sequences derived from MMDB structures have been included in the BLAST databases (Chapter 16). Once a protein structure has been identified, the domains within the protein, as well as domain "neighbors" (i.e., those with similar structure) can be found. For novel data not yet included in Entrez, there are separate search services available.

Protein structures can be visualized using Cn3D, an interactive 3D graphic modeling tool. Details of the structure, such as ligand-binding sites, can be scrutinized and highlighted. Cn3D can also display multiple sequence alignments based on sequence and/or structural similarity among related sequences, 3D domains, or members of a CDD family.
Cn3D images and alignments can be manipulated easily and exported to other applications for presentation or further analysis.

9. R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

10. EMBL-EBI is a pivotal partner in ELIXIR, the European life sciences infrastructure for biological information, as part of the European Strategy on Research Infrastructures (ESFRI) process. On behalf of ELIXIR, EMBL-EBI coordinates BioMedBridges, which develops technical bridges for interoperability between data and services in the ESFRI biomedical sciences research infrastructures.
These collaborative projects are undertaken with our partners in the European Member States.

11. PubMed Central (PMC) is the National Library of Medicine's digital archive of full-text journal literature. Journals deposit material in PMC on a voluntary basis. Articles in PMC may be retrieved either by browsing a table of contents for a specific journal or by searching the database. Certain journals allow the full text of their articles to be viewed directly in PMC. These are always free, although there may be a time lag of a few weeks to a year or more between publication of a journal issue and when it is available in PMC. Other journals require that PMC direct users to the journal's own Web site to see the full text of an article. In this case, the material will always be available free to any user no more than 1 year after publication but will usually be available only to the journal's subscribers for the first 6 months to 1 year.

To increase the functionality of the database, a variety of links are added to the articles in PMC: between an article correction and the original article; from an article to other articles in PMC that cite it; from a citation in the references section to the corresponding abstract in PubMed and to its full text in PMC; and from an article to related records in other Entrez databases such as Reference Sequences, OMIM, and Books.

12. The primary data produced by genome sequencing projects are often highly fragmented and sparsely annotated. This is especially true for the Human Genome Project as a result of its policy of releasing sequence data to the public sequence databases every day (1, 2). So that individual researchers do not have to piece together extended segments of a genome and then relate the sequence to genetic maps and known genes, NCBI provides annotated assemblies of public genome sequence data.
NCBI assimilates data of various types, from numerous sources, to provide an integrated view of a genome, making it easier for researchers to spot informative relationships that might not have been apparent from looking at the primary data. The annotated genomes can be explored using Map Viewer (Chapter 20) to display different types of data side by side and to follow links between related pieces of data.

This chapter describes the series of steps, the "pipeline", that produces NCBI's annotated genome assembly from data deposited in the publi
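
Worked sketch for review question 8. The global alignment exercise (sequences catgt and acgctg, match = 2, mismatch = -1, gap penalty = -1) can be checked with a short program. The following is only a minimal sketch of Needleman-Wunsch-style dynamic programming with a linear gap penalty. The function name `needleman_wunsch` and its parameter names are illustrative assumptions, not part of the course material. Where several alignments tie, the traceback returns just one of them (it prefers the diagonal move, then a gap in the column sequence).

```python
def needleman_wunsch(row_seq, col_seq, match=2, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty (illustrative sketch).

    Returns the filled score matrix and one optimal alignment.
    Rows of the matrix correspond to row_seq, columns to col_seq.
    """
    n_rows, n_cols = len(row_seq) + 1, len(col_seq) + 1
    score = [[0] * n_cols for _ in range(n_rows)]

    # Boundary cells: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n_rows):
        score[i][0] = i * gap
    for j in range(1, n_cols):
        score[0][j] = j * gap

    # Each inner cell takes the best of diagonal (match/mismatch), up, and left moves.
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            sub = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in col_seq
                score[i][j - 1] + gap,      # gap in row_seq
            )

    # Traceback from the bottom-right cell to the top-left cell.
    aligned_row, aligned_col = [], []
    i, j = len(row_seq), len(col_seq)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_row.append(row_seq[i - 1])
                aligned_col.append(col_seq[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_row.append(row_seq[i - 1])
            aligned_col.append("-")
            i -= 1
        else:
            aligned_row.append("-")
            aligned_col.append(col_seq[j - 1])
            j -= 1

    return score, "".join(reversed(aligned_row)), "".join(reversed(aligned_col))


# Inputs from review question 8: acgctg labels the rows, catgt the columns.
matrix, top, bottom = needleman_wunsch("acgctg", "catgt")
for line in matrix:
    print(" ".join(f"{value:3d}" for value in line))
print(top)
print(bottom)
```

With the inputs of question 8, the first row and first column reproduce the values already given in the exercise matrix, and the bottom-right cell (the optimal global score) evaluates to 2. Changing the defaults to match = 1, mismatch = 0, gap = -1 gives the scoring scheme of question 10. The Smith-Waterman variant in question 9 additionally needs a floor of 0 in every cell and a traceback that starts from the highest-scoring cell, which this sketch does not implement.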
