Gene Functional Annotation Tools and Databases
zhangmin
Providing advanced genomic solutions!

Outline
- What is functional annotation?
- Popular tools - BLAST and HMMER
- Nucleotide and protein databases
- Gene functional annotation and classification
- InterPro and InterProScan
- Practice

Genome Assembly
Assemble the Pieces Right

Gene Prediction: identify the words

    When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.

Functional Annotation

    naturalist [nach-er-uh-list, nach-ruh-], noun
    1. a person who studies or is an expert in natural history, especially a zoologist or botanist.
    2. an adherent of naturalism in literature or art.
    Origin: 1580-90; natural + -ist

    Origin of Species, The, noun
    (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.

Identify the function (i.e., meaning) of each word.

DATABASES / PROFILES

What information can be used for functional annotation?
- Sequence-based approaches: protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X.
- Structure-based approaches: protein A has structure X, and X has such-and-such structural features; hence A's functional sites are ...
- Motif-based approaches (sequence motifs, 3D motifs): a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X.
- "Guilt-by-association": gene A has function X and gene B is often "associated" with gene A; hence B might have a function related to X. Examples: domain fusion, phylogenetic profiling, PPI, etc.
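The guilt-by-association idea can be illustrated with phylogenetic profiling, one of the approaches named above. The following is a minimal sketch, not a published method: the gene names, species names, the Jaccard similarity measure and the 0.75 threshold are all assumptions chosen for illustration. Each gene is described by the set of genomes in which it occurs, and an unannotated gene borrows the function of the annotated gene whose profile is most similar.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles (sets of genome names)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def transfer_by_association(profiles, known_functions, threshold=0.75):
    """Suggest a function for each unannotated gene from its most similar annotated gene.

    profiles:        gene -> set of genomes in which the gene is present
    known_functions: gene -> function, for the genes that are already annotated
    threshold:       hypothetical cut-off; pairs below it are ignored
    """
    suggestions = {}
    for gene, profile in profiles.items():
        if gene in known_functions:
            continue
        hits = [
            (jaccard(profile, profiles[other]), other)
            for other in known_functions
            if other in profiles
        ]
        hits = [(score, other) for score, other in hits if score >= threshold]
        if hits:
            score, best = max(hits)
            suggestions[gene] = (known_functions[best], best, score)
    return suggestions


# Toy data (hypothetical): geneB co-occurs with geneA in every genome, geneC does not.
profiles = {
    "geneA": {"sp1", "sp2", "sp3", "sp5"},
    "geneB": {"sp1", "sp2", "sp3", "sp5"},
    "geneC": {"sp4"},
}
known_functions = {"geneA": "function X"}

suggestions = transfer_by_association(profiles, known_functions)
```

Here only `geneB` receives a suggestion. In keeping with the "might have" wording above, such a result is a hypothesis to be checked, not a confirmed annotation.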

Biological Sequences
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes, RNAs, and proteins. But why are biological sequences similar to one another in the first place? The answer to this question isn't simple and requires an understanding of molecular and evolutionary biology. Biological sequences like proteins may have important functions necessary for the survival of an organism. But DNA sequences can mutate randomly, and this may change how a sequence functions. Over time, both functional constraints and random processes impact the course of sequence evolution. The degree to which a sequence follows a functional or random path depends on natural selection and neutral evolution. So sequences are similar to one another because they start out similar and then follow different paths.

Basic Local Alignment Search Tool (BLAST)
- Divide a query sequence into short chunks called words.
- Look for exact matches.
- In case of a hit, try extending the alignment.
- Statistical assessment.

Different flavors!
- BLASTN: queries nucleotide vs. nucleotide sequences.
- BLASTP: queries protein vs. protein sequences.
- BLASTX: queries 6 possible frames of nucleotide sequences vs. protein sequences.
- TBLASTN: reciprocal of BLASTX; queries protein sequences vs. 6 possible frames of the nucleotide sequences in the database.
- TBLASTX: queries 6 possible frames of nucleotide sequences vs. 6 possible frames of nucleotide sequences inside the database (both the database and the nucleotide query are translated in six frames).

HMMER
HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

Figure: Representation of a hidden Markov model based on a multiple sequence alignment.

HMMER algorithms
- hmmscan: search protein sequences against collections of profiles, e.g. Pfam. In HMMER2 this was called hmmpfam.
- hmmsearch: search one or more profiles against a protein sequence database.
- jackhmmer: iteratively search a query protein sequence, multiple sequence alignment or profile HMM against the target protein sequence database.
- phmmer: search one or more query protein sequences against a protein sequence database.

/search/hmmscan

Nucleotide and protein databases
NCBI (USA), EMBL (Europe), DDBJ (Japan)
EST, STS, GSS, Genomes, RefSeq,

HTG, etc.
International Nucleotide Sequence Database Collaboration

GenBank
- CoreNucleotide - Nt/Nr
- dbEST
- dbGSS

NCBI Nt/Nr
Nt - Nucleotide collection
The nucleotide collection consists of GenBank + EMBL + DDBJ + PDB + RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA, patent sequences as well as phase 0, 1, and 2 HTGS sequences. The database is partially non-redundant.
Nr - Non-redundant protein sequences
All non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects.

UniProtKB/Swiss-Prot
UniProtKB - Protein knowledgebase, consists of two sections:
- Swiss-Prot: manually annotated and reviewed.
- TrEMBL: automatically annotated and not reviewed.

Model Organism Genomes
Useful tools
- Keyword search
- BLAST, BLAT
- Genome browser
- BioMart
- Other functional resources

Gene functional annotation and classification
To interpret a protein in the context of biological function: protein domains, families, functional sites, pathways or other biologically meaningful aspects.
- Protein domain family, Pfam
- Gene Ontology
- KEGG pathway
- KOG/COG

Pfam
- 14831 families; high-quality Pfam-A, low-quality Pfam-B.
- Annotation tool: hmmscan (HMMER 3.0)
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Pfam features

Gene Ontology
Aims to standardize the representation of gene and gene product attributes across species and databases.
GO covers three domains: biological process, cellular component and molecular function.
For example, Cytochrome P450 11B1, mitochondrial (a mitochondrial P450):
- GO cellular component term: GO:0005743, mitochondrial inner membrane. Where is it?
- GO molecular function term: GO:0004497, monooxygenase activity. What does it do? substrate + O2 = CO2 + H2O + product
- GO biological process term: GO:0006118, electron transport. Which process is this?
DAG: part_of, is_a

GO Annotation
Mappings to GO: EC2GO, Pfam2GO, COG2GO
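The DAG structure with its `is_a` and `part_of` relations can be sketched as a small graph walk. The graph below is a simplified toy, assumed for illustration only: its term names and edges are not copied from a GO release. It shows how annotating a gene product to one term implicitly annotates it to every ancestor of that term.

```python
from collections import deque

# Toy ontology (hypothetical edges): child term -> list of (relation, parent term).
TOY_DAG = {
    "mitochondrial inner membrane": [
        ("part_of", "mitochondrion"),
        ("is_a", "membrane"),
    ],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "membrane": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, dag):
    """Return every term reachable from `term` by following is_a / part_of edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in dag.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# A term may have several parents, which is why the structure is a DAG and not a tree.
implied = ancestors("mitochondrial inner membrane", TOY_DAG)
```

In this toy graph the walk reaches "cellular component" by two different routes, and the `seen` set ensures it is reported only once.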

Annotation tools: Blast2GO, GOanna, GOtcha

COG/KOG
- COG: Clusters of Orthologous Groups of proteins
- KOG: euKaryotic Ortholog Groups
How to define an ortholog?
- BeT: best hit
- Each COG includes proteins from at least three sufficiently distant species?

Kyoto Encyclopedia of Genes and Genomes
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms.
Kanehisa Laboratories

KEGG orthology
- KAAS, for ortholog assignment and pathway mapping
- A set of representative genomes, bi-directional best hit

KEGG pathway
hsa00010, ko00010, map00010: Glycolysis / Gluconeogenesis

KEGG API
http://rest.kegg.jp/<operation>/<argument>
<operation> = info | list | find | get | conv | link
<argument> = <database> | <dbentries>
<database>: path for KEGG pathway, ko for KEGG orthology
<dbentries>: <dbentry> + <dbentry> + ...

TASK
- Get a KEGG pathway map.
- Get the list of genes involved in that pathway.

TASK 1: Get a KEGG pathway map.
http://rest.kegg.jp/info/pathway
http://rest.kegg.jp/list/pathway
http://rest.kegg.jp/get/map00010/image (Glycolysis / Gluconeogenesis)

TASK 2: Get the list of genes involved in that pathway.
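The KEGG API URLs used in the tasks can be assembled programmatically. The following is a minimal sketch: the helper names `kegg_url` and `fetch` are hypothetical, and the Task 2 URL is an assumption about how the `link` operation might be applied (a target database followed by a pathway entry), so it should be checked against the KEGG API documentation before use.

```python
from urllib.request import urlopen

KEGG_BASE = "http://rest.kegg.jp"
# Operations listed on the KEGG API slide.
OPERATIONS = {"info", "list", "find", "get", "conv", "link"}


def kegg_url(operation, *arguments):
    """Build a KEGG REST URL of the form <base>/<operation>/<argument>[/<argument>...]."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown KEGG operation: {operation!r}")
    if not arguments:
        raise ValueError("at least one argument (database or entry) is required")
    return "/".join([KEGG_BASE, operation, *arguments])


def fetch(url):
    """Download the raw response body (requires network access; not called here)."""
    with urlopen(url) as response:
        return response.read()


# Task 1: the three URLs from the slide.
info_url = kegg_url("info", "pathway")
list_url = kegg_url("list", "pathway")
map_url = kegg_url("get", "map00010", "image")

# Task 2 (assumed form): genes of organism hsa linked to pathway hsa00010.
genes_url = kegg_url("link", "hsa", "hsa00010")
```

Keeping URL construction separate from `fetch` means the URLs can be inspected or logged without any network traffic. A call such as `fetch(map_url)` would then return the image bytes for the pathway map.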
