Gene Functional Annotation Tools and Databases
zhangmin
Providing advanced genomic solutions!

Outline
- What is functional annotation?
- Popular tools - BLAST and HMMER
- Nucleotide and protein databases
- Gene functional annotation and classification
- InterPro and InterProScan
- Practice

What is functional annotation?

A simple example

Genome Assembly
Assemble the pieces right.

Gene Prediction
Identify the words.

"When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers."

Functional Annotation
Identify the function (i.e., meaning) of each word.

naturalist [nach-er-uh-list, nach-ruh-], noun
1. a person who studies or is an expert in natural history, especially a zoologist or botanist.
2. an adherent of naturalism in literature or art.
Origin: 1580-90; natural + -ist

Origin of Species, The, noun
(On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.

DATABASES, PROFILES

What information can be used for functional annotation?
- Sequence-based approaches: protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X.
- Structure-based approaches: protein A has structure X, and X has such-and-such structural features; hence A's function sites are ...
- Motif-based approaches (sequence motifs, 3D motifs): a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X.
- "Guilt-by-association": gene A has function X and gene B is often "associated" with gene A, so B might have a function related to X. Examples: domain fusion, phylogenetic profiling, PPI, etc.
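The sequence-based rule (B is a homolog of A, A has function X, hence B has function X) can be sketched as a simple annotation transfer. This is only an illustration, not a method taken from the slides: the function name `transfer_function`, the hit fields, the protein identifiers and the E-value cutoff of 1e-5 are all assumptions chosen for the example.

```python
# Hypothetical sketch of homology-based annotation transfer.
# All names, identifiers and the E-value cutoff are illustrative assumptions.

def transfer_function(hits, known_functions, max_evalue=1e-5):
    """Return the function of the best annotated homolog, or None.

    hits: list of dicts with keys "subject" (hit identifier) and "evalue".
    known_functions: dict mapping annotated protein identifiers to a function.
    """
    candidates = [
        h for h in hits
        if h["evalue"] <= max_evalue and h["subject"] in known_functions
    ]
    if not candidates:
        return None  # no trustworthy annotated homolog: leave the query unannotated
    best = min(candidates, key=lambda h: h["evalue"])
    return known_functions[best["subject"]]


# Toy data: protein B was searched against a database that contains protein A.
known_functions = {"protein_A": "function X"}
hits_for_b = [
    {"subject": "protein_A", "evalue": 1e-40},
    {"subject": "protein_C", "evalue": 1e-60},  # stronger hit, but has no annotation
    {"subject": "protein_D", "evalue": 0.5},    # too weak to count as a homolog here
]
print(transfer_function(hits_for_b, known_functions))
```

In practice the cutoff, and whether a hit counts as an ortholog rather than just a homolog, would need more care than this sketch shows.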
Popular tools - BLAST and HMMER

Biological Sequences
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta Stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes, RNAs, and proteins. But why are biological sequences similar to one another in the first place? The answer to this question isn't simple and requires an understanding of molecular and evolutionary biology. Biological sequences like proteins may have important functions necessary for the survival of an organism. But DNA sequence can mutate randomly, and this may change how a sequence functions. Over time, both functional constraints and random processes impact the course of sequence evolution. The degree to which a sequence follows a functional or random path depends on natural selection and neutral evolution. So sequences are similar to one another because they start out similar and then follow different paths.

Basic Local Alignment Search Tool
- Divide a query sequence into short chunks called words.
- Look for exact matches.
- In case of a hit, try extending the alignment.
- Statistical assessment.

Different flavors!
- BLASTN: queries nucleotide vs. nucleotide sequences.
- BLASTP: queries protein vs. protein sequences.
- BLASTX: queries the 6 possible frames of nucleotide sequences vs. protein sequences.
- TBLASTN: the reciprocal of BLASTX; queries protein sequences vs. the 6 possible frames of the nucleotide sequences in the database.
- TBLASTX: queries the 6 possible frames of nucleotide sequences vs. the 6 possible frames of the nucleotide sequences in the database (both the database and the nucleotide query are translated in 6 frames).

HMMER
HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

Figure: representation of a hidden Markov model based on a multiple sequence alignment.

HMMER algorithms
- hmmscan - searches protein sequences against collections of profiles, e.g. Pfam. In HMMER2 this was called hmmpfam.
- hmmsearch - searches one or more profiles against a protein sequence database.
- jackhmmer - iteratively searches a query protein sequence, multiple sequence alignment or profile HMM against the target protein sequence database.
- phmmer - searches one or more query protein sequences against a protein sequence database.
/search/hmmscan

Nucleotide and protein databases
- NCBI (USA), EMBL (Europe), DDBJ (Japan)
- EST, STS, GSS, Genomes, RefSeq, HTG, etc.

International Nucleotide Sequence Database Collaboration

GenBank
- CoreNucleotide - Nt/Nr
- dbEST
- dbGSS

NCBI Nt/Nr
- Nt - Nucleotide collection: consists of GenBank + EMBL + DDBJ + PDB + RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA and patent sequences as well as phase 0, 1, and 2 HTGS sequences. The database is partially non-redundant.
- Nr - Non-redundant protein sequences: all non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects.

UniProtKB/Swiss-Prot
UniProtKB - Protein knowledgebase, consists of two sections:
- SwissProt: manually annotated and reviewed.
- TrEMBL: automatically annotated and not reviewed.

Model Organism Genomes

Useful Tools
- Key word search
- BLAST, BLAT
- Genome browser
- BioMart

Other functional resources
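The first three steps listed under "Basic Local Alignment Search Tool" (split the query into words, look for exact matches, extend each hit) can be sketched in a few lines. This is a deliberately simplified illustration, not BLAST itself: it extends hits only while characters match exactly, uses no scoring matrix or drop-off threshold, and skips the statistical assessment entirely. The function names and the word size of 4 are assumptions made for the example.

```python
# Simplified seed-and-extend sketch (ungapped, exact-match extension only).
# Not the real BLAST algorithm: no scoring matrix, no statistics.
from collections import defaultdict


def index_words(subject, w):
    """Map every word of length w in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def extend_hit(query, subject, q, s, w):
    """Extend an exact word match left and right while characters agree."""
    left = 0
    while q - left - 1 >= 0 and s - left - 1 >= 0 \
            and query[q - left - 1] == subject[s - left - 1]:
        left += 1
    right = w
    while q + right < len(query) and s + right < len(subject) \
            and query[q + right] == subject[s + right]:
        right += 1
    # (query start, subject start, length of the extended match)
    return q - left, s - left, left + right


def seed_and_extend(query, subject, w=4):
    """Return the distinct extended matches, longest first."""
    index = index_words(subject, w)
    hits = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hits.add(extend_hit(query, subject, q, s, w))
    return sorted(hits, key=lambda h: -h[2])


print(seed_and_extend("TTGACGTACC", "AAGACGTACGG"))
```

Several overlapping words seed the same region here, so collecting the extended matches in a set collapses them into one hit.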
Gene functional annotation and classification
To interpret a protein in the context of biological function:
- Protein domains, families, functional sites, pathways or other biologically meaningful aspects

Resources:
- Protein domain family, PFAM
- Gene ontology
- KEGG pathway
- KOG/COG

PFAM
- 14831 families; high quality Pfam-A, low quality Pfam-B.
- Annotation tools: hmmscan (HMMER 3.0)
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

PFAM features

Gene Ontology
- Aims to standardize the representation of gene and gene product attributes across species and databases.
- GO covers three domains: biological process, cellular component and molecular function.

For example, Cytochrome P450 11B1, mitochondrial:
- GO cellular component term: GO:0005743. Where is it? Mitochondrial p450: mitochondrial inner membrane.
- GO molecular function term: GO:0004497. What does it do? substrate + O2 = CO2 + H2O + product: monooxygenase activity.
- GO biological process term: GO:0006118. Which process is this? Electron transport.

GO terms form a DAG (directed acyclic graph) with relations such as part_of and is_a.

GO Annotation
- Mappings to GO: EC2GO, Pfam2GO, COG2GO
- Annotation tools: Blast2GO, GOanna, GOtcha

COG/KOG
- Clusters of Orthologous Groups of proteins
- euKaryotic Ortholog Groups
- How to define an ortholog? BeT - best hit.
- Each COG includes proteins from at least three sufficiently distant species.

Kyoto Encyclopedia of Genes and Genomes
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms.
Kanehisa Laboratories

KEGG orthology
- KAAS, for ortholog assignment and pathway mapping
- A set of representative genomes, bi-directional best hit

KEGG pathway
- hsa00010, ko00010, map00010: Glycolysis / Gluconeogenesis

KEGG API
http://rest.kegg.jp/<operation>/<argument>...
<operation> = info | list | find | get | conv | link
<argument> = <database> | <dbentries>
<database>: path for KEGG pathway, ko for KEGG orthology
<dbentries>: <dbentry>+<dbentry>+...

TASK
- Get a KEGG pathway map.
- Get the list of genes involved in that pathway.

TASK 1: Get a KEGG pathway map.
http://rest.kegg.jp/info/pathway
http://rest.kegg.jp/list/pathway
http://rest.kegg.jp/get/map00010/image (Glycolysis / Gluconeogenesis)

TASK 2: Get the list of genes involved in that pathway.
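The KEGG API URL pattern can be assembled programmatically. The sketch below only builds URL strings and makes no network request. The helper name `kegg_url` is hypothetical, and the `link/hsa/hsa00010` request suggested for TASK 2 is an assumption based on the documented `link` operation, not something shown in the slides.

```python
# Minimal sketch: build KEGG REST URLs following the pattern
#   http://rest.kegg.jp/<operation>/<argument>...
# kegg_url() is a hypothetical helper, not part of any KEGG client library.

KEGG_BASE = "http://rest.kegg.jp"
OPERATIONS = {"info", "list", "find", "get", "conv", "link"}


def kegg_url(operation, *arguments):
    """Join the base URL, one operation and one or more arguments."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown KEGG operation: {operation!r}")
    if not arguments:
        raise ValueError("at least one argument is required")
    return "/".join([KEGG_BASE, operation, *arguments])


# TASK 1: the three requests listed in the slides.
task1_urls = [
    kegg_url("info", "pathway"),
    kegg_url("list", "pathway"),
    kegg_url("get", "map00010", "image"),
]

# TASK 2 (assumption): ask for human (hsa) genes linked to pathway hsa00010.
task2_url = kegg_url("link", "hsa", "hsa00010")

for url in task1_urls + [task2_url]:
    print(url)
```

Each URL can then be opened in a browser or fetched with any HTTP client; checking the operation name up front catches typos before a request is sent.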