Gene Functional Annotation Tools and Databases
zhangmin
Providing advanced genomic solutions!

Outline
- What is functional annotation?
- Popular tools - BLAST and HMMER
- Nucleotide and protein databases
- Gene functional annotation and classification
- InterPro and InterProScan
- Practice

Genome Assembly
Assemble the pieces right.

Gene Prediction
Identify the words.

"When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers."

Functional Annotation
Identify the function (i.e., meaning) of each word.

naturalist [nach-er-uh-list, nach-ruh-]
noun
1. a person who studies or is an expert in natural history, especially a zoologist or botanist.
2. an adherent of naturalism in literature or art.
Origin: 1580-90; natural + -ist

Origin of Species, The
noun
(On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.

DATABASES, PROFILES

What information can be used for functional annotation?
- Sequence-based approaches: protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X.
- Structure-based approaches: protein A has structure X, and X has such-and-such structural features; hence A's functional sites are ...
- Motif-based approaches (sequence motifs, 3D motifs): a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X.
- "Guilt-by-association": gene A has function X and gene B is often "associated" with gene A, so B might have a function related to X. Examples: domain fusion, phylogenetic profiling, PPI, etc.
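
The sequence-based rule above (B is a homolog of A, A has function X, hence B has function X) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the `transfer_function` helper, and the E-value cutoff of 1e-5 are hypothetical and not taken from the slides, and real pipelines usually apply further filters (coverage, identity, orthology checks) before transferring an annotation.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical record, not a BLAST API type)."""
    query: str     # protein B, function unknown
    subject: str   # protein A, candidate homolog
    evalue: float  # smaller means a more significant match


def transfer_function(hits, known_functions, max_evalue=1e-5):
    """Map each query to (function, source protein) of its best annotated hit.

    max_evalue is an assumed cutoff; hits above it, or hits to proteins
    without a known function, are ignored.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.subject not in known_functions:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {q: (known_functions[h.subject], h.subject) for q, h in best.items()}


if __name__ == "__main__":
    # Made-up example data: B resembles A strongly and C weakly; D has no good hit.
    hits = [Hit("B", "A", 1e-50), Hit("B", "C", 1e-10), Hit("D", "A", 0.5)]
    known = {"A": "function X", "C": "function Y"}
    print(transfer_function(hits, known))
```

Keeping the source protein next to the transferred function makes it possible to trace each annotation back to the evidence it came from.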
Biological Sequences
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes, RNAs, and proteins. But why are biological sequences similar to one another in the first place? The answer to this question isn't simple and requires an understanding of molecular and evolutionary biology. Biological sequences like proteins may have important functions necessary for the survival of an organism. But DNA sequence can mutate randomly, and this may change how a sequence functions. Over time, both functional constraints and random processes impact the course of sequence evolution. The degree to which a sequence follows a functional or random path depends on natural selection and neutral evolution. So sequences are similar to one another because they start out similar to one another and follow different paths.

Basic Local Alignment Search Tool (BLAST)
- Divide a query sequence into short chunks called words.
- Look for exact matches.
- In case of a hit, try extending the alignment.
- Statistical assessment.

Different flavors!
- BLASTN: queries nucleotide vs. nucleotide sequences.
- BLASTP: queries protein vs. protein sequences.
- BLASTX: queries the 6 possible frames of nucleotide sequences vs. protein sequences.
- TBLASTN: reciprocal of BLASTX; queries protein sequences vs. the 6 possible frames of the nucleotide sequences in the database.
- TBLASTX: queries the 6 possible frames of nucleotide sequences vs. the 6 possible frames of the nucleotide sequences in the database (both the database and the nucleotide query are translated in all 6 frames).

HMMER
HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
[Figure: representation of a hidden Markov model based on a multiple sequence alignment.]

HMMER algorithms
- hmmscan: search protein sequences against collections of profiles, e.g. Pfam. In HMMER2 this was called hmmpfam.
- hmmsearch: search one or more profiles against a protein sequence database.
- jackhmmer: iteratively search a query protein sequence, multiple sequence alignment or profile HMM against the target protein sequence database.
- phmmer: search one or more query protein sequences against a protein sequence database.
/search/hmmscan

Nucleotide and protein databases
- NCBI (USA), EMBL (Europe), DDBJ (Japan): International Nucleotide Sequence Database Collaboration
- EST, STS, GSS, Genomes, RefSeq, HTG, etc.

GenBank
- CoreNucleotide - Nt/Nr
- dbEST
- dbGSS

NCBI Nt/Nr
- Nt - Nucleotide collection: consists of GenBank + EMBL + DDBJ + PDB + RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA, patent sequences as well as phase 0, 1, and 2 HTGS sequences. The database is partially non-redundant.
- Nr - Non-redundant protein sequences: all non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects.

UniProtKB/Swiss-Prot
UniProtKB - Protein knowledgebase, consists of two sections:
- SwissProt: manually annotated and reviewed.
- TrEMBL: automatically annotated and not reviewed.

Model Organism Genomes

Useful Tools
- Keyword search
- BLAST, BLAT
- Genome browser
- BioMart

Other functional resources
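
The two UniProtKB sections described above can be told apart when reading a UniProtKB FASTA file, because each header line begins with `sp` (Swiss-Prot, reviewed) or `tr` (TrEMBL, unreviewed), followed by the accession and entry name separated by `|`. The sketch below relies on that header convention, which the slides do not mention; the `uniprot_section` helper and the example headers are hypothetical and made up for illustration.

```python
# Prefix of a UniProtKB FASTA header mapped to the section it denotes.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def uniprot_section(header):
    """Return (section, accession) for a header like '>sp|ACCESSION|NAME ...'.

    Hypothetical helper: raises ValueError for headers that do not follow
    the 'db|accession|entry_name' layout with db equal to 'sp' or 'tr'.
    """
    fields = header.lstrip(">").split("|", 2)
    if len(fields) < 3 or fields[0] not in SECTIONS:
        raise ValueError(f"not a UniProtKB-style header: {header!r}")
    return SECTIONS[fields[0]], fields[1]


if __name__ == "__main__":
    # Made-up headers; the accessions are placeholders, not real entries.
    print(uniprot_section(">sp|X00001|DEMO1_HUMAN Demo protein"))
    print(uniprot_section(">tr|X00002|DEMO2_MOUSE Another demo protein"))
```

Filtering on the section is a common way to restrict annotation transfer to manually reviewed entries.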
Gene functional annotation and classification
To interpret a protein in the context of biological function:
- Protein domains, families, functional sites, pathways or other biologically meaningful aspects
- Protein domain family: Pfam
- Gene Ontology
- KEGG pathway
- KOG/COG

Pfam
- 14831 families; high-quality Pfam-A, low-quality Pfam-B.
- Annotation tools: hmmscan (HMMER 3.0)
- The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

Pfam features

Gene Ontology
- Aims to standardize the representation of gene and gene product attributes across species and databases.
- GO covers three domains: biological process, cellular component and molecular function.

For example, Cytochrome P450 11B1, mitochondrial (mitochondrial P450):
- Where is it? GO cellular component term GO:0005743, mitochondrial inner membrane.
- What does it do? GO molecular function term GO:0004497, monooxygenase activity (substrate + O2 = CO2 + H2O + product).
- Which process is this? GO biological process term GO:0006118, electron transport.

DAG: part_of, is_a

GO Annotation
- Mappings to GO: EC2GO, Pfam2GO, COG2GO
- Annotation tools: Blast2GO, GOanna, GOtcha

COG/KOG
- COG: Clusters of Orthologous Groups of proteins
- KOG: euKaryotic Ortholog Groups
- How to define an ortholog? BeT - best hit.
- Each COG includes proteins from at least three sufficiently distant species.

Kyoto Encyclopedia of Genes and Genomes
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms.
Kanehisa Laboratories

KEGG orthology
- KAAS, for ortholog assignment and pathway mapping
- A set of representative genomes, bi-directional best hit

KEGG pathway
- hsa00010, ko00010, map00010: Glycolysis / Gluconeogenesis

KEGG API
http://rest.kegg.jp/<operation>/<argument>/...
<operation> = info | list | find | get | conv | link
<argument> = <database> | <dbentries>
<database>: path for KEGG pathway, ko for KEGG orthology
<dbentries>: <dbentry> + <dbentry> + ...

TASK
- Get a KEGG pathway map.
- Get the gene list involved in that pathway.

TASK 1: Get a KEGG pathway map.
http://rest.kegg.jp/info/pathway
http://rest.kegg.jp/list/pathway
http://rest.kegg.jp/get/map00010/image (Glycolysis / Gluconeogenesis)

TASK 2: Get the gene list involved in that pathway.
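
The URL pattern described under KEGG API can be composed programmatically. The following is a minimal sketch that only builds URL strings and sends no request; `kegg_url` and `join_entries` are hypothetical helper names, the list of operations is limited to the six named in the slides, and joining entries with `+` follows the entry syntax shown there.

```python
# Operations named in the KEGG API description of the slides.
KEGG_OPERATIONS = ("info", "list", "find", "get", "conv", "link")
BASE_URL = "http://rest.kegg.jp"


def join_entries(entries):
    """Combine several entry identifiers with '+' into one URL argument."""
    return "+".join(entries)


def kegg_url(operation, *arguments):
    """Build base/operation/argument[/argument...] as a string (no network access)."""
    if operation not in KEGG_OPERATIONS:
        raise ValueError(f"unknown KEGG operation: {operation!r}")
    if not arguments:
        raise ValueError("at least one argument (database or entries) is required")
    return "/".join([BASE_URL, operation, *arguments])


if __name__ == "__main__":
    # The three TASK 1 URLs from the slides.
    print(kegg_url("info", "pathway"))
    print(kegg_url("list", "pathway"))
    print(kegg_url("get", "map00010", "image"))
```

The resulting strings can be passed to any HTTP client; validating the operation up front catches typos before a request is made.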