Gene Functional Annotation Tools and Databases
zhangmin
Providing advanced genomic solutions!

Outline
- What is functional annotation?
- Popular tools - BLAST and HMMER
- Nucleotide and protein databases
- Gene functional annotation and classification
- InterPro and InterProScan
- Practice

What is functional annotation?

When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.

- Genome Assembly: assemble the pieces right.
- Gene Prediction: identify the words.
- Functional Annotation: identify the function (i.e., meaning) of each word.

naturalist [nach-er-uh-list, nach-ruh-], noun
1. a person who studies or is an expert in natural history, especially a zoologist or botanist.
2. an adherent of naturalism in literature or art.
Origin: 1580-90; natural + -ist

Origin of Species, The, noun
(On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.

Databases, profiles

What information can be used for functional annotation?
- Sequence based approaches: Protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X.
- Structure based approaches: Protein A has structure X, and X has so-and-so structural features; hence A's functional sites are ...
- Motif based approaches (sequence motifs, 3D motifs): A group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X.
- "Guilt-by-association": Gene A has function X and gene B is often "associated" with gene A; B might have a function related to X. Domain fusion, phylogenetic profiling, PPI, etc.
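The sequence based approach above (B is a homolog of A, hence B inherits A's function) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the gene and protein names, and the thresholds (E-value at most 1e-5, identity at least 40%) are hypothetical and are not taken from the slides.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical record layout)."""
    query: str       # protein being annotated
    subject: str     # database protein it matched
    identity: float  # percent identity of the alignment
    evalue: float    # expectation value of the alignment


def transfer_annotation(hits, known_functions, max_evalue=1e-5, min_identity=40.0):
    """Give each query the function of its best annotated hit.

    The thresholds are illustrative assumptions, not recommended values.
    """
    best = {}
    for hit in hits:
        # Skip weak hits and hits to proteins without a known function.
        if hit.evalue > max_evalue or hit.identity < min_identity:
            continue
        if hit.subject not in known_functions:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {query: known_functions[hit.subject] for query, hit in best.items()}


# Hypothetical toy data: geneB has two acceptable hits, geneD only a weak one.
hits = [
    Hit("geneB", "protA", 62.0, 1e-40),
    Hit("geneB", "protC", 45.0, 1e-12),
    Hit("geneD", "protA", 25.0, 0.5),
]
known = {"protA": "function X", "protC": "function Y"}
annotations = transfer_annotation(hits, known)
print(annotations)
```

Queries without a sufficiently strong hit stay unannotated rather than receiving a doubtful function, which is the conservative choice for this kind of transfer.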
Popular tools - BLAST and HMMER

Biological Sequences
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes, RNAs, and proteins. But why are biological sequences similar to one another in the first place? The answer to this question isn't simple and requires an understanding of molecular and evolutionary biology. Biological sequences like proteins may have important functions necessary for the survival of an organism. But DNA sequence can mutate randomly, and this may change how a sequence functions. Over time, both functional constraints and random processes impact the course of sequence evolution. The degree to which a sequence follows a functional or random path depends on natural selection and neutral evolution. So sequences are similar to one another because they start out similar and then follow different paths.

Basic Local Alignment Search Tool (BLAST)
- Divide a query sequence into short chunks called words.
- Look for exact matches.
- In case of a hit, try extending the alignment.
- Statistical assessment.

Different flavors!
- BLASTN: queries nucleotide vs. nucleotide sequences.
- BLASTP: queries protein vs. protein sequences.
- BLASTX: queries 6 possible frames of nucleotide sequences vs. protein sequences.
- TBLASTN: reciprocal of BLASTX (protein sequences vs. 6 possible frames of the nucleotide sequences inside the database).
- TBLASTX: queries 6 possible frames of nucleotide sequences vs. 6 possible frames of nucleotide sequences inside the database (both the database and the query nucleotide sequences are translated in 6 frames).

HMMER
HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
[Figure: Representation of a hidden Markov model based on a multiple sequence alignment.]

HMMER algorithms
- hmmscan: search protein sequences against collections of profiles, e.g. Pfam. In HMMER2 this was called hmmpfam.
- hmmsearch: search one or more profiles against a protein sequence database.
- jackhmmer: iteratively search a query protein sequence, multiple sequence alignment or profile HMM against the target protein sequence database.
- phmmer: search one or more query protein sequences against a protein sequence database.
/search/hmmscan

Nucleotide and protein databases
NCBI (USA), EMBL (Europe), DDBJ (Japan)
EST, STS, GSS, Genomes, RefSeq,
HTG, etc.
International Nucleotide Sequence Database Collaboration
GenBank: CoreNucleotide - Nt/Nr, dbEST, dbGSS

NCBI Nt/Nr
- Nt - Nucleotide collection: the nucleotide collection consists of GenBank + EMBL + DDBJ + PDB + RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA, patent sequences as well as phase 0, 1, and 2 HTGS sequences. The database is partially non-redundant.
- Nr - Non-redundant protein sequences: all non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects.

UniProtKB/Swiss-Prot
UniProtKB - Protein knowledgebase, consists of two sections:
- SwissProt: manually annotated and reviewed.
- TrEMBL: automatically annotated and not reviewed.

Model Organism Genomes
Useful tools:
- Keyword search
- BLAST, BLAT
- Genome browser
- BioMart
- Other functional resources

Gene functional annotation and classification
To interpret a protein in the context of biological function: protein domains, families, functional sites, pathways or other biologically meaningful aspects.
- Protein domain family, PFAM
- Gene Ontology
- KEGG pathway
- KOG/COG

PFAM
14831 families; high quality Pfam-A, low quality Pfam-B.
Annotation tools: hmmscan (HMMER 3.0)
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
[Figure: PFAM features]

Gene Ontology
Aims to standardize the representation of gene and gene product attributes across species and databases.
GO covers three domains: biological process, cellular component and molecular function.

For example, Cytochrome P450 11B1, mitochondrial (a mitochondrial P450):
- Where is it? GO cellular component term GO:0005743, mitochondrial inner membrane.
- What does it do? GO molecular function term GO:0004497, monooxygenase activity (one atom of O2 is incorporated into the substrate and the other is reduced to H2O).
- Which process is this? GO biological process term GO:0006118, electron transport.

GO terms form a directed acyclic graph (DAG) linked by is_a and part_of relationships.

GO Annotation: mappings to GO include EC2GO, Pfam2GO and COG2GO.
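Because GO terms are arranged in a DAG with is_a and part_of links, a gene annotated with a specific term is implicitly annotated with all of that term's ancestors. The following sketch walks such a graph. The graph itself is a hypothetical toy (the `TOY:` identifiers are invented for illustration and are not real GO terms), and both relation types are treated the same way, which is an assumption of this example.

```python
from collections import deque

# child term -> list of (relation, parent term); a toy graph, not real GO content
TOY_DAG = {
    "TOY:inner_membrane": [("is_a", "TOY:membrane"), ("part_of", "TOY:mitochondrion")],
    "TOY:mitochondrion": [("is_a", "TOY:organelle")],
    "TOY:membrane": [("is_a", "TOY:cellular_component")],
    "TOY:organelle": [("is_a", "TOY:cellular_component")],
    "TOY:cellular_component": [],
}


def ancestors(term, dag):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for _relation, parent in dag.get(node, []):
            if parent not in seen:  # a DAG can reach one parent by several paths
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(gene_terms, dag):
    """Extend each gene's direct terms with all of their ancestors."""
    result = {}
    for gene, terms in gene_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, dag)
        result[gene] = full
    return result


direct = {"geneA": ["TOY:inner_membrane"]}
print(sorted(propagate(direct, TOY_DAG)["geneA"]))
```

The `seen` set matters because a term can have more than one parent, so the same ancestor may be reached along several paths.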
GO annotation tools: Blast2GO, GOanna, GOtcha.

COG/KOG
- COG: Clusters of Orthologous Groups of proteins.
- KOG: euKaryotic Ortholog Groups.
- How to define an ortholog? BeT - best hit.
- Each COG includes proteins from at least three sufficiently distant species.

Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms. (Kanehisa Laboratories)

KEGG orthology
- KAAS, for ortholog assignment and pathway mapping.
- A set of representative genomes; bi-directional best hit.

KEGG pathway
hsa00010, ko00010, map00010: Glycolysis / Gluconeogenesis

KEGG API
http://rest.kegg.jp/<operation>/<argument>
<operation> = info | list | find | get | conv | link
<argument> = <database> | <dbentries>
<database>: path for KEGG pathway, ko for KEGG orthology
<dbentries>: <dbentry>+<dbentry>+...

TASK
1. Get a KEGG pathway map.
2. Get the list of genes involved in that pathway.

TASK 1: Get a KEGG pathway map.
http://rest.kegg.jp/info/pathway
http://rest.kegg.jp/list/pathway
http://rest.kegg.jp/get/map00010/image (Glycolysis / Gluconeogenesis)

TASK 2: Get the list of genes involved in that pathway.
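The TASK 1 requests can also be assembled programmatically. The sketch below only builds the three URLs listed for TASK 1 from an operation and its arguments. The helper names `kegg_url` and `fetch` are hypothetical, and `fetch` is defined but never called here because it needs network access and a reachable KEGG server.

```python
from urllib.request import urlopen

BASE_URL = "http://rest.kegg.jp"
# Operations named on the KEGG API slide.
OPERATIONS = ("info", "list", "find", "get", "conv", "link")


def kegg_url(operation, *arguments):
    """Join the base URL, one operation and its arguments with slashes."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if not arguments:
        raise ValueError("at least one argument is required")
    return "/".join((BASE_URL, operation) + arguments)


def fetch(url, timeout=30):
    """Download the raw response body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# The three requests from TASK 1.
task1_urls = [
    kegg_url("info", "pathway"),           # database information
    kegg_url("list", "pathway"),           # list of pathway entries
    kegg_url("get", "map00010", "image"),  # Glycolysis / Gluconeogenesis map
]
for url in task1_urls:
    print(url)
```

Calling `fetch(task1_urls[2])` would be expected to return the pathway map as image bytes, which can then be written to a file opened in binary mode.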