Microarray and Bioinformatics
Dr Jingfu Qiu (邱景富), School of Public Health

Aims for the Microarray Bioinformatics
- Understand basic microarray technology and its use in gene expression analysis.
- Learn basic data analysis methods and how to apply them to gene expression data:
  - Data acquisition
  - Data normalization
  - Data analysis
  - Data clustering

Vocabulary Review
- Gene: a hereditary DNA sequence at a specific location on a chromosome.
- Genetics: the study of heredity and variation in organisms.
- Genome: an organism's total genetic content (its full DNA sequence).
- Genomics: the study of organisms in terms of their genomes.
- The Human Genome Project was declared complete in April 2003, after about 13 years and roughly US$2.7 billion, covering about 99% of the gene-containing human genome sequence.
- Protein: a sequence of amino acids that "does something".
- Proteomics: the study of all the proteins that can come from an organism's genome.
- Bioinformatics: the collection, organization and analysis of large-scale, complex biological data.
- Functional genomics: the study of genome function as a whole, including expression profiles at the mRNA level and the protein level.

Microarray
- A high-throughput technology that allows detection of thousands of genes simultaneously
- Also called gene chip, biochip, or array
- Relies heavily on computational support
- Central platform for functional genomics

Types of Microarrays
- DNA microarrays, such as cDNA microarrays and oligonucleotide microarrays
- MMChips, for surveillance of microRNA populations
- Protein microarrays
- Tissue microarrays
- Cellular microarrays (also called transfection microarrays)
- Chemical compound microarrays
- Antibody microarrays
Types of DNA Microarrays
1. cDNA chip (DNA microarray, two-channel array)
- Probe cDNA (500-5,000 bases long) is immobilized on a solid surface such as glass by robotic spotting
- Traditionally called a DNA microarray
- First developed at Stanford University
2. Gene chip (DNA chip, Affymetrix chip)
- Oligonucleotides (20-80-mer oligos) are synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization
- Historically called DNA chips
- Developed at Affymetrix, Inc., under the GeneChip trademark
- Many companies manufacture oligonucleotide-based chips using alternative technologies

History
- HGP (Human Genome Project): proposed by Dulbecco on 7 March 1986 and started in October 1990; it called for rapid and sensitive techniques for analyzing human genome information.
- 1980s: the idea was proposed by analogy with the computer chip; W. Bains was the first to try it.
- 1990s: Stephen Fodor (now president of Affymetrix) made it work.
- 1995: publication of "Quantitative monitoring of gene expression patterns with a complementary DNA microarray".
- End of 1996: the first DNA chip.

Microarrays are Popular
- NYU Med Center now collects about 3 GB of microarray data per week (60 chips, 6-10 different experiments).
- A PubMed search for "microarray" returns 24,431 papers.

What Problems Can It Solve?
- Differences in gene expression over time, between tissues, and between disease states
- Identification of complex genetic diseases
- Drug discovery and toxicology studies
- Mutation/polymorphism detection (SNPs)
- Pathogen analysis

Features
- Parallelism: thousands of genes simultaneously
- Miniaturization: small chip size
- Multiplexing: multiple samples at the same time
- Automation: chip manufacturing, reagents

Differential Gene Expression
A few examples:
- Cell-type specific, e.g. skin cell vs. brain cell
- Developmental stage, e.g. embryonic skin cell vs. adult skin cell
- Disease state, e.g. normal skin cell vs. skin tumor cell
- Environment specific, e.g. untreated skin cell vs. skin cell treated with drugs or toxins

What Are Its Pitfalls?
- It detects the transcription (mRNA) level, not the translation (protein) level.
- Many factors (sources of variation) can affect the result: chip and probe design, experiment design, sample preparation, image acquisition, data normalization, data analysis, ...
- Key to success: understanding both the biological problem and the computational tools (software, statistics).

Requirements
- Array spotter
- Array scanner
- Chemistry systems (hybridization)
- Software

Market Forecast
- 1999: US$1 billion
- Within 5 years: US$20 billion
- 2005: US$5 billion (USA)
- 2010: US$40 billion (USA), not including disease diagnostics
- Forecast to replace microelectronics as the largest industry

Principle
- Similar to Northern blotting: base pairing, i.e. hybridization between nucleic acids
- Major differences from Northern blotting:
  - Detects thousands of genes simultaneously rather than individual genes
  - Probes are fixed on a glass slide rather than a nylon membrane
  - Target samples are labeled with fluorescent rather than radioactive dNTPs

Designing the Probes
- Specificity: the probes need to be highly specific to avoid hybridization with the wrong target molecules.
- Readability: the probes need to generate an output that is easy to read (spots lie in defined positions and have regular size, shape and spacing).
- Sensitivity: the probes must be sensitive enough to detect the mRNA, and the spot intensity must be distinguishable from background noise.
- Reproducibility: results must be reproducible across multiple experiments.

Spotting Process
[Figure: spotting process, showing the spotting pins]
[Figure: spotting robot (Cheung et al. 1999)]
[Figure: Affymetrix gene chip]
[Figure: detection of differential expression]
Comparison of Probe Types

In-situ synthesis / oligos
Advantages:
- No need to isolate and purify cDNAs, because oligonucleotides can be synthesized.
- Short oligonucleotides are less likely to cross-react with other sequences in the target DNA.
- Chip density is higher than with cDNAs.
Limitations:
- The sequence has to be known.
- Synthesis can be expensive and time-consuming.
- The short sequences are not as specific for target DNA, so appropriate controls must be added.

PCR products / cDNA probes
Advantages:
- Flexibility to study cDNAs from any source.
- cDNAs do not require any a priori information about the corresponding genes.
- Longer sequences increase hybridization specificity, which reduces false positives.
Limitations:
- Isolating individual cDNAs to immobilize on each spot can be cumbersome.
- Density is lower than with oligonucleotides synthesized on the chip surface.
- cDNAs are longer sequences and are more likely to contain, by chance, sequences found in target DNA, which results in cross-reactivity.

Many other variations of the technology exist, such as the use of longer oligos, the use of fibre optics, etc.

                      Spotted arrays           Affymetrix arrays
Source                Homemade                 Commercially available
Design                Tailored                 "Off the rack"
Cost                  Cheaper?                 More expensive?
Features per array    Maximum 24,000           Maximum 500,000
Variability           Prone to variability     Less variability

Process of Manufacturing a Microarray
- Start with individual genes, e.g. the 4,200 genes of the genome of Y. pestis
- Amplify all of them using the polymerase chain reaction (PCR)
- "Spot" them on a medium, e.g. an ordinary glass microscope slide
- Each spot is about 100 µm in diameter
- Spotting is done by a robot
- Complex and potentially expensive task

[Figure: array layout, blocks B1 to B32 in a 4 × 8 matrix, each block a 17 × 17 dot array; 8,448 spots in total; 4,005 Y. pestis genes plus several control DNAs; each sample is spotted twice in adjacent positions.]
[Figure: development of the whole-genome chip. Labels: gene selection (4,015 genes), primer design, PCR amplification of the genes, product purification and concentration (4,005 genes), chip spotting.]
[Figure: workflow of a microarray study. Biological question (differentially expressed genes, sample class prediction, etc.), experimental design, microarray experiment, 16-bit TIFF files, image analysis, (Rfg, Rbg), (Gfg, Gbg), normalization, R, G, then estimation, testing, clustering and discrimination, and finally biological verification and interpretation.]

Microarray Steps
- Experiment and data acquisition
  - Chip manufacturing
  - Sampling and labeling
  - Hybridization
  - Image scanning
  - Data acquisition
- Data normalization
- Data analysis
- Biological interpretation

Reading an Array (cont.)

Block  Column  Row  Gene Name  Red    Green  Red:Green Ratio
1      1       1    tub1       2,345  2,467  0.95
1      1       2    tub2       3,589  2,158  1.66
1      1       3    sec1       4,109  1,469  2.80
1      1       4    sec2       1,500  3,589  0.42
1      1       5    sec3       1,246  1,258  0.99
1      1       6    act1       1,937  2,104  0.92
1      1       7    act2       2,561  1,562  1.64
1      1       8    fus1       2,962  3,012  0.98
1      1       9    idp2       3,585  1,209  2.97
1      1       10   idp1       2,796  1,005  2.78
1      1       11   idh1       2,170  4,245  0.51
1      1       12   idh2       1,896  2,996  0.63
1      1       13   erd1       1,023  3,354  0.31
1      1       14   erd2       1,698  2,896  0.59

Color Coding
- Tables are difficult to read, so the data is presented with a color scale.
- Coding scheme:
  - Green = gene repressed (less mRNA) in the experiment
  - Red = gene induced (more mRNA) in the experiment
  - Black = no change (1:1 ratio)
- Or:
  - Green = control condition (e.g. aerobic)
  - Red = experimental condition (e.g. anaerobic)
- Only the ratio is used.

Noise
Noise sources:
- Sample preparation, labeling, amplification
- Reaction variations
- Environment
- Target volume
- Hybridization parameters (temperature, time, ...)
- Nonspecific hybridization
- Dust
- Scanner settings
- Quantization

Other Image Processing Problems
Spot quality problems typical of raw output:
- Uneven grid positions
- Curves within a grid
- Variable spot size or shape
- Variable distance between spots
[Figure: two slides, P04 vs. P01 (pg2) and A1 vs. P01 (pg2)]

Noise Filtering
- Gridding: identify spot locations
- Segmentation: distinguish foreground from background
  - Fixed circle: put a circle around the foreground area
  - Seeded region growing: identify initial spot "seeds" and grow high-intensity regions
  - Edge detection algorithms
- Background cancellation: Intensity = FGintensity - BGintensity
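The fixed-circle segmentation and background cancellation steps can be sketched as follows. This is a minimal illustration on a synthetic image patch, not the method of any particular scanner package: the function name `spot_intensity`, the use of the mean for the foreground and the median for the background, and the patch values are all assumptions made for the example.

```python
import numpy as np

def spot_intensity(image, center, radius):
    """Fixed-circle segmentation followed by background cancellation.

    Pixels within `radius` of `center` count as foreground, all other
    pixels in the patch as background (an assumption for this sketch).
    Returns (FG - BG, FG, BG).
    """
    rows, cols = np.indices(image.shape)
    dist = np.hypot(rows - center[0], cols - center[1])
    fg = float(image[dist <= radius].mean())       # foreground summary
    bg = float(np.median(image[dist > radius]))    # background summary
    return fg - bg, fg, bg

# Synthetic 21 x 21 patch: flat background of 100 with a brighter disc of 900.
patch = np.full((21, 21), 100.0)
rows, cols = np.indices(patch.shape)
patch[np.hypot(rows - 10, cols - 10) <= 5] = 900.0

intensity, fg, bg = spot_intensity(patch, center=(10, 10), radius=5)
print(f"FG={fg:.1f} BG={bg:.1f} Intensity={intensity:.1f}")
```

On real scans the circle rarely matches the spot exactly, which is one reason the seeded region growing and edge detection alternatives listed above exist.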
Normalization
- The word normalization describes techniques used to suitably transform the data before they are analysed.
- The goal is to correct for systematic differences between samples on the same slide, or between slides, that do not represent true biological variation between samples.
Normalization
Normalize the data to correct for artificial variance:
- Red = FGred - BGred
- Green = FGgreen - BGgreen
- PixelValue = log2(Red/Green) - log2(Redavg/Greenavg)
Pixel color:
- Green if pixel value < 0
- Red if pixel value > 0
[Figure: scanned arrays. Calibrated: red and green equally detected. Uncalibrated: red light under-detected.]

The Origin of Systematic Differences
Systematic differences arise from:
- Dye biases, which vary with spot intensity
- Location on the array
- Plate origin
- Printing quality, which may vary between pins
- Time of printing
- Scanning parameters

DNA Array Data Acquisition
- Image analysis software packages exist for analyzing the output of custom-made chips (e.g. GenePix Pro, Array Vision, TIGR Spot Finder).
- A chip description file (CDF) is needed for the probe locations.
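The log-ratio normalization formula, PixelValue = log2(Red/Green) - log2(Redavg/Greenavg), can be sketched in a few lines. The intensities below are the Red and Green columns of the "Reading an array" table; treating them as already background-corrected is an assumption, and the helper names `normalized_log_ratios` and `pixel_color` are hypothetical.

```python
import math

# (gene, red, green) taken from the "Reading an array" table.
# Assumption: these values are already background-corrected.
spots = [
    ("tub1", 2345, 2467), ("tub2", 3589, 2158), ("sec1", 4109, 1469),
    ("sec2", 1500, 3589), ("sec3", 1246, 1258), ("act1", 1937, 2104),
    ("act2", 2561, 1562), ("fus1", 2962, 3012), ("idp2", 3585, 1209),
    ("idp1", 2796, 1005), ("idh1", 2170, 4245), ("idh2", 1896, 2996),
    ("erd1", 1023, 3354), ("erd2", 1698, 2896),
]

def normalized_log_ratios(spots):
    """log2(Red/Green) per spot, minus log2(Redavg/Greenavg) over all spots."""
    red_avg = sum(red for _, red, _ in spots) / len(spots)
    green_avg = sum(green for _, _, green in spots) / len(spots)
    offset = math.log2(red_avg / green_avg)
    return {gene: math.log2(red / green) - offset for gene, red, green in spots}

def pixel_color(value):
    """Color coding: green below 0, red above 0, black at exactly 0."""
    if value < 0:
        return "green"
    if value > 0:
        return "red"
    return "black"

values = normalized_log_ratios(spots)
for gene, value in values.items():
    print(f"{gene}: {value:+.2f} -> {pixel_color(value)}")
```

In practice a tolerance band around zero is usually wanted before calling a gene induced or repressed; the strict comparison here simply mirrors the coding scheme as stated.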
Introduction of Software: SAM
Significance Analysis of Microarrays
- Tusher, Tibshirani and Chu (2001): Significance analysis of microarrays applied to the ionizing radiation response. PNAS 2001 98: 5116-5121 (Apr 24).
- Excel plugin
- Free
- Permutation based
- Most widely published method of microarray data analysis

[Figure: SAM screenshots]
- In the example, delta = 0.5 was chosen, producing about 65 significant genes and about 5.9 false positives on average.
- The choice of delta is up to the user, depending on how many false positives he or she is comfortable with.
- The False Discovery Rate (FDR) is computed as the median (or 90th percentile) of the number of falsely called genes divided by the number of genes called significant.

Handling Missing Data
There are currently two options for imputing missing values in SAM:
- Row average: each missing value is imputed with the average of the non-missing values for that gene.
- K-nearest neighbor (the default): missing values are imputed using a k-nearest-neighbor average in gene space (default k = 10).

Clustering
- Hypothesis: genes with similar function have similar expression profiles.
- Find groups of genes with similar expression profiles.
- Find groups of individuals with similar expression profiles within a population.
- Clustering = group identification

Clustering Steps
- Choose a similarity metric to compare the transcriptional responses or the expression profiles:
  - Pearson correlation
  - Spearman correlation
  - Euclidean distance
- Feature extraction and pattern representation
- Choose a clustering algorithm:
  - Hierarchical
  - K-means

Cluster Algorithms
- Unsupervised analysis:
  - Hierarchical
  - K-means
  - Self-organizing maps
  - Others
- Supervised analysis: classification rules

Hierarchical Clustering Example
Eisen et al. (1998), PNAS, 95(25): 14863-14868
/cgi/content/full/95/25/14863

Steps of Hierarchical Clustering
1. Treat each of the n samples as its own cluster.
2. Compute the pairwise distances between the n samples to form a distance matrix.
3. Merge the two closest clusters into a new cluster.
4. Compute the distance between the new cluster and each remaining cluster. Keep merging and recomputing until only one cluster remains.
5. Draw the dendrogram, choose the distance cut-off and the resulting groups, and interpret them.
In SPSS: Analyze > Classify > Hierarchical

Hierarchical Clustering
In each round: find the largest value in the similarity matrix, join those clusters together, recompute the similarity matrix, and iterate.

Round 1 (g1 and g4 are joined):
         g1     g2     g3     g4     g5
g1              0.23   0.00   0.95  -0.63
g2                     0.91   0.56   0.56
g3                            0.32   0.77
g4                                  -0.36
g5

Round 2 (g2 and g3 are joined):
         g1,g4  g2     g3     g5
g1,g4           0.37   0.16  -0.52
g2                     0.91   0.56
g3                            0.77
g5

Round 3:
         g1,g4  g2,g3  g5
g1,g4           0.27  -0.52
g2,g3                  0.68
g5

Interpreting the Results
[Figure: dendrogram over g1, g4, g2, g3, g5]
2 clusters? 3 clusters?

k-means Clustering
k-means clustering is a well-known method that assigns cluster membership by minimizing the differences among the items within a cluster while maximizing the distance between clusters. The "means" in k-means refers to the centroid of a cluster, which starts as an arbitrarily chosen data point and is refined iteratively until it represents the true mean of all the data points in the cluster. The "k" refers to the arbitrary number of points used to seed the clustering process. The k-means algorithm computes the squared Euclidean distances between the data records in a cluster and the vector that represents the cluster mean, and converges on a final set of k clusters when that sum reaches its minimum.
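Returning to the hierarchical clustering walk-through, the "find the largest similarity, join, recompute, iterate" loop can be sketched on the five-gene similarity matrix from the slides. The recomputation rule here is average linkage, which is an assumption: the slides do not say which rule they use, and their recomputed values (0.37, -0.52, 0.68) differ slightly from the averages this sketch produces (0.395, -0.495, 0.665), although the merge order comes out the same. The function names are hypothetical.

```python
from itertools import combinations

genes = ["g1", "g2", "g3", "g4", "g5"]

# Upper triangle of the round 1 similarity matrix from the slides.
pairwise = {
    ("g1", "g2"): 0.23, ("g1", "g3"): 0.00, ("g1", "g4"): 0.95, ("g1", "g5"): -0.63,
    ("g2", "g3"): 0.91, ("g2", "g4"): 0.56, ("g2", "g5"): 0.56,
    ("g3", "g4"): 0.32, ("g3", "g5"): 0.77,
    ("g4", "g5"): -0.36,
}

def similarity(a, b):
    """Look up the symmetric similarity between two genes."""
    return pairwise[(a, b)] if (a, b) in pairwise else pairwise[(b, a)]

def linkage(c1, c2):
    """Average linkage (assumed): mean similarity over all cross-cluster pairs."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(items):
    """Repeatedly join the two most similar clusters until one remains."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = agglomerate(genes)
for left, right, score in merges:
    print(f"join {left} + {right} at similarity {score:.3f}")
```

Reading the merge list from the bottom up answers the "2 clusters? 3 clusters?" question: undoing the last merge gives two clusters, undoing the last two gives three.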