Knowledge-based analysis of microarray gene expression data by using support vector machines

Michael P. S. Brown*, William Noble Grundy, David Lin*, Nello Cristianini, Charles Walsh Sugnet, Terrence S. Furey*, Manuel Ares, Jr., and David Haussler*

*Department of Computer Science and Center for Molecular Biology of RNA, Department of Biology, University of California, Santa Cruz, Santa Cruz, CA 95064; Department of Computer Science, Columbia University, New York, NY 10025; Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, United Kingdom

Advisor: Dr. Hsu
Reporter: Hung Ching-wen

Outline
Motivation
Objective
An unsupervised learning method
A supervised learning method
Experiment data
DNA Microarray Data
Support Vector Machine
Kernel
An imbalance in the number of positive and negative examples
Experimental Design
Performance
Results and Discussion
Conclusions
Opinion

Motivation
DNA microarray technology makes it possible to measure the expression levels of thousands of genes in a single experiment.
Experiments suggest that genes of similar function yield similar expression patterns in microarray hybridization experiments.

Objective
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments.
The method is the support vector machine (SVM). An SVM is a supervised machine learning method, that is, it uses prior knowledge of the true functional classes of the genes.

An unsupervised learning method
Unsupervised gene expression analysis methods rely on a similarity (or distance) measure between expression patterns, without prior knowledge of the true functional classes of the genes.
Examples are clustering algorithms such as hierarchical clustering or self-organizing maps.

A supervised learning method
A supervised learning technique begins with a set of genes that have a common function, for example, genes coding for ribosomal proteins.
The training set contains expression data for two classes of genes: genes in the functional class (positive) and genes outside it (negative).
Using this training set, an SVM learns to discriminate between the positive and negative examples of a given functional class based on expression data.
Having learned the expression features of the class, the SVM can classify new genes as positive or negative for the class based on their expression data.

Experiment data
We analyze expression data for 2,467 budding yeast genes measured in 79 different DNA microarray hybridization experiments.
We learn to recognize five functional classes from MYGD.
We subject these data to analyses by SVMs, Fisher's linear discriminant, Parzen windows, and two decision tree learners.

DNA Microarray Data
Each data point produced by a DNA microarray hybridization experiment represents the ratio of expression levels of a particular gene under two different experimental conditions.
The microarray technology used in the biochip laboratory relies on an arrayer (a microarray fabrication instrument) to fix thousands to tens of thousands of gene probes (cDNA or oligonucleotide) onto a glass slide in a defined layout, forming a DNA chip. Target RNA (or DNA) samples (control and reference) are labeled with different fluorescent dyes and hybridized to the gene probes on the chip. A fluorescence scanner and analysis software read the hybridization signals to obtain expression-strength data for each gene, and analysis software and databases are then used to extract large amounts of biological information quickly.
Each gene is represented by an expression vector $X = (X_1, \ldots, X_{79})$.
$E_i$ is the expression level of gene $X$ in experiment $i$, and $R_i$ is the expression level of gene $X$ in the reference state.
The data set consists of 79-element gene expression vectors for 2,467 yeast genes.

Support Vector Machines
A simple way to build a binary classifier is to construct a hyperplane separating positive from negative examples in this space.
Unfortunately, most real-world problems involve nonseparable data.
One solution to the inseparability problem is to use a kernel function to map the data into a higher-dimensional space.

Kernel
The simplest kernel is $K(X, Y) = X \cdot Y$.
$K(X, Y) = (X \cdot Y + 1)^2$ yields a quadratic separating surface.
$K(X, Y) = (X \cdot Y + 1)^3$

An imbalance in the number of positive and negative examples
Such an imbalance is likely to cause the SVM to make incorrect classifications.
We solve this problem by modifying the matrix of kernel values computed during SVM optimization.
Let $X^{(1)}, \ldots, X^{(n)}$ be the genes in the training set. The kernel matrix is $K = (K_{ij})$ with $K_{ij} = k(X^{(i)}, X^{(j)})$, where $k$ is the kernel function.
For a positive example, the diagonal entry is modified as $K_{ii} \leftarrow K_{ii} + \lambda (n^{+}/N)$, where $n^{+}$ is the number of positive training examples, $N$ is the total number of training examples, and $\lambda$ is a scale factor.
For a negative example, $n^{+}$ is replaced by $n^{-}$.

Experimental Design
Using the class definitions made by the MYGD, we trained SVMs to recognize six functional classes: tricarboxylic acid (TCA) cycle, respiration, cytoplasmic ribosomes, proteasome, histones, and helix-turn-helix proteins. The helix-turn-helix class serves as a control that is not expected to be learnable from expression data.
The performance of the SVM classifiers was compared with that of four standard machine learning algorithms: Parzen windows, Fisher's linear discriminant, and two decision tree learners (C4.5 and MOC1).
Performance was tested by using a three-way cross-validated experiment. The gene expression vectors were randomly divided into three groups. Classifiers were trained by using two-thirds of the data and were tested on the remaining third. This procedure was then repeated two more times, each time using a different third of the genes as test genes.

Performance
Each prediction is counted as a false positive (FP), false negative (FN), true positive (TP), or true negative (TN).
Overall performance is measured by the cost $C(M) = fp(M) + 2\,fn(M)$, where $fp(M)$ is the number of false positives for method $M$ and $fn(M)$ is the number of false negatives for method $M$.
The cost savings of method $M$ is $S(M) = C(N) - C(M)$, where $N$ is the null method that classifies all test examples as negative.

Results and Discussion (SVMs Outperform Other Methods)
For every class except the helix-turn-helix class, the best performing method is a support vector machine using the radial basis kernel or a higher-dimensional dot product kernel.
The results also show that, as expected, none of the classifiers learns to recognize genes that produce helix-turn-helix proteins ($S(M) \le 0$).

Results and Discussion (Significance of Consistently Misclassified Annotated Genes)
Many of the false positive genes in Table 2 are known from biochemical studies to be important for the functional class assigned by the SVM, even though MYGD has not included these genes in their classification. For example, YAL003W and YPL037C.

Results and Discussion (Functional Class Predictions for Genes of Unknown Function)
The predictions below may merit experimental testing. In some cases described in Table 3, additional information supports the prediction. For example, a recent annotation shows that a gene predicted to be involved in respiration, YPR020W, is a subunit of the ATP synthase complex, confirming this prediction.

Conclusions
We have demonstrated that support vector machines can accurately classify genes into some functional categories and have made predictions aimed at identifying the functions of unannotated yeast genes.
SVMs that use a higher-dimensional kernel function provide the best performance.
The supervised learning framework allows a researcher to start with a set of interesting genes and ask two questions: What other genes are coexpre
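The kernel and class-imbalance slides describe three small computations: building an expression vector from the levels E_i and R_i, evaluating a polynomial dot-product kernel, and adding a class-dependent term to the diagonal of the kernel matrix. The following is a minimal sketch of those steps in plain Python, not the authors' implementation. The slides do not state how X_i is derived from E_i and R_i; the unit-length log-ratio normalization used here is an assumption. The function names and the default scale factor `lam=0.1` are hypothetical choices made only for illustration.

```python
import math


def normalized_log_ratios(expr, ref):
    """Build an expression vector from levels E_i and reference levels R_i.

    ASSUMPTION: X_i = log(E_i / R_i), scaled so the vector has unit length.
    The slides only say that each data point is a ratio of expression levels.
    """
    logs = [math.log(e / r) for e, r in zip(expr, ref)]
    norm = math.sqrt(sum(v * v for v in logs))
    if norm == 0.0:
        return logs  # every ratio equals 1, so there is nothing to scale
    return [v / norm for v in logs]


def poly_kernel(x, y, degree=1):
    """Polynomial dot-product kernel (x . y + 1) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def kernel_matrix(vectors, degree=1):
    """Matrix of kernel values K[i][j] = k(X(i), X(j)) for the training set."""
    return [[poly_kernel(a, b, degree) for b in vectors] for a in vectors]


def add_class_weighted_diagonal(K, labels, lam=0.1):
    """Return a copy of K with lam * (n_class / N) added to each diagonal entry.

    n_class is the number of positive examples for a positive gene and the
    number of negative examples for a negative gene. lam is a hypothetical
    default for the scale factor.
    """
    n_total = len(labels)
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = n_total - n_pos
    out = [row[:] for row in K]
    for i, y in enumerate(labels):
        n_class = n_pos if y > 0 else n_neg
        out[i][i] += lam * (n_class / n_total)
    return out
```

Only the diagonal of the kernel matrix changes, so the pairwise similarities between different genes are left untouched; the off-diagonal entries of the returned matrix equal those of the input.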

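The evaluation protocol in the Experimental Design and Performance slides (three-way cross-validation, the cost C(M) = fp(M) + 2 fn(M), and the savings S(M) = C(N) - C(M) relative to a method that labels everything negative) can be sketched as follows. This is an illustrative sketch under stated assumptions: labels are encoded as +1 and -1, the random split uses a seeded shuffle, and all function names are hypothetical.

```python
import random


def three_way_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) so each third is the test set once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::3] for k in range(3)]
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def cost(y_true, y_pred):
    """C(M) = fp(M) + 2 * fn(M); a false negative counts twice."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t <= 0 and p > 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t > 0 and p <= 0)
    return fp + 2 * fn


def cost_savings(y_true, y_pred):
    """S(M) = C(N) - C(M), where N predicts negative for every test example."""
    null_pred = [-1] * len(y_true)
    return cost(y_true, null_pred) - cost(y_true, y_pred)
```

A method that finds no positives at all has savings of exactly zero, and a method whose false positives outweigh the positives it recovers has negative savings, which is how a class that cannot be learned shows up under this measure.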