




P.F. Lemkin
LECB, CCR, NCI/FCRDC
mail:
This document is under construction. Revised: 06-19-2002

Software Design of the MicroArray Explorer Data Mining Tool

Home: :/ /MAExplorer
Open Source: :/ Mozilla 1.1 public license

This work was produced by Peter Lemkin of the National Cancer Institute, an agency of the United States Government. As a work of the United States Government there is no associated copyright. It is offered as open source software under the Mozilla Public License (version 1.1), subject to the limitations noted in the accompanying LEGAL file. Both files are available on :/

Java MAExplorer program design issues
1. Overview
2. Design decisions, Client-centric vs. Server-Centric
3. MAExplorer GUI (graphical user interface)
4. Web database I/O
5. Genes and Gene lists
6. Gene data filter
7. Multiple plot windows
8. Other plot window implementations
9. Reports & access to other Web databases
10. Synchronizing windows
11. Dumping text and plot windows to .txt and .gif files
12. Saving and Restoring the MAExplorer state
13. Miscellaneous classes
14. MAEPlugin design
15. MaeJavaAPI design
16. Examples (links): MAExplorer menu organization

Contents of this document
This document discusses the software design of MAExplorer. MAExplorer is Open Source, and the source code and a collaborative environment for improving it are available at :/ The first part contains the primary design discussion. The second part contains examples of computer screens for many of the windows.
The screens illustrate these data structures and classes.

1. Overview - MicroArray Explorer
MAExplorer is a flexible Java microarray data-mining tool.
Handles multiple cDNA or oligo array samples.
Handles duplicate spots per array and replicate samples.
Handles intensity or ratio (Cy3/Cy5) quantified array data.
Data organized by 2-condition (X vs Y) and N-condition lists (expression profiles, sample lists); the data structures could support ordered lists of condition lists.
Gene data filters: gene sets computed by statistics, clustering, and gene sets.
Direct data manipulation in the pseudoarray image, graphics, and spreadsheets.
Access to genomic Web servers from plots and reports.
Oriented toward mRNA data; could be extended to protein arrays.
Stand-alone (off-line) or applet (Web-based).
User data is converted using the Cvt2Mae "wizard" tool.
MAEPlugins allow users to add new analysis methods.

1.1 Array data mining - finding patterns of genes
Quantified array spot data for multiple samples.
Organize by: sample, gene expression, gene sets.
Change views: normalization, data filters.
Visualize and query: plots, clusters, reports.
Explore: external genomic databases.
(Figure labels: Specific, General, Data Reduction, Subset of genes for further analysis)

1.2 Data preparation for MAExplorer

1.3 Paradigm: local & genomic-Web databases

1.4 Overview of MAExplorer database system (data preparation steps prior to MAExplorer analysis are in cyan)

1.5 MAExplorer analysis environment
Stand-alone Java application for user data.
Download the MAExplorer program from the Web site. Installers: Windows, MacOS/-X, Solaris, Linux, Unix, etc.
Documentation, tutorials, and the MGAP demo database are on the Web site.
The Cvt2Mae "wizard" tool converts user data for use with MAExplorer.
Data may be downloaded from the NCI/CIT mAdb microarray data server RDBMS for direct use with MAExplorer.

1.6 MAExplorer user interface - main Analysis menu
The pseudoarray image is the ratio of samples: pregnancy (HP-X) / lactation (HP-Y).
Genes passing the data filter have white circles.
The current gene has a yellow circle.
Select X and Y samples.

1.7 Data conversion "Wizard" for MAExplorer
The Cvt2Mae "Wizard" converts commercial and user-defined array formats (e.g. Affymetrix, GenePix, Scanalyze, etc.).
Users may create and save Array Layout descriptions for subsequent conversions.
Conversion generates an MAExplorer project directory tree of files that are ready to analyze.
After MAExplorer is installed, click on the project's MAE/Start.mae file to start the analysis.
See the Cvt2Mae home page for details on its operation.

1.7.1 Cvt2Mae array data conversion "wizard"
Database of built-in and user-defined array layout formats.

1.8 MAExplorer Home Page :/ /MAExplorer
Reference Manual, Tutorials, Download MAExplorer, Data conversion wizard, Plugins, Examples.

2. Our design decisions
Computation is done in the Java application for better user interaction.
We initially used Java JDK1.1, since not all Web browsers handled JDK1.2/1.3; JDK1.1 is the least common denominator for applet portability. This is not important for the stand-alone version, which uses JDK1.3. The JDK1.3 is packaged with the Installers.
All data files (except images) are tab-delimited ASCII files and are Excel-compatible. The Cvt2Mae data converter translates user data files to MAExplorer-formatted files.
Minimize the amount of data that needs to be read initially.
Use an object-oriented design with new base classes and extended classes as required. Write custom GUI classes to better control the user interface.
Use an Integrated Development Environment, such as Sun's free "Forte for Java", for rapid debugging.
Optimize code and garbage-collect data structures often.

2.1 Client-Centric computations - advantages/disadvantages
The client-centric approach uses stand-alone programs.
+ Java runs on all operating systems, as either stand-alone programs or browser applets
+ handles the rapid response required for direct manipulation on desktop computers
+ the stand-alone version may be restarted quickly from local or cached data
+ size limitations are not a problem with stand-alone Java applications
+ Java plug-ins allow any group of users to prototype new analysis methods
+ easy to build large, stable stand-alone programs handling very large data sets
- for the applet version, slow startup, since the program and data are downloaded when run
- difficult to build large, stable Web applets handling very large data sets
- the stand-alone application must be installed on the client's computer

2.2 Server-Centric (CGI or Applet) computations - advantages/disadvantages
The server-centric approach uses a mix of HTML, CGI, and Java applets.
+ may have better resources for very large data sets, but with dependence on the server
+ faster startup than a full applet, since a minimal GUI is required and little data is downloaded
+ easier to prototype and distribute new functionality using centralized CGI or servlets
- susceptible to Internet traffic bandwidth problems for large numbers of users
- susceptible to server-load dependencies for large numbers of users
- difficult to get very rapid response for direct manipulation for data mining

2.3 The MAExplorer project
A database resides in a project directory, which contains all samples the user may wish to analyze.
Multiple MAExplorer projects may exist on a local disk (or Web server), each having a standard project directory tree (shown in 2.3.1).
A projects database file (maeProjects.txt) in the stand-alone installation directory tracks the names and disk locations of these projects and the last active database.
The (File | Database | Open file DB) menu command specifies a particular database startup file within a project directory.
MAExplorer is started by opening a .mae startup file that specifies the subset of samples to use. (Note: .mae is the file extension, not the complete file name!)
Clicking on a .mae file (e.g. in Windows) will start MAExplorer on that database.

2.3.1 MAExplorer Project Directory Tree
/Cache - (optional) cached data saved from the initial download from the Web DB server
/Config - project databases for the Configuration DB, GIPO DB, and Samples DB files
/MAE - set of .mae stand-alone startup files for subsets of the project samples
/Plugins - (optional) set of MAEPlugins written by the user. Normally, MAExplorer checks the /Plugins directory in the MAExplorer installation directory
/Quant - quantified spot data files for each hybridized sample
/Report - (optional) directory where text and GIF files are saved by the user with the SaveAs Report and SaveAs GIF commands
/State - (optional) the GeneBitSets (.cbs) and SampleSets (.hpl) "SaveAs DB" files

2.3.2 Required project directories
All data files (except images) are tab-delimited files.
The .mae startup file is a physical file on a local disk or a Web server virtual CGI file in the MAE/ directory.
There may be any number of .mae startup files; they all end with ".mae".
Each .mae startup file may point to a specific configuration file.
The Config/ directory contains:
1. a Configuration database file describing the array architecture and other DB files
2. a Samples database file listing all of the samples available in the database
3. a GIPO (Gene In Plate Order) print table database file mapping spot position to genomic information through a spot identifier
The Quant/ directory contains quantified spot data files for each hybridized sample, which include channel(s) intensity and background, good spot flags (QualCheck), and the spot identifier.

2.3.3 Optional/Generated project directories
The Cache/ directory can be used to save all data files downloaded from a Web server, to avoid returning to the server when accessing that data in subsequent data mining sessions.
The State/ directory is used for saving the named gene sets (.cbs files).
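The project layout in 2.3.1 and 2.3.2 lends itself to a quick sanity check before a project is opened. The following is a minimal sketch, not part of MAExplorer. The class ProjectDirCheck and its methods are hypothetical, and the sketch assumes that the required directories are exactly Config/, MAE/ and Quant/. It also assumes that a startup file is any file in MAE/ whose name ends in ".mae".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not an MAExplorer class) for checking a project directory tree. */
final class ProjectDirCheck {
    // Required directories per 2.3.2; Cache/, Plugins/, Report/ and State/ are optional.
    static final String[] REQUIRED = {"Config", "MAE", "Quant"};

    /** Returns the names of required directories that are missing under projectDir. */
    static List<String> missingRequiredDirs(Path projectDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isDirectory(projectDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Lists, sorted by name, the files in MAE/ whose names end in ".mae". */
    static List<String> startupFiles(Path projectDir) throws IOException {
        List<String> names = new ArrayList<>();
        Path maeDir = projectDir.resolve("MAE");
        if (!Files.isDirectory(maeDir)) {
            return names; // no MAE/ directory, so no startup files
        }
        // The glob "*.mae" matches every entry whose name ends in ".mae".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(maeDir, "*.mae")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For a complete project, missingRequiredDirs(root) returns an empty list. The optional directories of 2.3.3 are deliberately not checked.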
The .cbs gene sets are shared between all startup files.
The State/ directory is also used for saving the named lists of hybridized samples (.hpl files), which are likewise shared between all startup files.
When a data mining session is saved using the (File | Database | SaveAs DB) menu command, the named gene sets and sample lists are written to the State/ directory as .cbs and .hpl files.
Saving the database also writes the complete state into a .mae startup file, which overrides the configuration file data the next time MAExplorer is started.
The Report/ directory is used for saving text reports as .txt files and all plots as .gif files.

2.3.4 Opening a database from local disk
In stand-alone mode, you can browse a project database containing a set of startup databases.

2.4 Notation: JDK vs. new MAExplorer classes
Classes that are part of the Sun JDK library are shown in green; classes that we wrote specifically for MAExplorer are shown in red.
The stand-alone main() method is in MAExplorer.
There are about 140 classes, including the MAEPlugins support.
The MAExplorer class holds the single instance of every class that may require global access (for speed) from any subsequent processing. These global instances are created in MAExplorer.init() at startup.

2.4.1 New MAExplorer base classes
Guesser - scrollable text area for selecting one or more items using prefix (e.g. "carbonic") or wildcard (e.g. "*onco*") notation. Includes: PopupGeneGuesser, PopupHPmenuGuesser, PopupProjDirGuesser.
Chooser - specialized Chooser that lets users move objects from a "remainder" list to a "selected" list. Includes: PopupHPChooser for selecting the HP-X and HP-Y sets and the HP-E list.
Table - tab-delimited table constructor and file reader. Includes: ConfigTable, GipoTable, MaHPquantTable, MaInfoTable, SamplesTable. Table is also used to compute intermediate data structures for computing reports.

2.4.1 New MAExplorer base classes (continued)
Table is extended from SimpleTable, which can be used (and is used) to make short-lived temporary tables for various purposes, including reports.
Draw2Dplot - creates scrollable 2D plots. DrawScatterPlot and ExprProfileOverlay extend Draw2Dplot.
Gene - creates Gene instances. It is used in GeneList.mList, which may contain ordered lists of Genes.
GeneBitSet - creates efficient 64-bit/word bit sets representing Gene sets, and operations on these sets. It is normally accessed through GeneList.bitSet.

2.4.2 Data structures: genes, hybridized samples
Any class instance that may need to be accessed from other classes is kept in MAExplorer. Most classes have an "initialization" constructor that captures the MAExplorer instance (typically called mae) for future use by that class. This lets methods in that class access any other classes they require.
All variables and methods are private unless they need to be accessed from outside of the class.
State data is found primarily in two classes: MAExplorer and Config.
The fundamental objects are genes (Gene) and hybridized samples (MaHybridSample).
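Sections 2.4 and 2.4.2 describe a pattern in which one MAExplorer object holds the single global instances, and each class captures that object as mae in its constructor. The sketch below illustrates the pattern only. MAExplorer and Config here are simplified stand-ins, not the real classes. The dbName field and the Reporter class are hypothetical.

```java
/** Simplified stand-in for the MAExplorer class: it owns the single global instances. */
final class MAExplorer {
    Config cfg;    // single instance shared by all classes
    Reporter rpt;  // another single instance

    /** Creates the global instances at startup, as MAExplorer.init() is described to do. */
    void init() {
        cfg = new Config(this);
        rpt = new Reporter(this);
    }
}

/** Simplified stand-in for Config, which holds state variables. */
final class Config {
    private final MAExplorer mae;  // kept so Config methods could reach other instances (unused here)
    String dbName = "demo";        // hypothetical state variable

    Config(MAExplorer mae) { this.mae = mae; }
}

/** Hypothetical class showing access to another global instance through mae. */
final class Reporter {
    private final MAExplorer mae;

    Reporter(MAExplorer mae) { this.mae = mae; }

    /** Reads Config state via the captured mae reference, not via static fields. */
    String header() { return "Database: " + mae.cfg.dbName; }
}
```

After calling new MAExplorer().init(), any class built this way can reach the others through its mae field. For example, rpt.header() reads cfg.dbName without any static state.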
Gene and MaHybridSample instances are then used in other classes and lists.

2.4.3 Data structures: HPs, MIDs, GIDs, gang GIDs
An HP is all of the array data for a hybridized sample. It contains data for multiple spot intensity channels (e.g. F1&F2, or Cy3&Cy5), background, QualCheck flags, etc.
A GID is a Grid index ID and uniquely defines a spot in the array database. Corresponding spots in different samples have the same GIDs.
Replicate spotted grids in the array have Gang GIDs.
A MID is a Master gene ID and uniquely defines a gene in the database. All GIDs representing the same gene have the same MID.
There is one copy of a Gene instance in the database, and it holds all gene-specific data (gene name, GenBank ID, Clone ID, etc.).

2.4.4 Data structures: Maps
The Map class defines maps between MIDs, GIDs, GridCoords, and Genes.
The master gene list, Map.midStaticCL, holds all Gene instances indexed by MID as Map.midStaticCL.mList[mid].
Map.gidStaticCL.mList[gid] accesses the corresponding Gene instances by GID.
Map.gid2mid[gid] looks up the MID given the GID.
Map.mid2gid[mid] looks up the GID given the MID.
Map.gidToGangGid[gid] looks up the Gang GID given the GID.
There are other maps between lists of spot GridCoords (field, grid, row, column) and GIDs.

2.4.5 Data structures: Gene
Gene is the base class used to define a single gene (clone or oligo) data structure. It consists of sample-specific data fields and sample-independent genomic identifiers and name fields. The latter are tied to the Master Gene ID (MID), which is unique for any number of spots for that gene, and to the Grid coordinate ID (GID), which corresponds to a particular spot for that gene.
Gene.midList[0:nMid-1] is a list of all other Gene instance MIDs that are the same gene (i.e. replicates). This identifies replicate genes on the array.
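The GID/MID relationships in 2.4.3 to 2.4.5 amount to a forward index array plus an inverted list: there can be many spots per gene, and each spot has one MID. The sketch below assumes that GIDs and MIDs are dense, 0-based integers. SpotGeneMaps is hypothetical and only mirrors the roles of Map.gid2mid and Map.mid2gid. The choice of the first GID as the value of mid2gid is an assumption of this sketch, not something the document specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical GID/MID index maps: GIDs index spots, MIDs index genes. */
final class SpotGeneMaps {
    final int[] gid2mid;                 // MID of the gene spotted at each GID
    final int[] mid2gid;                 // first GID seen for each MID, or -1 if none
    final List<List<Integer>> mid2gids;  // all GIDs (replicate spots) for each MID

    /** Builds the maps; every value in gid2mid must lie in [0, nMid). */
    SpotGeneMaps(int[] gid2mid, int nMid) {
        this.gid2mid = gid2mid.clone();
        this.mid2gid = new int[nMid];
        Arrays.fill(mid2gid, -1);
        this.mid2gids = new ArrayList<>(nMid);
        for (int mid = 0; mid < nMid; mid++) {
            mid2gids.add(new ArrayList<>());
        }
        for (int gid = 0; gid < gid2mid.length; gid++) {
            int mid = gid2mid[gid];
            if (mid2gid[mid] < 0) {
                mid2gid[mid] = gid;  // remember the first spot found for this gene
            }
            mid2gids.get(mid).add(gid);
        }
    }

    /** Number of spots, replicates included, for one gene. */
    int replicateCount(int mid) {
        return mid2gids.get(mid).size();
    }
}
```

For example, new SpotGeneMaps(new int[] {0, 1, 0, 2, 1}, 3) records GIDs 0 and 2 as the two spots of gene 0. Those are the replicates that per-gene statistics can draw on.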
These replicates are available for computing statistics.
Quantified data may be temporarily stored in the (data, data1, data2, pValue, geneDist, etc.) variables. Generally, F1 (Cy3) is data1, F2 (Cy5) is data2, and F1/F2 or Cy3/Cy5 is data.

2.4.5 Data structures: Gene (continued)
The Genomic IDs include: Clone_ID, GenBankAcc, GenBankAcc3, GenBankAcc5, Unigene_ID, dbEST3, dbEST5, SwissProt, RefSeqID, LocusID. The Master_ID is set to one of these. The arrays GenomicID and nGenomicID may be used for specifying external identifiers for particular user databases.
Gene names include: Gene_Name, UGclusterName. The MasterGeneName is set to one of these.
Additional identifiers include: Gene_Class, plate, plate_row, plate_col.
Each gene has various properties, indicated by the inclusive-or of C_xxxx constants.

2.5 Algorithms: initialization and event handling
MAExplorer.init()
1. Read (name,value) parameters from Applet PARAMs or the Config file.
2. Read database files and set up database structures.
3. Create the GUI with a scrollable pseudoarray image, using ArrayScroller, ScrollableImageCanvas, and DrawPseudoImage.
4. Create the pull-down MenuBar with MenuBarFrame.
EventMenu - pull-down menu event handling
1. Menu item command: EventMenu.handleActions() evaluates the menu command.
2. Menu checkbox item: EventMenu.handleItemStateChanged() evaluates the menu checkbox command.
ScrollableImageCanvas - pseudoarray direct manipulation event handling
1. Selecting spots invokes PopupRegistry to change the current gene.
2. Selecting a sample invokes PopupRegistry to change the current HP sample and Filter.

2.6 Startup (name,value) parameters: .mae file or Applet PARAMs
The Config class contains many of the state variables (MAExplorer contains most of the rest).
The parameters are set in the Config class using the GetParam class to get (Name,Value) data definitions, if they exist. They are defined in an override hierarchy:
1) Parameters are initially defined by reading the Config file using the ConfigTable class. If they are not defined there, then either the variables remain undefined or they use hardwired values.
2.a) When running as an applet, they are overridden using PARAM values, if these exist, or
2.s) when running in stand-alone mode, they are overridden using (Name,Value) data from the .mae startup file.

2.7 Data structures: hybridized samples (HP)
The MaHybridSample class contains the un-normalized data for a particular sample read by MAExplorer. It uses a one-time instance of the MaHPquantTable class to load this data.
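Section 2.4.3 says an HP holds several intensity channels, background, and QualCheck flags. Section 2.4.5 says that F1 maps to data1, F2 to data2, and F1/F2 to data. The sketch below shows that per-spot bookkeeping under assumptions this document does not state. Background is assumed to be subtracted per channel. Spots that fail the quality flag, or whose corrected value is not positive, are assumed to get no ratio. HPSampleSketch is hypothetical and is not MaHybridSample.

```java
/** Hypothetical per-sample (HP) container with arrays indexed by GID; not MaHybridSample. */
final class HPSampleSketch {
    final double[] f1, f2;        // raw channel intensities (e.g. Cy3, Cy5)
    final double[] bkgd1, bkgd2;  // per-spot background for each channel
    final boolean[] good;         // QualCheck-style "good spot" flags

    HPSampleSketch(double[] f1, double[] f2, double[] bkgd1, double[] bkgd2, boolean[] good) {
        this.f1 = f1;
        this.f2 = f2;
        this.bkgd1 = bkgd1;
        this.bkgd2 = bkgd2;
        this.good = good;
    }

    /** data1: background-subtracted F1 value for one spot (subtraction is an assumption). */
    double data1(int gid) { return f1[gid] - bkgd1[gid]; }

    /** data2: background-subtracted F2 value for one spot. */
    double data2(int gid) { return f2[gid] - bkgd2[gid]; }

    /** data: F1/F2 ratio, or NaN if the spot failed QualCheck or a corrected value is not positive. */
    double ratio(int gid) {
        double d1 = data1(gid), d2 = data2(gid);
        if (!good[gid] || d1 <= 0 || d2 <= 0) {
            return Double.NaN;
        }
        return d1 / d2;
    }
}
```

In this sketch, the raw channels and flags are stored, and data1, data2 and the ratio are derived on demand. That is consistent with MaHybridSample holding un-normalized data, and normalization is left out.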
The MaHPquantTable instance parses the sample's .quant data file.
The SampleSets class contains the working HP-X, HP-Y and HP-E lists of MaHybridSample instances used by the data Filter. It also contains the menu sample names (parsed from the SamplesTable class by the StageNames class).
The HPxyData class contains the current HP-X and HP-Y sample sets data for a particular gene MID, including the statistics for each set, for use in computations on a single gene across X-Y samples. New set statistics are computed for a new MID using HPxyData.updateData().

2.7.1 Data structures: sample condition sets
The Condition class contains named lists of named hybridized samples. These may be copied to the SampleSets working lists for the HP-X set, HP-Y set and HP-E list. It also contains ordered lists of Conditions that could be used with expression lists of averaged samples.

2.8 Data structures: spot data for genes
SpotData is a data-only class that holds the raw and normalized data copied from a single Gene for a single sample.
The SpotData instance is loaded using one of the several MaHybridSample.getData() methods.
The SpotFeature class methods compute a pretty-print summary string (one line or three lines) of the data for a single Gene for one HP, HP-(X,-Y), HP-(X,Y) sets, or HP-E.
This summary line is modified for single-channel (F1 or F2) and (Cy3 or Cy5) intensity data, or for ratio data (Cy3/Cy5), HP-X/HP-Y, etc., taking the normalization mode into account.

2.8.1 Data structures: lists of spot data for genes
Arrays of a specific sample HP's (F1,F2) (i.e. Cy3, Cy5) gene spot data passing the data Filter are loaded using the MaHybridSample.getF1F2data() method.
Arrays of single HP-(X,Y) samples data passing the data Filter are accessed using the CompositeDatabase.getHP_XandYdata() methods.
Arrays of sets of HP-(X,Y) data pass