
Rearchitecting Genomic Big Data Analysis Solutions
From Automation to Intelligence

As costs fall and analysis workflows become simpler, gene sequencing and analysis technology is entering more and more areas of life science research.

[Figure: cost per human genome by year since 2001, on a logarithmic scale from $100 to $100,000,000, compared with Moore's Law. Source path as extracted: /27541954/dna-sequencing-costs-data/]

Application areas:
• Tumor diagnosis and typing
• Drug development and companion diagnostics
• Population genetics research
• Personalized medicine
• Common chronic diseases
• Rare diseases
• Infectious diseases
• Agricultural breeding

[Figure: timeline with the labels Mendel's era, discovery of DNA, Human Genome Project, industrialization era, information era, digital era.]

Tools and technologies of the digital era

The broad range of services that AWS provides helps create new businesses, products, services, and experiences.

[Figure: AWS service map. Categories: Technical & Business Support, Marketplace, Analytics, Dev/Ops, Mobile Services, IoT, AI / Machine Learning, Enterprise Apps, Hybrid Architecture, Migration, App Services, Infrastructure, Core Services, Security & Compliance, Management Tools.]

Core services:
• Compute: VMs, auto-scaling, load balancing, containers, virtual private servers, batch computing, cloud functions, Elastic GPUs, edge computing
• Storage: object, block, file, archival, import/export, exabyte-scale data transfer
• CDN
• Databases: relational, NoSQL, caching, migration, PostgreSQL-compatible
• Networking: VPC, DX, DNS

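Building complex workflows in the information era

[Figure: overview of a genomic data analysis workflow. Raw data (atgatct gatcgat ctga) enters as files, passes through a chain of tools (tool-1, tool-2, tool-3, tool-4, tool-a, tool-b), and leaves as processed result files. Individual steps take from minutes to hours; some are compute-intensive, others memory-intensive.]

Challenges in genomic big data analysis

Massive, diverse, and distributed data:
• Many genomic data types (whole genome, exome, targeted regions, RNA, and others)
• Data for a single sample ranges from 10+ GB to 100+ GB
• Advances in sequencing technology mean that more sequencing is being done
• Data must be transferred and stored all the way from the front-end laboratory to the back-end data center

Scalable computing:
• Genomic analysis involves multiple steps and tools
• Different data types and tools have different compute resource requirements

[Figure: typical TCO comparison. Stages: On-Premises, Lift & Shift.]

Available genomic analysis tools

Open source tools. The AWS China regions already support automated deployment of 30+ commonly used genomic analysis tools.
• Step Functions (Sfn) / Cromwell / Nextflow / Airflow
• GATK, BLAST, GROMACS, AMBER
• OMOP Common Data Model
• Observational Health Data Science and Informatics (OHDSI) tools
• Project REDCap
• Hail, Butler for genomics
• RStudio, R Shiny, Jupyter, DeepVariant, DLNex
• NVIDIA Clara Genomics
• HPC orchestrators: ParallelCluster, Slurm, SGE, HTCondor
• ML frameworks such as TensorFlow, MXNet, etc.

Commercial tools. Many commercial tools are available in AWS Marketplace.
• Illumina DRAGEN
• Sentieon
• SAS
• Tableau
• NVIDIA Parabricks
• Schrödinger
• Hortonworks
• Corda Enterprise Blockchain
• Informatica
• CloudyCluster
• Alces Flight
• MathWorks MATLAB
• Teradata
• And more

In-house tools. Customized analysis pipelines.
• Create automation for any of your existing tools.
• Automate the application of your organization's security best practices.

[Figure: typical TCO comparison. Stages: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2.]

Illumina DRAGEN on AWS FPGA instances

In 2017, Children's Hospital of Philadelphia (CHOP) used the DRAGEN platform to analyze 1,000 pediatric genomes in 2 hours 25 minutes, which earned a Guinness World Record.

Elastic HPC cluster for genomic analysis with one-click deployment

[Figure: cluster architecture in the AWS Cloud. A CloudFormation-deployed VPC contains a public subnet (Internet Gateway, NAT Gateway, bastion host) and a private subnet with a security group and placement group. The master node (pbs_server) exports an NFS share (/public/); in this document the master node mounts the share directly. Compute nodes (pbs_mom) in an Auto Scaling group mount the NFS share. Supporting services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch. Legend: internet access, external client access, SSH traffic, NFS file sharing.]

Using cluster tooling deeply customized for AWS, engineers at 星亢原 can launch an HPC environment (CPU/GPU) within minutes and scale it elastically according to resource demand, cutting costs while shortening the analysis job cycle by 30%.

Visual pipeline orchestration tool: Nextflow

Using AWS services, 金匙基因 built a new HPC analysis platform that doubles analysis speed while lowering operations and maintenance costs.

[Figure: typical TCO comparison. Stages: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization.]

Amazon S3 tiered storage and automatic lifecycle management

Storage classes, ordered from frequent to infrequent access:
• S3 Standard: active, frequently accessed data; millisecond access; 3 AZs; from ¥0.147/GB
• S3 Intelligent-Tiering: data with changing access frequency; millisecond access; 3 AZs; from ¥0.1470 down to ¥0.0875/GB; per-object monitoring fee; minimum storage duration
• S3 Standard-IA: infrequently accessed data; millisecond access; 3 AZs; from ¥0.0875/GB; retrieval charged per GB; minimum storage duration; minimum object size
• S3 Glacier: archive data; 3 to 5 hours to restore online; 3 AZs; from ¥0.028/GB; retrieval charged per GB; minimum storage duration; minimum object size
• S3 Glacier Deep Archive: archive data; 10+ hours to restore online; 3 AZs; from ¥0.007/GB; retrieval charged per GB; minimum storage duration; minimum object size

Optimizing genomic data storage through lifecycle management

[Figure: genomic data formats (BCL, FASTQ, CRAM/BAM, VCF) placed across storage tiers (EBS, S3 / EFS / FSx, S3-IA, Glacier, Glacier Deep Archive).]

Macrogen uses Amazon S3 Glacier for more secure, more reliable, standardized large-scale data management worldwide

Challenges:
• A secure, reliable, and economical way to back up large volumes of genomic data every day
• Meeting security and compliance requirements such as the EU GDPR and Korea's ISMS
• A standardized backup system that serves all subsidiaries and branches worldwide

Solution:
• Amazon S3 Glacier provides a reliable, inexpensive backup solution for data at scale
• AWS Regions worldwide provide a standardized, secure data backup solution

Customer benefits:
• Automated, distributed storage improved data stability and durability
• Security and compliance requirements are met
• A standardized global backup system was established, with backup costs 35% lower than in a self-built on-premises data center

"Macrogen manages 15+PB of data and generates massive genomic data for further analysis across global sites every day. Using Amazon S3 Glacier, Macrogen is able to manage big data in a more secure, reliable, cost-effective, and standardized manner globally."
Sukang Lee, Chief Operating Officer, Macrogen, Inc.

Company: Macrogen, Inc.
Industry: Life Sciences
Country: Republic of Korea
Employees: 550+ globally

About Macrogen: Macrogen is a biotechnology innovator listed on KOSDAQ in Korea and a world-leading precision medicine and biotechnology company. Through its sequencing center, the largest in Korea, and its data infrastructure, Macrogen provides genomic data services to more than ten thousand customers in 153 countries.

[Figure: typical TCO comparison. Stages: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization, Serverless Architecture.]

Automated report delivery built on Lambda and event triggers

Inference results produced in the cloud are written to an S3 object storage bucket. The upload automatically triggers a serverless Lambda function, which generates the report and delivers it to the end user.

[Figure: pipeline architecture. Hospitals A and B (doctor, monitor) upload data to instances with NFS. Data is loaded to NFS for label analysis and AI training on a P3 instance, which outputs model files. An AI inference instance uses Amazon RDS and Amazon Elastic Block Store. Data in S3 moves to Amazon S3 Glacier through lifecycle rules. A reference request through Lambda returns the reference result report from S3.]

CSIRO uses AWS Lambda to cut analyses that took days down to minutes

Challenge: CSIRO has to handle a large and unpredictable analysis workload, and served its high volume of real-time requests through a continuously running online web service.

Solution: In 2018 CSIRO developed and released the Gen-Phen-Insight tool to support clinical decision making. Built on a serverless architecture of Amazon API Gateway, AWS Lambda, and Amazon SNS, Gen-Phen-Insight quickly returns candidate target sites after the user enters a gene name or coordinates. CSIRO also uses AWS X-Ray to better monitor the running state of the software.

Customer benefits:
• Runtime reduced by 80%
• Time to process a large batch of requests reduced from days to minutes
• Savings in hardware costs and infrastructure management
• Components can be delivered, tested, and prototyped quickly

Company: CSIRO
Industry: Scientific & Industrial Research
Country: Australia
Website: www.csiro.au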
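The event-triggered report delivery described above (a result file lands in an S3 bucket, and the upload triggers a Lambda function that writes a report) can be sketched roughly as follows. This is a minimal illustration rather than the architecture actually used in the deck: the `reports/` prefix, the JSON result layout (`sample_id`, `findings`), and the optional `s3_client` parameter are all assumptions made for the example. The handler relies only on the standard S3 event notification shape and on boto3's `get_object` / `put_object`.

```python
import json
from urllib.parse import unquote_plus

REPORT_PREFIX = "reports/"  # hypothetical output prefix, not from the source


def extract_results(event):
    """Return (bucket, key) pairs for every object named in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def report_key_for(result_key):
    """Map e.g. 'results/s1.json' to 'reports/s1.txt'."""
    name = result_key.rsplit("/", 1)[-1]
    stem = name.rsplit(".", 1)[0]
    return f"{REPORT_PREFIX}{stem}.txt"


def render_report(result):
    """Render a plain-text report from an assumed JSON result layout."""
    lines = [f"Sample: {result['sample_id']}"]
    for finding in result.get("findings", []):
        lines.append(f"- {finding}")
    return "\n".join(lines)


def lambda_handler(event, context, s3_client=None):
    """Lambda entry point; s3_client is injectable so the logic can be tested."""
    if s3_client is None:
        import boto3  # available in the Lambda Python runtime
        s3_client = boto3.client("s3")
    written = []
    for bucket, key in extract_results(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        report = render_report(json.loads(body))
        out_key = report_key_for(key)
        s3_client.put_object(Bucket=bucket, Key=out_key, Body=report.encode("utf-8"))
        written.append(out_key)
    return {"reports": written}
```

If the report is written back to the same bucket, the S3 trigger should be limited to the input prefix (for example `results/`), otherwise each generated report would trigger the function again.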

Denis Bauer, Head of Cloud Computing and Bioinformatics at CSIRO:
"(We) reduced the runtime from days to minutes. The cost of traditional approaches would have been prohibitive if we wanted to persist appropriately-sized compute resources."

About CSIRO: CSIRO (Commonwealth Scientific and Industrial Research Organisation) is based in Canberra, the capital of Australia, and is a genomic analysis organization that supports clinical and scientific research.

[Figure: typical TCO comparison. Stages: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization, Serverless Architecture, Managed Services, True AWS Optimized.]

Data is changing how the healthcare and life sciences industry innovates

Data sources: market data, personal health, environmental data, clinical information, genomics.
Demand for data will only grow stronger: analytics, artificial intelligence, machine learning.

What data can be accessed?

Public datasets. AWS hosts a variety of public datasets that anyone can access for free. Below are just a few examples.
• 1000 Genomes Project
• The Cancer Genome Atlas
• International Cancer Genome Consortium
• 3000 Rice Genome
• Genome in a Bottle (GIAB)
• The Genome Modeling System
• Medicare Drug Spending
• The Human Connectome Project
• The Human Microbiome Project
• OpenNeuro
• Physionet
• Tabula muris
• OpenStreetMaps
• and more.

Local data. Access your existing data.
• Electronic health records
• Medical imaging (PACS/DICOM/VNA)
• Labs
• Genomic
• Patient monitoring
• Financial
• Supply chain

Shared data. Privately give and receive access to data.
• Easily give and receive access to data on AWS from other researchers
• Create shared data lakes with other institutions
• Access commercial data sets

AWS public datasets: life sciences
• International Neuroimaging Data-Sharing Initiative (INDI)
• Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery
• eBird Status and Trends Model Results
• Open NeuroData
• NYU Langone & FAIR FastMRI Dataset
• 3000 Rice Genomes Project
• Encyclopedia of DNA Elements (ENCODE)
• Allen Mouse Brain Atlas
• Africa Soil Information Service (AfSIS) Soil Chemistry
• MIMIC-III (Medical Information Mart for Intensive Care)
• Allen Brain Observatory - Visual Coding AWS Public Data Set

AWS public datasets: genomics
• NIH NCBI Sequence Read Archive (SRA) on AWS
• 1000 Genomes
• The Genome Modeling System
• GATK Test Data
• Genome Ark
• Basic Local Alignment Search Tool (BLAST) Databases
• The Human Microbiome Project
• Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
• Refgenie reference genome assets
• ICGC on AWS
• Cloud Indexes for Genomic Analyses

AWS public datasets: oncology
• The Cancer Genome Atlas
• Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
• Cancer Cell Line Encyclopedia (CCLE)
• Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)
• Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)
• CoMMpass from the Multiple Myeloma Research Foundation
• Beat Acute Myeloid Leukemia (AML) 1.0
• Clinical Trial Sequencing Project - Diffuse Large B-Cell Lymphoma
• Foundation Medicine Adult Cancer Clinical Dataset (FM-AD)
• Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
• Cancer Genome Characterization Initiatives - Burkitt Lymphoma Genome Sequencing Project
• National Cancer Institute Center for Cancer Research - Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression
• Gabriella Miller Kids First Pediatric Research Program (Kids First)
• ICGC on AWS
• Pancreatic Cancer Organoid Profiling
• Human Cancer Models Initiative (HCMI) Cancer Model Development Center
• Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset
• Cancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical Cancer

AWS public datasets: COVID-19
• COVID-19 Molecular Structure and Therapeutics Hub
• COVID-19 Genome Sequence Dataset
• Ozone Monitoring Instrument (OMI) / Aura NO2 Tropospheric Column Density
• COVID-19 Harmonized Data
• COVID-19 Data Lake
• COVID-19 Open Research Dataset (CORD-19)

Meeting healthcare and life sciences compliance requirements
Complying with virtually every regulatory agency

Global:
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report

United States:
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA
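As a worked example of the S3 storage-class prices quoted in the tiered storage section earlier in this deck, the sketch below compares keeping a sample in S3 Standard with moving it through colder tiers over time. The per-GB prices are the starting prices from the deck; treating them as monthly rates, the 36-month lifecycle schedule, and the 100 GB sample size are assumptions made for illustration. Retrieval fees, minimum storage durations, and minimum object sizes are deliberately ignored.

```python
# Starting prices per GB as quoted in the deck (CNY); assumed here to be monthly rates.
PRICE_PER_GB = {
    "S3 Standard": 0.147,
    "S3 Standard-IA": 0.0875,
    "S3 Glacier": 0.028,
    "S3 Glacier Deep Archive": 0.007,
}

# Hypothetical lifecycle schedule: (storage class, months spent in that class).
LIFECYCLE = [
    ("S3 Standard", 1),
    ("S3 Standard-IA", 2),
    ("S3 Glacier", 9),
    ("S3 Glacier Deep Archive", 24),
]


def storage_cost(size_gb, schedule):
    """Total storage cost of one dataset that follows the given schedule."""
    return sum(size_gb * PRICE_PER_GB[tier] * months for tier, months in schedule)


def standard_only_cost(size_gb, months):
    """Cost of leaving the same dataset in S3 Standard for the whole period."""
    return size_gb * PRICE_PER_GB["S3 Standard"] * months


if __name__ == "__main__":
    size_gb = 100  # upper end of the per-sample size range mentioned in the deck
    months = sum(m for _, m in LIFECYCLE)
    tiered = storage_cost(size_gb, LIFECYCLE)
    flat = standard_only_cost(size_gb, months)
    print(f"{months} months, tiered: {tiered:.2f} CNY, Standard only: {flat:.2f} CNY")
```

Under these assumptions most of the saving comes from the long tail in the archive classes, which is why the access pattern of each data format (BCL, FASTQ, CRAM/BAM, VCF) matters when choosing a schedule.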
