
CarbonData: An Indexed File Format for Interactive Analysis

Big Data
- Network: 54B records per day, 750TB per month, complex correlated data
- Consumer: hundreds of thousands of sensors, 2 million events per second, time series and geospatial data
- Enterprise: 100GB to TBs per day, data spread across different domains
Enterprise data is large, multi-dimensional, structurally complex, and growing fast.

Typical Scenario
- Big tables (e.g. CDR, transaction, web log), small tables, and unstructured data
- Workloads span report & dashboard, OLAP & ad-hoc, batch processing, machine learning, and realtime analytics
Enterprises run many kinds of data applications, from business intelligence to batch processing to machine learning.

Analytic Examples
Tracing and record queries for operation engineers:
- Over the past day, how do terminals using WhatsApp rank by traffic?
- Over the past day, what are the network congestion statistics for each cell in Shanghai?

Challenge - Data
- Data size: a single table holds 10B+ records and is growing fast
- Multi-dimensional: every record carries about 100 dimensions, with new dimensions added occasionally
- Rich detail with billion-level high cardinality: 1B terminals * 200K cells * 1440 minutes = 2.88 * 10^17 possible combinations
In short: tens of billions of rows, many dimensions, fine granularity.

Challenge - Application
- Enterprise integration: SQL 2003 standard syntax, BI integration, JDBC/ODBC
- Flexible queries with no fixed pattern: any combination of dimensions, OLAP vs. detail record, full scan vs. small scan, precise search vs. fuzzy search
- Three query classes must coexist: small scan query, full scan query, and multi-dimensional OLAP query

How to choose storage? How do we build the data platform?

Option 1: NoSQL database
- Key-value store: low latency, around 5ms
- Data is reachable only by key, one value per key
- Good for serving realtime applications, not for analytic applications

Option 2: Parallel database
- Parallel scan plus fast compute, with fine-grained control of parallel execution
- Questionable scalability and fault tolerance; cluster size tops out around 100 data nodes
- Not suitable for big batch jobs
- Fits small to mid-scale analytics (data marts); the scale-out ceiling and weak query fault tolerance make it unsuitable for massive enterprise data warehouses

Option 3: Search engine
- All columns indexed, fast search, simple aggregation
- Designed for search, not OLAP: no TopN, joins, or multi-level aggregation
- Roughly 3-4x data expansion in size, and no SQL support
- Data bloat, no complex computation, and a proprietary syntax that is hard to migrate; fits multi-condition filtering and text analysis

Option 4: SQL on Hadoop
- Modern distributed architecture that scales well in computation
- Pipeline based: Impala, Drill, Flink; BSP based: Hive, SparkSQL
- BUT it still uses file formats designed for batch jobs and focuses on scans only
- No index support, so it is unsuitable for point or small-scan queries; parallel scan + parallel compute suits massive-data computation, but the batch-oriented storage limits the scenarios

Capability Matrix
- KV store: HBase, Cassandra
- Parallel database: Greenplum, Vertica
- Search engine: Solr, ElasticSearch
- SQL on Hadoop - pipeline: Impala, HAWQ, Drill
- SQL on Hadoop - BSP: Hive, SparkSQL
Judged against small scan query, full scan query, and multi-dimensional OLAP query, each option is designed for a single scenario and solves only part of the problem.

Architect's choice
- Choice 1: Compromising - serve only some of the applications (App1, App2, App3)
- Choice 2: Replicating data - load and replicate the data several times so that every application is served

Motivation
CarbonData: unified storage - one copy of data serving small scan, full scan, and multi-dimensional OLAP queries alike: detail-record filtering, massive data warehouses, and data marts.

Spark + CarbonData: building an interactive analysis engine for big data.

Apache CarbonData community
- CarbonData entered the Apache Incubator in June 2016 by unanimous vote.
- Goals: easier to use, with one store covering more scenarios; higher analytic performance, giving users interactive analysis.
- Three stable Apache releases have been published.
- Mailing-list subscriptions and contributions are welcome. Code: /apache/incubator-carbondata, JIRA: /jira/browse/CARBONDATA, Maillist: dev
- Contributors come from Huawei, Talend, Intel, eBay, InMobi, Meituan, Alibaba, LeEco, and Hulu.

Carbon-Spark Integration
- Built-in Spark integration: Spark 1.5, 1.6, 2.0
- Interface: SQL and the DataFrame API
- Data management: bulk load/incremental load, delete load, compaction
- Reader/Writer, data management, and query optimization sit on top of the Carbon files, separating compute from storage.

Integration with Spark
Query a CarbonData table through the DataFrame API:

  carbonContext.read
    .format("carbondata")
    .option("tableName", "table1")
    .load()

or through SQL:

  CREATE TABLE IF NOT EXISTS T1 (name String, PhoneNumber String) STORED BY "carbondata"
  LOAD DATA LOCAL INPATH 'path/to/data' INTO TABLE T1

A Carbon file can also be read directly by path:

  sqlContext.read.format("carbondata").load("path_to_carbon_file")

Both paths come with late decode optimization and Carbon-specific SQL command support.
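To make the flow concrete, here is a minimal end-to-end session sketch in Scala. It assumes the CarbonData-Spark integration jar is on the classpath and uses the org.apache.spark.sql.CarbonContext entry point of the Spark 1.5/1.6 integration; the store path, table name t1, and the CSV path are illustrative.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.CarbonContext

  val sc = new SparkContext(new SparkConf().setAppName("carbon-demo"))
  // CarbonContext extends HiveContext; the second argument is the Carbon store path on HDFS
  val cc = new CarbonContext(sc, "hdfs://ns1/carbon/store")

  // DDL and load go through ordinary SQL strings
  cc.sql("CREATE TABLE IF NOT EXISTS t1 (name String, phoneNumber String) STORED BY 'carbondata'")
  cc.sql("LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE t1")

  // Point query: the table-level index prunes blocks before any scan happens
  cc.sql("SELECT name FROM t1 WHERE phoneNumber = '13800000000'").show()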

Data Ingestion
Bulk data ingestion works three ways: load from a CSV file, load from another table, or save a Spark DataFrame as a Carbon data file:

  df.write
    .format("carbondata")
    .option("tableName", "tbl1")
    .mode(SaveMode.Overwrite)
    .save()

  LOAD DATA LOCAL INPATH 'folder path' OVERWRITE INTO TABLE tablename OPTIONS(property_name=property_value, ...)

  INSERT INTO TABLE tablename select_statement1 FROM table1;
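As a sketch of the DataFrame path, reusing the assumed cc: CarbonContext from above; the toy rows, column names, and CSV path are invented for illustration:

  import org.apache.spark.sql.SaveMode

  // A toy DataFrame; any Spark-supported source would do
  val df = cc.createDataFrame(Seq(("John", "192"), ("Sam", "121")))
    .toDF("name", "phoneNumber")

  // Each save becomes one new segment of the Carbon table
  df.write
    .format("carbondata")
    .option("tableName", "t1")
    .mode(SaveMode.Append)
    .save()

  // Incremental load from CSV: each LOAD DATA call likewise creates one segment
  cc.sql("LOAD DATA LOCAL INPATH '/tmp/more.csv' INTO TABLE t1 OPTIONS('DELIMITER'=',')")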

Segment Introduction
Every data load becomes one segment in a CarbonData table, and data is sorted within one segment. A ZooKeeper-based Segment Manager coordinates the JDBCServer performing loads with the JDBCServers answering queries over the table's segments of Carbon files.

CarbonData Table Organization
- Carbon files: data plus a footer, stored under /tableName/fact/segmentId on HDFS (append only); the index is stored in the footer of each data file
- Dictionary file: the dictionary map
- Index file: all footers, loaded by Spark into an in-memory B-tree
- Schema file: the latest schema, stored under /tableName/meta (rewritten on change)

Data Compaction
Data compaction merges small files and re-clusters data across loads for better performance. Two types of compaction are supported:
- Minor compaction: compacts adjacent segments based on the number of segments; triggered automatically
- Major compaction: compacts segments based on size; triggered manually

  ALTER TABLE db_name.table_name COMPACT 'MINOR'/'MAJOR'
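A short usage note tied to the session sketched earlier (the cc: CarbonContext and table name are the same assumptions):

  // Fold the many small segments left by frequent loads into adjacent groups
  cc.sql("ALTER TABLE default.t1 COMPACT 'MINOR'")

  // Size-based consolidation; heavier, so it is left as a manual operation
  cc.sql("ALTER TABLE default.t1 COMPACT 'MAJOR'")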

CarbonData File Structure
Built-in columnar format & index: index and data are stored in the same file, co-located in HDFS, balancing batch and point queries.
- Index support: multi-dimensional index (B+ tree), min/max index, inverted index
- Encoding: dictionary, RLE, delta; Snappy for compression
- Data types: primitive and nested types
- Schema evolution: add, remove, and rename columns
A Carbon file consists of a sequence of blocklets followed by a footer.

Index Introduction
The multi-level indexes cooperate across the cluster: the Spark driver holds the table-level index, which Catalyst plugs into, while each executor holds file-level indexes and scanners, with column-level indexes below them.
- Table-level index: a global B+ tree index, used to filter blocks
- File-level index: a local B+ tree index, used to filter blocklets
- Column-level index: an inverted index within each column chunk
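To illustrate what "filtering blocklets" means mechanically, here is a simplified Scala sketch of min/max pruning, not CarbonData's actual classes: it keeps one (min, max) pair per column per blocklet and skips any blocklet whose range cannot contain the predicate value.

  // Simplified stand-in for the per-blocklet metadata kept in the file footer
  case class BlockletMeta(id: Int, min: Array[Int], max: Array[Int])

  // Keep a blocklet only if an equality predicate col = v can possibly match it
  def prune(blocklets: Seq[BlockletMeta], col: Int, v: Int): Seq[BlockletMeta] =
    blocklets.filter(b => b.min(col) <= v && v <= b.max(col))

  val metas = Seq(
    BlockletMeta(1, Array(1, 1), Array(1, 3)),
    BlockletMeta(2, Array(1, 4), Array(2, 5)),
    BlockletMeta(3, Array(2, 1), Array(3, 2))
  )
  // Only blocklet 2 can contain rows with column 1 = 4, so only it gets scanned
  println(prune(metas, col = 1, v = 4).map(_.id))   // List(2)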

Encoding Example
Data is sorted along the MDK (multi-dimensional key) and stored as an index in columnar format. Starting from the raw rows:

  Year  Quarter  Month  Territory  Country    Quantity  Sales
  2003  QTR1     Jan    EMEA       Germany    142       11,432
  2003  QTR1     Jan    APAC       China      541       54,702
  2003  QTR1     Jan    EMEA       Spain      443       44,622
  2003  QTR1     Feb    EMEA       Denmark    545       58,871
  2003  QTR1     Feb    EMEA       Italy      675       56,181
  2003  QTR1     Mar    APAC       India      52        9,749
  2003  QTR1     Mar    EMEA       UK         570       51,018
  2003  QTR1     Mar    Japan      Japan      561       55,245
  2003  QTR2     Apr    APAC       Australia  525       50,398
  2003  QTR2     Apr    EMEA       Germany    144       11,532

Dictionary encoding maps each dimension value to an integer key, giving MDK tuples with the two measures attached:

  1,1,1,1,1 : 142,11432
  1,1,1,3,2 : 541,54702
  1,1,1,1,3 : 443,44622
  1,1,2,1,4 : 545,58871
  1,1,2,1,5 : 675,56181
  1,1,3,3,6 : 52,9749
  1,1,3,1,7 : 570,51018
  1,1,3,2,8 : 561,55245

Sorting by MDK produces the blocklet logical view (a sorted MDK index over columns C1..C7):

  1,1,1,1,1 : 142,11432
  1,1,1,1,3 : 443,44622
  1,1,1,3,2 : 541,54702
  1,1,2,1,4 : 545,58871
  1,1,2,1,5 : 675,56181
  1,1,3,1,7 : 570,51018
  1,1,3,2,8 : 561,55245
  1,1,3,3,6 : 52,9749

The sorted data is then split by column into chunks and packed into blocklets. (The original slide also dumps the raw key contents of blocklets 1-4; only their structure matters here.)
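A toy Scala sketch of the two steps just shown: dictionary-encode each dimension, then sort rows by the resulting multi-dimensional key. This illustrates the idea only and is not CarbonData's implementation; the rows and the first-seen dictionary order are invented.

  // Dimensions of each row (e.g. year, quarter, month, territory, country)
  val rows = Seq(
    Seq("2003", "QTR1", "Jan", "EMEA", "Germany"),
    Seq("2003", "QTR1", "Jan", "APAC", "China"),
    Seq("2003", "QTR1", "Feb", "EMEA", "Denmark")
  )

  // Build one dictionary per column: value -> integer key, in first-seen order
  val dicts: Seq[Map[String, Int]] = rows.transpose.map { col =>
    col.distinct.zipWithIndex.map { case (v, i) => v -> (i + 1) }.toMap
  }

  // Encode every row into its MDK, then sort by the key
  val mdk: Seq[Seq[Int]] = rows.map(r => r.zip(dicts).map { case (v, d) => d(v) })
  val sorted = mdk.sortBy(_.mkString(","))   // toy ordering; fine for single-digit keys
  sorted.foreach(k => println(k.mkString(",")))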

File-Level Blocklet Index
The footer of each file records, for every blocklet, its start key and end key plus (min, max) for each column C1..C7. From these entries an in-memory, file-level MDK index tree is built for filtering: the leaf for blocklet 1 carries (start key 1, end key 1) and its per-column min/max values, and so on through blocklet 4, while inner nodes summarize the key ranges of their children. This is the major optimization behind efficient scans.

Block Pruning
On HDFS, each block's footer indexes its blocklets and their column chunks. The Spark driver-side, table-level index prunes whole blocks first; the file-level index then prunes blocklets inside the surviving blocks.

Query Optimization
- Predicate push-down, leveraging the multi-level indexes
- Column pruning, so only the needed column chunks are read

Blocklet Rows: Inverted Index
Within a blocklet, each column is sorted inside its own column chunk, and every value is paired with the row id it came from. (The slide walks row ids 1-10 through columns C1..C7 of a sample blocklet; the value|row-id dump is omitted here.)

Run Length Encoding & Compression
After the per-chunk sort, each dimension column compresses very well with RLE. For the sample blocklet of ten rows, value(row ids):

  Dim1 chunk: 1(1-10)
  Dim2 chunk: 1(1-8), 2(9-10)
  Dim3 chunk: 1(1-3), 2(4-5), 3(6-8)
  Dim4 chunk: 1(1-2,4-6,9), 2(7), 3(3,8,10)
  Dim5 chunk: 1(1,9), 2(3), 3(2), 4(4), 5(5), 6(8), 7(6), 8(7), 9(10)
  Measure1/Measure2 chunks: stored as plain columnar values (142:11432, 443:44622, 541:54702, 545:58871, 675:56181, 570:51018, 561:55245, 52:9749, 144:11532, 525:50398)

Column-Chunk-Level Inverted Index
- Optionally store column data as an inverted index within the column chunk
- Suitable for low-cardinality columns
- Gives better compression and fast predicate filtering

Blocklet Physical View
(The slide's physical-view dump did not survive extraction; it shows the same blocklet laid out as RLE-compressed, inverted-indexed dimension chunks C1..C7 followed by the measure chunks.)
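A small Scala sketch of the column-chunk transform described above: sort the values, remember the original row ids, then run-length encode the sorted values. Illustrative only, with no claim to match CarbonData's on-disk layout.

  // One low-cardinality column chunk: values addressed by row id (0-based here)
  val chunk = Seq(1, 1, 3, 2, 2, 3, 3, 1)

  // Inverted index: sort (value, rowId) pairs by value; row ids keep the original positions
  val inverted: Seq[(Int, Int)] = chunk.zipWithIndex.sortBy(_._1)

  // RLE over the sorted values: (value, runLength)
  def rle(xs: Seq[Int]): List[(Int, Int)] =
    xs.foldRight(List.empty[(Int, Int)]) {
      case (x, (v, n) :: tail) if v == x => (v, n + 1) :: tail
      case (x, acc)                      => (x, 1) :: acc
    }

  println(inverted)                  // List((1,0), (1,1), (1,7), (2,3), (2,4), (3,2), (3,5), (3,6))
  println(rle(inverted.map(_._1)))   // List((1,3), (2,2), (3,3))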

Column Group
- Multiple columns may form a column group, stored as a single column chunk in row-based format
- Suitable for sets of columns that are frequently fetched together
- Saves the stitching cost of reconstructing rows from separate chunks
In a blocklet, the grouped columns are written as one column group chunk alongside the ordinary per-column chunks of the remaining columns C1..C6. (The slide's sample rows of grouped values did not survive extraction.)

Nested Data Type Representation
Arrays are represented as a composite of two columns: one column for the element values, and one column for the start index & length within the array. For example:

  Name  Array            ->   Name  Array(start,len)   Ph_Number
  John  [192, 191]            John  0,2                192
  Sam   [121, 345, 333]       Sam   2,3                191
  Bob   [198, 787]            Bob   5,2                121
                                                       345
                                                       333
                                                       198
                                                       787

Structs are represented as a composite of a finite number of columns, one per struct element:

  Name  Info Struct   ->   Name  Info.age  Info.gender
  John  31,M               John  31        M
  Sam   45,F               Sam   45        F
  Bob   16,M               Bob   16        M
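A tiny Scala sketch of the array-to-two-columns rewrite above. The names and values are taken from the example; the layout (one shared value column plus a start/length column) follows the description rather than any actual CarbonData writer code.

  val people: Seq[(String, Seq[Int])] = Seq(
    "John" -> Seq(192, 191),
    "Sam"  -> Seq(121, 345, 333),
    "Bob"  -> Seq(198, 787)
  )

  // Flatten every array into one shared value column...
  val values: Seq[Int] = people.flatMap(_._2)

  // ...and keep (start, length) per row so each array can be reassembled
  val offsets: Seq[(String, Int, Int)] = {
    var start = 0
    people.map { case (name, arr) =>
      val entry = (name, start, arr.length)
      start += arr.length
      entry
    }
  }

  println(values)    // List(192, 191, 121, 345, 333, 198, 787)
  println(offsets)   // List((John,0,2), (Sam,2,3), (Bob,5,2))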

Encoding & Compression
- Efficient encoding schemes supported: DELTA, RLE, BIT_PACKED
- Dictionary: a table-level global dictionary, which enables the Lazy Decode optimization
- Compression: column data compressed with Snappy, plus adaptive data-type compression
Benefits: speeds up aggregation, reduces run-time memory footprint, and enables fast distinct count.

Big Win: Lazy Decode
In the original plan the pipeline is Scan -> Filter -> Dictionary Decode -> Aggregation. In the optimized plan the decode step is pushed above the aggregation: Scan -> Filter -> Aggregation (group by dictionary key) -> Dictionary Decode (translate dictionary keys back to values), so the group-by runs on compact integer keys.
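A self-contained Scala sketch of why lazy decode pays off: group on the integer dictionary keys and decode only the handful of surviving group keys. The dictionary and rows are invented for illustration.

  // Table-level global dictionary: key -> value (invented sample)
  val dict = Map(1 -> "Germany", 2 -> "China", 3 -> "Spain")

  // Encoded fact rows: (countryKey, sales)
  val rows = Seq((1, 11432), (2, 54702), (3, 44622), (1, 11532))

  // Aggregate on the small integer key, NOT on the decoded string
  val byKey: Map[Int, Int] =
    rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // Lazy decode: translate one dictionary key per surviving group
  val result = byKey.map { case (k, sum) => dict(k) -> sum }
  println(result)   // e.g. Map(Germany -> 22964, China -> 54702, Spain -> 44622)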

CarbonData Performance vs. Spark-Parquet
Speedups range from 1.45x to 131x.
Test environment:
- Cluster: 3 workers + 1 master, 40 cores, 384GB memory, 10GbE network
- Software: Hadoop 2.7.2, Spark 1.5.2
- Data: 1 billion records, 300 columns, 1.9TB raw data
Query characteristics:
- Point query: filter on the primary key
- Small scan: filters on several columns
- Full scan: complex aggregation and joins, no filter
- OLAP query: filter and aggregation together

CarbonData Performance vs. Impala
For small-result-set scenarios, Impala relies on a full table scan while Carbon uses its index.
Test environment:
- Cluster: 3 workers + 1 master, 40 cores, 384GB memory, 10GbE network
- Software: Hadoop 2.7.2, Spark 1.5.2, Impala 2.6
- Data: 1 billion records, 300 columns, 830GB raw data
Query characteristics:
- Multi-dimensional filtering
- Multiple joins against dimension tables

Success Case
Telecom detail-record (CDR) analysis: the existing open-source systems could not meet the business's performance and stability requirements. On the previous system, a half-year query took 10s and a one-year query around 700s over tens to hundreds of billions of rows, and queries failed easily; interactive queries were unstable (Impala would occasionally hang), and Impala's resources could not be centrally managed or shared.
Solution:
- Batch processing: Hive
- Interactive analysis: SparkSQL + CarbonData
- Data source: network data loaded every 5 minutes at 2 million records per second into Spark + Carbon on Yarn over HDFS, under the data platform's system management
Customer value:
- Resources are centrally managed through Yarn, user-configurable and tunable
- Tens to hundreds of billions of rows, filter queries on arbitrary dimensions, second-level response
