ApacheKylin在大数据系统中应用课件

上传人：2*** IP属地：贵州上传时间：2023-09-21 格式：PPT 页数：36 大小：2.86MB 积分：25 举报 版权申诉

已阅读5页，还剩31页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Apache

KylinOLAP

HadoopApacheKylinOLAPonHadoop1http://kylin.ioAgenda

What’s

Apache

Kylin?Tech

HighlightsPerformanceRoadmapQ&Ahttp://kylin.ioAgendaWhat’sA2Extreme

OLAP

Engine

for

Big

DataKylin

open

source

Distributed

Analytics

Engine

from

eBay

thatprovides

SQL

interface

and

multi-dimensional

analysis

(OLAP)

onHadoop

supporting

extremely

large

datasetsWhat’s

Kylinkylin

ˈkiːˈlɪn

麒麟--n.

(in

Chinese

art)

mythical

animal

composite

form•

Open

Sourced

Oct

1st,

2014•

accepted

Apache

Incubator

Project

Nov

25th,

2014ExtremeOLAPEngineforBigDa3

Big

Data

Era

and

data

becoming

available

Hadoop

Limitations

existing

Business

Intelligence

(BI)

Tools

Limited

support

for

HadoopData

size

growing

exponentiallyHigh

latency

interactive

queriesScale-Up

architecture

Challenges

adopt

Hadoop

interactive

analysis

system

Majority

analyst

groups

are

SQL

savvyNo

mature

SQL

interface

HadoopOLAP

capability

Hadoop

ecosystem

not

ready

yetBigDataEraLimitedsupport45

Why

notBuild

engine

from

scratch?5 Whynot5

Extreme

Scale

OLAP

EngineKylin

designed

query

10+

billions

rows

Hadoop

ANSI

SQL

Interface

HadoopKylin

offers

ANSI

SQL

Hadoop

and

supports

most

ANSI

SQL

query

functions

Seamless

Integration

with

ToolsKylin

currently

offers

integration

capability

with

Tools

Tableau.

Interactive

Query

CapabilityUsers

can

interact

with

Hive

tables

sub-second

latency

MOLAP

CubeDefine

data

model

from

Hive

tables

and

pre-build

Kylin

Scale

Out

ArchitectureQuery

server

cluster

supports

thousands

concurrent

users

and

provide

high

availabilityFeatures

HighlightsExtremeScaleOLAPEngineKyli6

Compression

and

Encoding

SupportIncremental

Refresh

CubesApproximate

Query

Capability

for

distinct

count

(HyperLogLog)Leverage

HBase

Coprocessor

for

query

latencyJob

Management

and

MonitoringEasy

Web

interface

manage,

build,

monitor

and

query

cubesSecurity

capability

set

ACL

Cube/Project

LevelSupport

LDAP

IntegrationFeatures

Highlights…CompressionandEncodingSupp7Cube

DesignerCubeDesigner8Job

ManagementJobManagement9Query

and

VisualizationQueryandVisualization10Tableau

IntegrationTableauIntegration11CaseCubeSizeRawRecordsUserSessionAnalysis26TB28+billionrowsClassifiedTrafficAnalysis21TB20+billionrowsGeoXBehaviorAnalysis560GB1.2+billionrows

eBay

90%

query

seconds

Baidu

Map

internal

analysis

Many

other

Proof

Concepts

Bloomberg

Law,

British

GAS,

JD,

Microsoft,

StubHub,

Tableau

…Who

are

using

KylinCaseCubeSizeRawRecordsUserSess12http://kylin.ioAgenda

What’s

Apache

Kylin?Tech

HighlightsPerformanceRoadmapQ&Ahttp://kylin.ioAgendaWhat’sA13OLAPCubeKylin

Architecture

Overview15SQL-Based

Tool

(BI

Tools:

Tableau…)

JDBC/ODBC

SQL

Online

AnalysisData

Flow

Offline

Data

Flow

Clients/Users

interactive

with

Kylin

via

SQL

OLAP

Cube

transparent

users

Mid

Latency

MinutesHadoop

Hive

Star

Schema

DataLow

Latency

-Seconds

Data

Cube

(HBase)

Key

Value

Data3rd

Party

App(Web

App,

Mobile…)

REST

API

SQL

REST

Server

Query

Engine

Routing

MetadataCube

Build

Engine

(MapReduce…)OLAPCubeKylinArchitectureOve14Cube:

…Fact

Table:

…Dimensions:

…Measures:

…Storage(HBase):

…DimDimDimFact

SourceStar

SchemaColumn

FamilyRow

Key

row

CColumn

Val

TargetHBase

Storage

MappingCube

MetadataData

Modeling

End

UserCube

ModelerAdminCube:…FactTable:…DimDimDim 15time,

itemtime,

item,

locationtime,

item,

location,

suppliertimeitemlocationsuppliertime,

locationTime,

supplieritem,

locationitem,

supplierlocation,

suppliertime,

item,

suppliertime,

location,

supplieritem,

location,

supplier1-D

cuboids2-D

cuboids3-D

cuboids4-D(base)

cuboid•Base

vs.

aggregate

cells;

ancestor

vs.

descendant

cells;

parent

vs.

child

cells1.2.3.4.5.(9/15,

milk,

Urbana,

Dairy_land)

<time,

item,

location,

supplier>(9/15,

milk,

Urbana,

<time,

item,

location>(*,

milk,

Urbana,

<item,

location>(*,

milk,

Chicago,

<item,

location>(*,

milk,

<item>••OLAP

Cube

–

Balance

between

Space

and

Time

Cuboid

one

combination

dimensions

Cube

all

combination

dimensions

(all

cuboids)

0-D(apex)

cuboidtime,itemtime,item,location16Cube

Build

Job

FlowCubeBuildJobFlow17How

Store

Cube?

–

HBase

SchemaHowToStoreCube?–HBaseSch18

Dynamic

data

management

framework.Formerly

known

Optiq,

Calcite

Apache

incubator

project,

used

byApache

Drill

and

Apache

Hive,

among

others.How

Query

Cube?Query

Engine

–

CalciteDynamicdatamanagementframe19•••••Metadata

SPI

–

Provide

table

schema

from

Kylin

metadataOptimize

Rule

–

Translate

the

logic

operator

into

Kylin

operatorRelational

Operator

–

Find

right

cube

–

Translate

SQL

into

storage

engine

API

call

–

Generate

physical

execute

plan

linq4j

java

implementationResult

Enumerator

–

Translate

storage

engine

result

into

java

implementation

result.SQL

Function

–

Add

HyperLogLog

for

distinct

count

–

Implement

date

time

functions

(i.e.

Quarter)How

Query

Cube?Kylin

Extensions

Calcite•MetadataSPIHowtoQueryCube20Query

Engine

–

Kylin

Explain

PlanSELECT

test_cal_dt.week_beg_dt,test_category.category_name,

test_category.lvl2_name,

test_category.lvl3_name,test_kylin_fact.lstg_format_name,

test_sites.site_name,

SUM(test_kylin_fact.price)

GMV,

COUNT(*)

TRANS_CNTFROM

test_kylin_factLEFT

JOIN

test_cal_dt

test_kylin_fact.cal_dt

test_cal_dt.cal_dtLEFT

JOIN

test_category

test_kylin_fact.leaf_categ_id

test_category.leaf_categ_id

AND

test_kylin_fact.lstg_site_id

=test_category.site_idLEFT

JOIN

test_sites

test_kylin_fact.lstg_site_id

test_sites.site_idWHERE

test_kylin_fact.seller_id

123456OR

test_kylin_fact.lstg_format_name

’New'GROUP

test_cal_dt.week_beg_dt,

test_category.category_name,

test_category.lvl2_name,

test_category.lvl3_name,test_kylin_fact.lstg_format_name,test_sites.site_nameOLAPToEnumerableConverterOLAPProjectRel(WEEK_BEG_DT=[$0],

category_name=[$1],

CATEG_LVL2_NAME=[$2],CATEG_LVL3_NAME=[$3],LSTG_FORMAT_NAME=[$4],

SITE_NAME=[$5],

GMV=[CASE(=($7,

0),

null,

$6)],

TRANS_CNT=[$8])OLAPAggregateRel(group=[{0,

5}],

agg#0=[$SUM0($6)],

agg#1=[COUNT($6)],

TRANS_CNT=[COUNT()])

OLAPProjectRel(WEEK_BEG_DT=[$13],

category_name=[$21],

CATEG_LVL2_NAME=[$15],

CATEG_LVL3_NAME=[$14],LSTG_FORMAT_NAME=[$5],

SITE_NAME=[$23],

PRICE=[$0])

OLAPFilterRel(condition=[OR(=($3,

123456),

=($5,

’New'))])OLAPJoinRel(condition=[=($2,

$25)],

joinType=[left])OLAPJoinRel(condition=[AND(=($6,

$22),

=($2,

$17))],

joinType=[left])OLAPJoinRel(condition=[=($4,$12)],

joinType=[left])OLAPTableScan(table=[[DEFAULT,

TEST_KYLIN_FACT]],

fields=[[0,1,

10,

11]])OLAPTableScan(table=[[DEFAULT,

TEST_CAL_DT]],

fields=[[0,1]])OLAPTableScan(table=[[DEFAULT,

test_category]],

fields=[[0,1,

8]])OLAPTableScan(table=[[DEFAULT,

TEST_SITES]],

fields=[[0,1,

2]])QueryEngine–KylinExplainP21

Plugin-able

storage

engine

Common

iterator

interface

for

storage

engineIsolate

query

engine

from

underline

storage

Translate

cube

query

into

HBase

table

scan

Columns,

Groups

Cuboid

IDFilters

Scan

Range

(Row

Key)Aggregations

Measure

Columns

(Row

Values)

Scan

HBase

table

and

translate

HBase

result

into

cube

result

HBase

Result

(key

value)

Cube

Result

(dimensions

measures)How

Query

Cube?Storage

EnginePlugin-ablestorageengineCo22

Curse

dimensionality:

dimension

cube

has

cuboid

Full

Cube

vs.

Partial

Cube

Hugh

data

volume

Dictionary

EncodingIncremental

BuildingHow

Optimize

Cube?Cube

OptimizationCurseofdimensionality:Ndi23

Full

Cube

Pre-aggregate

all

dimension

combinations“Curse

dimensionality”:

dimension

cube

has

cuboid.

Partial

Cube

avoid

dimension

explosion,

divide

the

dimensions

intodifferent

aggregation

groups

2N+M+L

For

cube

with

dimensions,

divide

these

dimensions

into

3group,

the

cuboid

number

will

reduce

from

Billion

Thousands

230

210

Tradeoff

between

online

aggregation

and

offline

pre-aggregationHow

Optimize

Cube?Full

Cube

vs.

Partial

CubeFullCubePre-aggregatealld24How

Optimize

Cube?Partial

CubeHowtoOptimizeCube?PartialC25

Data

cube

has

lost

duplicated

dimension

valuesDictionary

maps

dimension

values

into

IDs

that

will

reduce

the

memory

and

storagefootprint.Dictionary

based

TrieHow

Optimize

Cube?Dictionary

EncodingDatacubehaslostofduplica26How

Optimize

Cube?Incremental

BuildHowtoOptimizeCube?Increment27CubeInvertedIndexStorageformatPre-aggregatedcuboidsSharding,columnarstorage,withinvertedindexonrowblocksQuerymethodCuboidscanningMassiveparallelprocessingStrengthPre-aggregatehugehistoricdatatosmallsummariesSwiftresponsetoreal-timedataWeaknessTaketimetobuildSlowatscanninglargedatavolumeStreaming,

ongoing

effort

Cube

great,

but…

Sometimes

want

drill

down

row

level

informationCube

takes

time

build,

how

about

real-time

analysis?

Streaming

with

inverted

indexCubeInvertedIndexStorageformat28streamingKarfkahourly/dailybatchminutes

batch

Inverted

IndexReal-time

StoreKylin

0.8,

Lambda

Architecture

SQL

Query

Hybrid

Storage

Interface

CubeHistoric

StorestreamingKarfkahourly/dailybat29http://kylin.ioAgenda

What’s

Apache

Kylin?Tech

HighlightsPerformanceRoadmapQ&Ahttp://kylin.ioAgendaWhat’sA30Kylin

vs.

Hive#QueryTypeReturn

DatasetQueryOn

Kylin

(s)QueryOn

Hive

(s)Comments1High

LevelAggregation40.129157.4371,217

times23Analysis

QueryDrill

Down

toDetail22,669325,0291.61512.058109.206113.12368

times9

times4Drill

Down

toDetail524,78022.426383.21278

times5Data

Dump972,00249.054N/A100

0200150SQL

#1SQL

#2SQL

#3HiveKylinHighLevelAggregatio

nAnalysis

QueryDrillDownto

DetailLow

LevelAggregatio

nTransactio

LevelBased

12+B

records

caseKylinvs.Hive#QueryTypeReturn31Performance

ConcurrencyLinear

scale

out

with

nodesPerformance--ConcurrencyLine32Performance

Query

Latency90%

queries

<5sGreen

Line:

90%tile

queriesGray

Line:

95%tile

queriesPerformance-QueryLatency90%33http://kylin.ioAgenda

What’s

Apache

Kylin

人人文库> 全部分类> 教育资料 > 辅导培训

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

ApacheKylin在大数据系统中应用课件

文档简介

温馨提示

最新文档

评论

ApacheKylin在大数据系统中应用课件

文档简介

温馨提示

最新文档

评论

相关文档