2022 Data Summit, Data Lake: Drilling Down on Data Lake Architecture

© Pythian Services Inc 2022 | 1

Finding the Hidden Value in Data Lakes

Joey Jablonski

Pythian

VP Analytics

2022-05-18


May 18, 2022

2:00 PM - 2:45 PM EDT

B203: Drilling Down on Data Lake Architecture


New ways to store data and leverage it in different ways are being utilized by data-driven organizations hungry for flexibility and scale.


Long associated with Hadoop, in a cloud world the data lake is often ignored in favor of its more fashionable cousins: mesh, fabric, lakehouse, and warehouse. But ignore the data lake at your peril, as it has an important role to play in any modern analytics strategy. Jablonski focuses on the evolving role of the data lake with a particular emphasis on: why a data lake is a critical component of any cloud analytics project; the role of the data lake in the battle over ETL vs ELT; the importance of metadata in data platform design; and how a data lake helps deliver business value, not just technical success.

Joey Jablonski

VP Analytics

Austin, TX

Data & Analytics

Applications

CloudOps

420+ experts across every data domain & technology

400+ global customers

25 years in business


Why We Build Data Lakes

Data Lake Value Measures

Single Repository: Enable the vision of a single location to locate & analyze all corporate data.

Scalability: Ensure that technology is never the limiting factor to corporate strategy.

Data Monetization: Support business objectives to create new revenue streams and lower OPEX.

Data Quality: Simplified set of tools to measure data quality, report and improve.

How the Data Estate Is Evolving

Data is the driver of innovation transformation.

Data powers the software that drives the business.

Operational Excellence / Business Transformation

Defense / Offense

Modern Applications / SaaS start with modern, often cloud-native databases.

Modern Cloud Data Platforms are the enabler for insights (BI), predictions (ML) and product activation (orchestration) and creation (AppDev) across ALL data sources.

Traditional On-Premise Enterprise Apps (i.e. Oracle, SAP etc.) are slowly moving to Cloud, dragging data with them, and Traditional Data Warehouses are being replaced with modern cloud data platforms.


Why Data Lake Projects Fail

1. Design your data platform to support your data strategy.

2. Build your governance programs to enable your data lake.

● Data Governance ensures we do not create a data swamp

● Data Governance ensures risk is managed within the platform & consumer base

● Governance creates structure for managing change


Data Platform Functional Components

3. Ensure every feature & investment has a vocal business champion.

Marketing Analytics Platform on Google Cloud

[Architecture diagram; recoverable labels follow]

Sources: AdWords, DoubleClick, YouTube (Google Transfer Service)

Sources: SaaS (MA, CRM, Social Media, etc.)

Sources: Enterprise (RDBMS, File/CSV, JSON)

Sources: IoT Gateway, API, App Engine

Real-time/Streaming: Pub/Sub

Data Capture Staging: Google Cloud Storage (structured and unstructured data)

Data Management (Integration, Deduplication, Transformation, Cleaning, Encryption, Scheduling): Google Dataflow or DataProc, Data Fusion

Data Lake Storage: Google Cloud Storage

Curation Layer

Google BigQuery

Data Marts

ML Exploration: Datalab, Cloud ML, TensorFlow, Keras

Machine Learning Operations

Metadata/API: Cloud Datastore, Kubernetes Engine

Infrastructure

Encryption

Support: 24/7 monitoring, Backup/restore, Data Services, Updates

Casual Users: Reports, Dashboards, Excel

Power Users: Tools: Looker, SQL

Data Engineer: Tools: Spark, Dataflow, SQL

Data Scientists: Tools: Python, TensorFlow, R, Notebooks

Data Scientists: BigQuery ML

Activate via other Systems: Marketing Automation systems, Google Ad Manager, etc.

4. Make it modular: I guarantee you'll be changing out components within a year.

5. Avoid analysis paralysis when picking data lake components.

● Leverage rapid prototyping & pilots to verify functional capabilities of tools

● Accept that the lifespan of tools is getting shorter and change is inevitable

● Think of these decisions as two-way doors

Image credit: Aaron Colcord

6. Decouple process steps from technical implementations.

● Assume that different BUs will require different data consumption methods

● Assume technology will evolve

● Decouple business process and data transformation from underlying technologies to better accelerate future evolution
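
One way to picture this decoupling is a minimal Python sketch in which the process steps are declared once and the engine that executes them is a swappable parameter. Everything named here (`Step`, `Engine`, `InMemoryEngine`, the two sample transforms) is hypothetical and for illustration only; a real platform would put Spark, Dataflow, or a similar tool behind the same interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

Row = dict[str, object]


@dataclass(frozen=True)
class Step:
    """A business-level process step, described without naming any engine."""
    name: str
    transform: Callable[[Row], Row]


class Engine(Protocol):
    """Hypothetical interface: anything that can run steps over rows."""
    def run(self, steps: list[Step], rows: list[Row]) -> list[Row]: ...


class InMemoryEngine:
    """Illustrative stand-in for a real execution engine."""
    def run(self, steps: list[Step], rows: list[Row]) -> list[Row]:
        for step in steps:
            rows = [step.transform(row) for row in rows]
        return rows


def normalize_email(row: Row) -> Row:
    # Business rule: emails are stored trimmed and lower-cased.
    return {**row, "email": str(row["email"]).strip().lower()}


def add_region(row: Row) -> Row:
    # Business rule (made up for this sketch): DE and FR roll up to EMEA.
    return {**row, "region": "EMEA" if row["country"] in {"DE", "FR"} else "OTHER"}


# The process definition lives apart from any technical implementation.
PIPELINE = [Step("normalize_email", normalize_email), Step("add_region", add_region)]


def run_pipeline(engine: Engine, rows: list[Row]) -> list[Row]:
    # Swapping the engine does not require touching PIPELINE.
    return engine.run(PIPELINE, rows)


result = run_pipeline(InMemoryEngine(), [{"email": " A@X.com ", "country": "DE"}])
```

Replacing `InMemoryEngine` with another class that satisfies `Engine` leaves the business rules untouched, which is the property the lesson asks for.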

7. Match your transformation patterns to your flexibility & operational needs.

● Accept that consumers of data will change & evolve over time

● Decoupling transformation from consumption allows different consumers to choose different tools

● EDWs have a place for data consumption but carry high costs for transformation workloads
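
The ETL vs ELT choice named in the session abstract is one such transformation pattern. The sketch below contrasts the two under loud assumptions: `sqlite3` stands in for the warehouse or lake query engine, and the table names and sample rows are invented for illustration.

```python
import sqlite3

# Invented sample data: (customer, amount as raw text)
RAW = [("alice", "12.50"), ("bob", "7.25"), ("alice", "3.00")]


def etl(conn):
    # ETL: transform in the pipeline, then load only the finished result.
    totals = {}
    for customer, amount in RAW:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_totals (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO etl_totals VALUES (?, ?)", sorted(totals.items()))
    return conn.execute(
        "SELECT customer, total FROM etl_totals ORDER BY customer"
    ).fetchall()


def elt(conn):
    # ELT: load raw data untouched, then transform inside the engine with SQL.
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE elt_totals AS "
        "SELECT customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_orders GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM elt_totals ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
etl_result = etl(conn)
elt_result = elt(conn)
raw_rows_kept = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Both paths produce the same totals. The difference is that the ELT path keeps `raw_orders` available for other consumers to transform differently later, while the ETL path retains only the aggregated result.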


8. Metadata: store it, manage it, govern it, use it!

Descriptive: Used for identification & discovery of data sets.

Structural: Show data organization and relationships.

Administrative: Business process details about creation, destruction, and SMEs.

Reference: Key referral elements about data quality & consumption.

Statistical: Data distribution, outliers and record counts.
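
As a rough illustration of storing and using these five kinds of metadata together, the sketch below models one catalog entry as a Python dataclass. The field names, the sample dataset, and the `find_by_tag` helper are assumptions made for this example, not a schema from the talk or from any catalog product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMetadata:
    # Descriptive: identification and discovery
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    # Structural: organization and relationships
    columns: dict[str, str] = field(default_factory=dict)
    related_datasets: list[str] = field(default_factory=list)
    # Administrative: creation, destruction, and subject-matter experts
    sme: str = ""
    retention_days: Optional[int] = None
    # Reference: quality and consumption
    quality_notes: str = ""
    consumers: list[str] = field(default_factory=list)
    # Statistical: distribution, outliers, record counts
    row_count: Optional[int] = None


def find_by_tag(catalog: list[DatasetMetadata], tag: str) -> list[str]:
    """Use descriptive metadata for discovery: names of datasets carrying `tag`."""
    return [entry.name for entry in catalog if tag in entry.tags]


catalog = [
    DatasetMetadata(
        name="ad_clicks",
        description="Daily ad click events",
        tags=["marketing", "events"],
        columns={"click_id": "STRING", "ts": "TIMESTAMP"},
        sme="marketing-analytics team",
        retention_days=365,
        consumers=["dashboards"],
        row_count=1_000_000,
    ),
    DatasetMetadata(name="invoices", tags=["finance"]),
]

marketing_sets = find_by_tag(catalog, "marketing")
```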

9. Deploy MLOps capability adjacent to data platforms to enable explorative & production data science workloads.

10. Data Quality = User Trust = Data Adoption.
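
Trust is easier to build when quality is measured and reported. The sketch below computes two common measures, completeness and duplicate rate, over a list of records. The metric definitions and the sample rows are assumptions chosen for illustration; the talk does not prescribe specific measures.

```python
def completeness(rows, column):
    """Share of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_rate(rows, key):
    """Share of rows whose `key` value repeats an earlier row."""
    if not rows:
        return 0.0
    seen, duplicates = set(), 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / len(rows)


# Invented sample records with one empty email, one missing email, one repeated id.
rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@x.com"},
    {"id": 3, "email": None},
]

email_completeness = completeness(rows, "email")
id_duplicate_rate = duplicate_rate(rows, "id")
```

Publishing numbers like these alongside each dataset gives consumers a concrete basis for deciding whether to rely on it.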

The promise of Data Lakes can be achieved.

Clos
