




© Pythian Services Inc 2022
Finding the Hidden Value in Data Lakes
Joey Jablonski
Pythian
VP Analytics
2022-05-18
May 18, 2022
2:00 PM - 2:45 PM EDT
B203: Drilling Down on Data Lake Architecture
New ways to store data and leverage it in different ways are being utilized by data-driven organizations hungry for flexibility and scale.
Finding the Hidden Value in Data Lakes
Joey Jablonski
Long associated with Hadoop, in a cloud world the data lake is often ignored in favor of its more fashionable cousins: mesh, fabric, lakehouse, and warehouse. But ignore the data lake at your peril, as it has an important role to play in any modern analytics strategy. Jablonski focuses on the evolving role of the data lake with a particular emphasis on: why a data lake is a critical component of any cloud analytics project; the role of the data lake in the battle over ETL vs ELT; the importance of metadata in data platform design; and how a data lake helps deliver business value, not just technical success.
Joey Jablonski
VP Analytics
Austin, TX
Data & Analytics
Applications
Cloud Ops
420+ Experts across every Data Domain & Technology
400+ Global Customers
25 Years in Business
Why We Build Data Lakes
Data Lake Value Measures
Single Repository: Enable the vision of a single location to locate & analyze all corporate data.
Scalability: Ensure that technology is never the limiting factor to corporate strategy.
Data Monetization: Support business objectives to create new revenue streams and lower OPEX.
Data Quality: Simplified set of tools to measure data quality, report and improve.
How the data estate is evolving
Business Transformation
DEFENSE / OFFENSE
Data is the driver of innovation transformation
Data powers the software that drives the business
Operational Excellence
Modern Cloud Data Platforms are the enabler for insights (BI), predictions (ML) and product activation (orchestration) and creation (AppDev) across ALL data sources.
Modern Applications/SaaS start with modern, often cloud-native databases.
Traditional On Premise Enterprise Apps (i.e. Oracle, SAP etc.) are slowly moving to Cloud, dragging data with them.
and
Traditional Data Warehouses are being replaced with modern cloud data platforms.
Why Data Lake Projects Fail
1. Design your data platform to support your data strategy.
2. Build your governance programs to enable your data lake.
● Data Governance ensures we do not create a data swamp
● Data Governance ensures risk is managed within the platform & consumer base
● Governance creates structure for managing change
Data Platform Functional Components
3. Ensure every feature & investment has a vocal business champion.
[Diagram: Marketing Analytics Platform on Google Cloud]
AdWords, DoubleClick, YouTube: Google Transfer Service
SaaS: MA, CRM, Social Media, etc.
Enterprise: RDBMS, File/CSV, JSON
IoT Gateway
API: App Engine
Real-time/Streaming: Pub/Sub
Data Capture Staging: Google Cloud Storage (structured and unstructured data)
Data Management: Integration, Deduplication, Transformation, Cleaning, Encryption, Scheduling (Google Dataflow or DataProc, Data Fusion)
Data Lake Storage: Google Cloud Storage
Curation Layer: Google BigQuery, Data Marts
ML Exploration: Datalab, Cloud ML, TensorFlow, Keras
Machine Learning Operations
Metadata/API: Cloud Datastore, Kubernetes Engine
Infrastructure
Encryption
Support: 24/7 monitoring, Backup/restore, Data Services, Updates
Power Users (Tools: Looker, SQL)
Data Scientists (BigQuery ML)
Data Scientists (Tools: Python, TensorFlow, R, Notebooks)
Data Engineer (Tools: Spark, Dataflow, SQL)
Casual Users (Reports, Dashboards, Excel)
Activate via other Systems: Marketing Automation systems, Google Ad Manager, etc.
4. Make it modular: I guarantee you'll be changing out components within a year.
5. Avoid analysis paralysis when picking data lake components.
● Leverage rapid prototyping & pilots to verify functional capabilities of tools
● Accept that the lifespan of tools is getting shorter and change is inevitable
● Think of these decisions as two-way doors
Image credit: Aaron Colcord
6. Decouple process steps from technical implementations.
● Assume that different BUs will require different data consumption methods
● Assume technology will evolve
● Decouple business process and data transformation from underlying technologies to better accelerate future evolution
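The decoupling advice above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the step functions (`normalize_email`, `tag_region`), the `LocalEngine` class and the sample record are hypothetical names invented here, not part of the talk. The point it shows is that the business steps are plain functions over records, while the engine that runs them is a separate, swappable piece.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
# A business step is a plain function over one record; it knows nothing
# about the engine that will eventually execute it.
Step = Callable[[Record], Record]


def normalize_email(record: Record) -> Record:
    """Hypothetical business rule: emails are stored trimmed and lower-case."""
    out = dict(record)  # copy so the caller's record is not mutated
    out["email"] = str(record.get("email", "")).strip().lower()
    return out


def tag_region(record: Record) -> Record:
    """Hypothetical business rule: a missing region is marked UNKNOWN."""
    out = dict(record)
    out["region"] = record.get("region") or "UNKNOWN"
    return out


class LocalEngine:
    """One possible backend: applies the steps in-process, record by record."""

    def run(self, steps: List[Step], records: Iterable[Record]) -> List[Record]:
        results = []
        for record in records:
            for step in steps:
                record = step(record)
            results.append(record)
        return results


# The pipeline is defined once, independently of any engine.
PIPELINE: List[Step] = [normalize_email, tag_region]

rows = [{"email": "  Ana@Example.COM ", "region": None}]
cleaned = LocalEngine().run(PIPELINE, rows)
print(cleaned)
```

Under this arrangement, moving to a different execution technology would mean writing another class with the same `run` method, while `PIPELINE` and the business rules stay unchanged.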
7. Match your transformation patterns to your flexibility & operational needs.
● Accept that consumers of data will change & evolve over time
● Decoupling transformation from consumption allows different consumers to choose different tools
● EDWs have a place for data consumption and high costs for transformation workloads
8. Metadata - store it, manage it, govern it, use it!
Descriptive: Used for identification & discovery of data sets.
Structural: Show data organization and relationships.
Administrative: Business process details about creation, destruction, and SMEs.
Reference: Key referral elements about data quality & consumption.
Statistical: Data distribution, outliers and record counts.
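As a rough illustration of how the five metadata types could sit together in one catalog entry, here is a minimal sketch. The `DatasetMetadata` class, its field layout and the sample values are assumptions made for this example; the slide names the categories but does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry grouped by the five metadata types."""

    descriptive: Dict[str, str] = field(default_factory=dict)     # identification & discovery
    structural: Dict[str, str] = field(default_factory=dict)      # organization & relationships
    administrative: Dict[str, str] = field(default_factory=dict)  # creation, destruction, SMEs
    reference: Dict[str, str] = field(default_factory=dict)       # quality & consumption notes
    statistical: Dict[str, float] = field(default_factory=dict)   # distribution, outliers, counts

    def missing_types(self) -> List[str]:
        """Return the metadata types that have not been filled in yet."""
        return [name for name, value in vars(self).items() if not value]


# Sample values are invented for illustration only.
entry = DatasetMetadata(
    descriptive={"name": "orders", "description": "One row per customer order"},
    statistical={"record_count": 1200.0},
)
print(entry.missing_types())
```

A check like `missing_types` is one simple way to govern metadata: an entry with gaps can be flagged before the data set is published to consumers.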
9. Deploy MLOps capability adjacent to data platforms to enable explorative & production data science workloads.
10. Data Quality = User Trust = Data Adoption.
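Trust is easier to build when quality is measured and reported. The sketch below shows two simple measures, completeness and uniqueness, computed over a handful of rows. The function names, the sample rows and the choice of these two measures are assumptions for illustration; the talk does not specify which quality metrics to use.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], column: str) -> float:
    """Share of rows where the column is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows: List[Row], column: str) -> float:
    """Share of the non-empty values in the column that are distinct."""
    values = [row[column] for row in rows if row.get(column) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Invented sample data: one duplicated customer_id and two unusable emails.
sample = [
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c2", "email": None},
    {"customer_id": "c2", "email": "b@example.com"},
    {"customer_id": "c3", "email": ""},
]
report = {
    "email_completeness": completeness(sample, "email"),
    "customer_id_uniqueness": uniqueness(sample, "customer_id"),
}
print(report)
```

Publishing numbers like these alongside each data set gives consumers a concrete basis for deciding how far to rely on it.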
The promise of Data Lakes can be achieved.
Clos