FFA2024分论坛-核心技术 合辑_第1页
FFA2024分论坛-核心技术 合辑_第2页
FFA2024分论坛-核心技术 合辑_第3页
FFA2024分论坛-核心技术 合辑_第4页
FFA2024分论坛-核心技术 合辑_第5页
已阅读5页,还剩375页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

=====TaskYarn/K8s启动作业耗时:TaskTaskTaskTaskTaskTask•jobmanager.scheduler:adaptive默认(Debug)状态存储AutoScalerStateStore事件通知AutoScalerEventHandlerKubernetesEvent执行扩缩容ScalingRealizerSpecRescaleApiRescaleApi•erval=1h80jobmanager.adaptive-scheduler.[19](前缀)jobmanager.adaptive-scheduler.[20]•rescale-trigger.max-checkpoint-failures:2••AdaptiveSchduler支持resource-stabilizTHANKYOU1./confluence/display/FLINK/FLIP-271%3A+Autoscaling2./flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/3./apache/flink-kubernetes-operator4./flink/flink-kubernetes-operator-docs-release-1.10/docs/operations/configuration/#autoscaler-configuration5./confluence/display/FLINK/FLIP-138%3A+Declarative+Resource+management6./confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler7./confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Managem8./flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-scheduler9./thread/pvfb3fw99mj8r1x8zzyxgvk4dcppwssz10./thread/pr0r8hq8kqpzk3q1zrzkl3rp1lz24v7v11./confluence/display/FLINK/FLIP-12./flink/flink-kubernetes-operator-docs-release-1.10/docs/custom-resource/autoscaler/#autoscaler-standalone13./flink/flink-kubernetes-operator-docs-release-1.10/docs/operations/configuration/#autoscaler-standalone-configuration14./jira/browse/FLINK-3345215./jira/browse/FLINK-3585116./jira/browse/FLINK-3603917./jira/browse/FLINK-3601818./jira/browse/FLINK-3653519./flink/flink-docs-master/docs/deployment/config/#advanced-scheduling-options20./confluence/display/FLINK/FLIP-21./confluence/display/FLINK/FLIP-22./confluence/display/FLINK/FLIP-tiveScheduler23./jira/browse/FLINK-3675324./jira/browse/FLINK-3397725./jira/browse/FLINK-3652726./jira/browse/FLINK-3619227./flink/flink-kubernetes-operator-docs-release-1.10/docs/custom-resource/autotuning/28./jira/browse/FLINK-3581429./confluence/display/FLINK/FLIP-30./jira/browse/FLINK-2225831./jira/browse/FLINK-35765backlog=5backlogbacklog=5backlog=47654543213765454321321credit=0creditcredit=0backlog=1backlogbacklog=1765476547654credit=3credit=0 SchedulerPipelinedShuffleBlockingShuffleTaskManagerMemoryTierTaskManagerConsumerAgentMemoryProducerAgentConsumerAgentMemoryProducerAgentNettyServiceDiskNettyServiceConsmerAgentProducerAgentConsumerAgentRemoteConsumerAgentProducerAgentProducerAgentTier1.内存Tier1Tier2Tier2Tier3.Tier3 Seg3Seg3CelebornClientJobEventJobEventStoreMonitorJMstatusFoundMonitorJMstatusFoundJMfailedCelebornClientCelebornClientJobEventJobEventStore WriteJobEventReliableWriteJobEventReliableStorage(HDFS,Zookeeper…)ReliableStorage(HDFS,ZK...)ReliableReliableStorage(HDFS,ZK...)THANKYOU在多方技术协同发展下,向量化显著提高软件性能,并在大数据领域广泛应用硬件支持(更长更多的寄存器)编译器优化(gcc/llvmgandia)类库支持:xsimd技术方案Databircks闭源Arrow-datafusioncomet/BlazeGluten-Trino/PrestissimoNative算子层PlanConversion层数据转换层Fallback层统一的内存管理层对于SQLtimestamp/decimal类型fallback机制lllGenericRowTHANKYOUIntroducingForStDB,thedisaggregatedstaInputsInputs•FLIP-423存算分离总体介绍CallbacksAsyncStateTask!!BatchingTask!!BatchingStatebackendCP-2CP-1SCP-1SRecordsCallbacks(Async,In(Async,Inbatch) RecordsCallbacksRecords21221211StateExecutorActive1StateExecutorActive1Blocking3 BatchingInput1Input2uuuIn-flightdatasnapshotBarrierTaskTaskuuuuOutputStateBackend TaskTaskStateBackendForStreamingDB=ForStDBForStreamingDB=ForStDB/ververica/ForStMemtableIkUnifiedFileSystemLayerCacheonlocaldiskForStJNIForStJNIStateBackend(Flink)FileCacheStateBackendStateBackend--Throughput(k/s)502.01cache)WordNumber:300MWordNumber:300MJobparallelism:1Throughput(k/s)0Jobparallelism:8Jobparallelism:8THANKYOUBreaktheLimitationofWatermark:ImplementingProgressAwarenessandAutomaticAlignmentin(across)FlinkJobsLeftStateLeftStateJoinJoinRightStateRightStateLookupLookupSourceBSourceBAlignmentEventSourceCoordinatorSourceASourceCoordinatorSourceALookupSinkSourceALookupSinkSourceAPublisher ChannelSubscriberPublisher ChannelSubscriber…………JobJobA,10:08JobJobA,10:08SourceSinkSinkJobManagerOperatorCoordinatorPublisher字段粒度元数据Col1 Col2 Col3 Col…AsyncAsyncLookupSubscriber持续阻塞可能触发AsyncLookupAsyncAsyncLookupSinkREBALANCESourceASinkREBALANCESourceA字段粒度元数据字段粒度元数据Col1JobA,10:08Col2JobA,10:08JobB,10:09Col3JobB,10:09Col…HeartbeatJobA,10:05SinkSinkSource10:08:2510:08:2110:08:2510:08:2110:08:22A1A110:48:5210:47:2110:47:21A210:46:3710:46:37THANKYOUBreakingtheStateRecoveryDilemma:FurtherAddressingStateCompatibilitywithintheFlinkSQLOperator袁奎状态恢复困境在checkpoint/savepoint时RowDataSerializerRowDataRowDataSerializer输入(gender,name,age)状态1234+I[M,Jesse,14]否是是是否是是是44ASYNCASYNCTHANKYOUApacheFury序列化框架简介Fury替换Flink/Kryo序列化BinaryRow,BinaryRow,BinaryStringUDAFState:UDAFState:BinaryRowBinaryRowBinaryRowBinaryRowDataStream作业DataStream任务DataStream任务FuryStringSerializerFuryStringSerializerFlinkStringSerializerFlinkPojoSerializerGeneratedObjectSerializerFlinkKryoSerializerFurySerializerFlinkPojoSerializerGeneratedObjectSerializerFlinkKryoSerializerFurySerializerJavaScala⼤规模数据传输AI在线服务搜索推荐云原⽣中间件JavaScala⼤规模数据传输AI在线服务搜索推荐云原⽣中间件JavaScript计算分析Fury替换Flink/Kryo序列化array,numpy,pyarrow.Tablearray,numpy,pyarrow.Table,JavaJavaheader;•通过代码生成减少内存访问和虚方法调用开销;•支持多态类型优化,运行时生成多份代码,动态分发执行;•ChunkbychunkMapencoding:NodeJSCodegen完成,Java开发中;THANKYOUApacheFlinkContribuStreamGraphStreamGraphAJobGraphJobGraph算子层面对外暴露的信息逐步减少ExecutionGraph计算逻辑计算逻辑均衡数据分发特定算子逻辑优化均衡数据分发特定算子逻辑优化数据传输方式优化并行度数据传输方式(数据传输方式(shuffle)数据划分方式(partition)基于StreamGraph的作业提交JobGraphStreamStreamGraphStreamJobGraphStreamStreamGraphStreamSchedulerSchedulerAdaptiveGraphExecutionPlan:通用的执行计划表达StreamGraphStreamStreamGraphStreamSchedulerAdaptiveGraphimplementpublicpublicinterfaceExecutionPlan{…}动态逻辑执行计划:渐进式构建JobGraphStreamGraphJobGraphHashforwardIntermediateHashforwardIntermediateIntermediate.JobVertex的创建时机:1.Source节点将在作业调度前被创建,为作业的启动提供起点。2.其余节点将在其上游所有节点完成后生成,此时可以获取到准确FLIP-469的输入数据的信息,来帮助做出更好的优化决策。JobEventStreamGraphOptimizerOptimizationStrategyAdaptiveBroadcastJoinStrategyAdaptiveExecutionHandlerJobEventStreamGraphOptimizerOptimizationStrategyAdaptiveBroadcastJoinStrategyAdaptiveBatchSchedulerNotifyjobvertexfinishedTryupdateTryupdateAdaptiveGraphManagerStreamGraphBroadcastBroadcastForwardIntermediateForwardIntermediateRescaleSourceforwardRescaleSourceforwardFilterIntermediateBroadcastNewWebUINewWebUI整体优化流程RuntimeTablePlannerRuntimeExecutionGraphExecutionGraphLogicalRelnodeTree1.TablePlannerLogicalRelnodeTreeProcessorsStreamGraphOptimizatiProcessorsStreamGraphOptimizatiPhysicalRelnodeTreeExecNodeDAGExecNodeDAG3.AdaptiveBatchSchedulStreamGraphAdaptiveJoinForwardHashExchangeStreamGraphAdaptiveJoinForwardHashExchangeTransformationAdaptiveBroadcastJoinOptimizationFilterFilterScanScanScanScanTable2Table1Table1table.optimizer.join.broadcast-threshold:10MBAdaptiveBroadcastJoinOptimizationBroadcastBroadcastBroadcastForwardForwardFilterFilterFilterScanScanScanScanScanScanScanScanFinishedStageFinishedStageFinishedStageFinishedStageTable1Table2Table1Table2Table2Table1Table2Table1table.optimizer.join.broadcast-threshold:10MBtable.optimizer.adaptive-broadcast-join.strategy:AUTOAdaptiveBroadcastJoinOptimizationBroadcastBroadcastBroadcastForwardBroadcastForwardFilterFilter2.小表侧产出的分区是HashPartitioner产出分ScanScanScanScanFinishedStageFinishedStageFinishedStageTable2Table2Table1AdaptiveBroadcastJoinOptimizationSchedulerScanTaskScanTaskBroadcastJoin TaskHashScanTaskdeployForwardsubpartitionsRescalePreferredSchedulerScanTaskScanTaskBroadcastJoin TaskHashScanTaskdeployForwardsubpartitionsRescalePreferredTaskManagerScanScaninputChannelsFinishedStageFinishedStage…FullFilledScanScaninputChannelsFinishedStageFinishedStage…FullFilledTaskInputlocality优化Forward边优化为Rescale边Inputlocality优化SkewedJoinOptimizationVertexVertex1Task1Task2Task3Vertex2Task1Task2Task3Vertex3Task1Task2Task3Vertex2Task1Task2Task3Vertex3Task1Task2Task3SkewedJoinOptimizationVertex1Task1TaskVertex1Task1Task2Task3Vertex1Task1Task2Task3Vertex3(InnerJoin)Task1Task2Vertex2Task1Task2Task3Vertex2Task1Task2Task3Task3Vertex3(InnerJoin)Task1Vertex3(InnerJoin)Task1Task2Vertex2Task1Task2Task3Vertex2Task1Task2Task3Task3Vertex3(InnerJoin)Task1Task2Task3Task4table.optimizer.skewed-join-optimization.strategy:AUTOtable.optimizer.skewed-join-optimization.skewed-factor:4table.optimizer.skewed-join-optimization.skewed-threshold:256mVertexVertexAllToAllAllToAllVertexVertexAllToAllAllToAllVertexVertex1.IntraInputCorrelation(内部关联):同一个keygroupVertexDownstreamDownstreamVertex2.InterInputCorrelation(外部关联):多个Input之间存在数据均衡分发InterInputCorrelation:trueIntraInputCorrelation:true 1 11 11 DownstreamTask1DownstreamDownstreamTask2ALL_TOALL_TO_ALL IntraInputCorrelation:true DownstreamTask33 DownstreamTask331 33 subpartition1subpartition2subpartition3V:theamountofdatauserexpectseachtasktoprocess数据均衡分发InterInputCorrelation:trueIntraInputCorrelation:falseALL_TO_ALL 1 1111 11111111 1111 DownstreamTask1DownstreamDownstreamTask211 IntraInputCorrelation:true 12222222222DownstreamDownstreamTask3 33333 数据均衡分发InterI

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论