版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Deep Dive:How Spark Uses MemorySpark内存使用机制分析议程Memory Usage OverviewMemory ContentionTungsten Memory FormatCache-aware ComputationFuture PlansWhere Spark Uses Memorystorage: memory used to cache data that will be used later.(controlled by memory manager)execution: memory used for computation in shuff
2、les, joins, sorts and aggregations. (controlled by memory manager)others: user data structure, internal metadata, objects created by UDF, etc.4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(3)ExecutionMemoryWhat if I want the sorted values again?4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3,
3、 4, 5,6Iteratortake(3)4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(4).4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6IteratorExecutionMemoryCache123456take(3)take(4)take(5)StorageMemoryMemory ContentionHow to arbitrate memory between execution and storage?How to arbitrate memory acr
4、oss tasks running in parallel?How to arbitrate memory across operators running within thesame task?Challenge #1How to arbitrate memory between execution and storage?Easy, static assignment!total available memoryexecutionstorageEasy, static assignment!executionstorageSpill to DiskEasy, static assignm
5、ent!executionstorageEvict LRU blocks to diskInefficient memory usage leads to bad performanceEasy, static assignment!executionstorageExecution can only use a fraction of thememory, even when there is no storage!Easy, static assignment!executionstorageEfficient use of memory required user tuningUnifi
6、ed Memory ManagementexecutionstorageWhat happens if there is alreadystorage?Unified Memory ManagementexecutionstorageEvict LRU blocks to diskUnified Memory ManagementexecutionstorageDesign ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where
7、 as cached data may not.What if the application relies on cache?Unified Memory ManagementexecutionstorageThis is bad!Design ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where as cached data may not.What if the application relies on cache?a
8、llow users to specify a minimum unevictable amount of cacheddata(not a reservation!)Challenge #2How to arbitrate memory across tasks running in parallel?Easy, static assignment!Worker machine has 4 coresEach task gets of the totalmemoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 1The share of each
9、 task depends on thenumber of actively running tasksDynamic AssignmentTask 1The share of each task depends on thenumber of actively running tasksDynamic AssignmentTask 1Now another task comes along, the first task have to spill to free up memoryDynamic AssignmentEach task is now assigned of thetotal
10、 memoryTask 1Task 2Dynamic AssignmentEach task is now assigned of thetotal memoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 3Last remaining task gets all the memoryStatic vs Dynamic AssignmentBoth are fair and starvation freeStatic Assignment is simplerDynamic assignment handles stragglers better
11、Challenge #3How to arbitrate memory across operators running within the same task?SELECT age,AVG(height) FROM students GROUP BY ageORDER BYAVG(height)students.groupBy(age).avg(height).orderBy(avg(height).collect()ScanProjectAggregateSortScanProjectAggregateSortThe task has 6pages of memoryScanProjec
12、tAggregateSortMap / age (total, count)20 (483, 3)21 (935, 5)22 (172, 1)ScanProjectAggregateSortAll 6 pages were used by Aggregate, leaving no memory for Sort!ScanProjectAggregateSortSolution #1Reserve a page for each operatorScanProjectAggregateSortSolution #1Reserve a page for each operatorStarvati
13、on free, but still not fairWhat if there were more operators?ScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingSort forces Aggregate to spill a page to free memoryScanProjectAggregateSo
14、rtSolution #2Cooperative spillingSort needs more memory so it forces Aggregate to spill another page(and so on)ScanProjectAggregateSortSolution #2Cooperative spillingSort finishes with 3 pagesAggregate does not have to spill its remaining pagesRecap: three source of contentionHow to arbitrate memory
15、 between execution and storage?across tasks running in parallel?across operators running with the same task?Instead of statically reserving memory in advance, deal with memory contention when it raises by forcing members to spillHow Spark keep data in memoryPut data as objects on the heap and operat
16、e on theseobjects.Data caching is simply using a list to keep data objects.Data objects? No!It is hard to monitor and control the memory usage whenwe have a lot of objects.Garbage collection will be the killer.Java objects has notable space overhead.High serialization cost when transfer data inside
17、cluster.Keep data as binary and operate onbinarydatasource1010100001010001000100100110100010001111010100100101binary formatCacheManagerMemory Manageroperatorsallocatememorymemory pagesallocatememorymemory pagesJava Objects Based Row FormatBoxedInteger(123)String(“data”)String(“bricks”)RowArray5+ obj
18、ectshigh space overheadslow value accessingexpensive hashCode()Efficient Binary Format“bricks”0 x01233240“data”46null tracking(123, “data”, “bricks”)offset and length ofoffset and length ofdataEfficient Binary Format0 x1220344“data”0 x1220344“data”0 x1062“da”123 JSONfilessubstringHow to process bina
19、ry data more efficient?Understanding CPU CacheMemory is becoming slower and slower than CPU, we should keep the frequently accessed data in CPU cache.Understanding CPU CachePre-fetch data into CPU cache, with cache line boundary.The most 2 important techniques in big data are .Sort and Hash!Naive So
20、rtrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortEach comparison needs to access 2 different memory regions, which makes it hard for CPU cache to pre-f
21、etch data, poor cache locality!Cache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware SortMost of the time, just go through
22、the key-prefixes in alinear fashion, good cache locality!Naive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvalueNaive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylookupNaive Hash Mappointerpointerkeyvaluekeyvaluekeyvaluekeypointerhash(key) %sizeNaive Hash Mappointerpointerkeyvaluekey
23、valuekeyvaluekeypointercompare these 2 keysNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylinear probingNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeycompare these 2keysNaive Hash MapEach lookup needs many pointer dereferences and key comparison when hash collision happens, and jumps between 2 memory regions, bad cache locality!ptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvalueptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvaluekeylookupptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyv
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025-2030年中国煤炭检测实验分析仪器行业全国市场开拓战略制定与实施研究报告
- 2025-2030年中国燃料电池电堆行业商业模式创新战略制定与实施研究报告
- 新形势下物业管理行业可持续发展战略制定与实施研究报告
- 2025-2030年中国有限服务酒店行业开拓第二增长曲线战略制定与实施研究报告
- 2025-2030年中国温度控制器行业并购重组扩张战略制定与实施研究报告
- 2025-2030年中国汗蒸馆行业开拓第二增长曲线战略制定与实施研究报告
- 市政道路工程监理竣工质量评估报告
- 学校修建综合楼申请报告
- 四年级数学(小数加减运算)计算题专项练习与答案
- 皮草面料知识培训课件
- 银行反恐应急预案及方案
- 关于推某某同志担任教育系统实职领导职务的报告(职务晋升)
- 2023消防安全知识培训
- 邻近铁路营业线施工安全监测技术规程 (TB 10314-2021)
- Exchange配置与规划方案专项方案V
- 三年级上册脱式计算练习200题及答案
- 新生儿腭裂护理查房课件
- 二年级下册科学课程纲要
- 前交叉韧带重建术后康复训练
- 河南近10年中考真题数学含答案(2023-2014)
- 八年级上学期期末家长会课件
评论
0/150
提交评论