




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Deep Dive:How Spark Uses MemorySpark内存使用机制分析议程Memory Usage OverviewMemory ContentionTungsten Memory FormatCache-aware ComputationFuture PlansWhere Spark Uses Memorystorage: memory used to cache data that will be used later.(controlled by memory manager)execution: memory used for computation in shuff
2、les, joins, sorts and aggregations. (controlled by memory manager)others: user data structure, internal metadata, objects created by UDF, etc.4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(3)ExecutionMemoryWhat if I want the sorted values again?4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3,
3、 4, 5,6Iteratortake(3)4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(4).4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6IteratorExecutionMemoryCache123456take(3)take(4)take(5)StorageMemoryMemory ContentionHow to arbitrate memory between execution and storage?How to arbitrate memory acr
4、oss tasks running in parallel?How to arbitrate memory across operators running within thesame task?Challenge #1How to arbitrate memory between execution and storage?Easy, static assignment!total available memoryexecutionstorageEasy, static assignment!executionstorageSpill to DiskEasy, static assignm
5、ent!executionstorageEvict LRU blocks to diskInefficient memory usage leads to bad performanceEasy, static assignment!executionstorageExecution can only use a fraction of thememory, even when there is no storage!Easy, static assignment!executionstorageEfficient use of memory required user tuningUnifi
6、ed Memory ManagementexecutionstorageWhat happens if there is alreadystorage?Unified Memory ManagementexecutionstorageEvict LRU blocks to diskUnified Memory ManagementexecutionstorageDesign ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where
7、 as cached data may not.What if the application relies on cache?Unified Memory ManagementexecutionstorageThis is bad!Design ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where as cached data may not.What if the application relies on cache?a
8、llow users to specify a minimum unevictable amount of cacheddata(not a reservation!)Challenge #2How to arbitrate memory across tasks running in parallel?Easy, static assignment!Worker machine has 4 coresEach task gets of the totalmemoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 1The share of each
9、 task depends on thenumber of actively running tasksDynamic AssignmentTask 1The share of each task depends on thenumber of actively running tasksDynamic AssignmentTask 1Now another task comes along, the first task have to spill to free up memoryDynamic AssignmentEach task is now assigned of thetotal
10、 memoryTask 1Task 2Dynamic AssignmentEach task is now assigned of thetotal memoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 3Last remaining task gets all the memoryStatic vs Dynamic AssignmentBoth are fair and starvation freeStatic Assignment is simplerDynamic assignment handles stragglers better
11、Challenge #3How to arbitrate memory across operators running within the same task?SELECT age,AVG(height) FROM students GROUP BY ageORDER BYAVG(height)students.groupBy(age).avg(height).orderBy(avg(height).collect()ScanProjectAggregateSortScanProjectAggregateSortThe task has 6pages of memoryScanProjec
12、tAggregateSortMap / age (total, count)20 (483, 3)21 (935, 5)22 (172, 1)ScanProjectAggregateSortAll 6 pages were used by Aggregate, leaving no memory for Sort!ScanProjectAggregateSortSolution #1Reserve a page for each operatorScanProjectAggregateSortSolution #1Reserve a page for each operatorStarvati
13、on free, but still not fairWhat if there were more operators?ScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingSort forces Aggregate to spill a page to free memoryScanProjectAggregateSo
14、rtSolution #2Cooperative spillingSort needs more memory so it forces Aggregate to spill another page(and so on)ScanProjectAggregateSortSolution #2Cooperative spillingSort finishes with 3 pagesAggregate does not have to spill its remaining pagesRecap: three source of contentionHow to arbitrate memory
15、 between execution and storage?across tasks running in parallel?across operators running with the same task?Instead of statically reserving memory in advance, deal with memory contention when it raises by forcing members to spillHow Spark keep data in memoryPut data as objects on the heap and operat
16、e on theseobjects.Data caching is simply using a list to keep data objects.Data objects? No!It is hard to monitor and control the memory usage whenwe have a lot of objects.Garbage collection will be the killer.Java objects has notable space overhead.High serialization cost when transfer data inside
17、cluster.Keep data as binary and operate onbinarydatasource1010100001010001000100100110100010001111010100100101binary formatCacheManagerMemory Manageroperatorsallocatememorymemory pagesallocatememorymemory pagesJava Objects Based Row FormatBoxedInteger(123)String(“data”)String(“bricks”)RowArray5+ obj
18、ectshigh space overheadslow value accessingexpensive hashCode()Efficient Binary Format“bricks”0 x01233240“data”46null tracking(123, “data”, “bricks”)offset and length ofoffset and length ofdataEfficient Binary Format0 x1220344“data”0 x1220344“data”0 x1062“da”123 JSONfilessubstringHow to process bina
19、ry data more efficient?Understanding CPU CacheMemory is becoming slower and slower than CPU, we should keep the frequently accessed data in CPU cache.Understanding CPU CachePre-fetch data into CPU cache, with cache line boundary.The most 2 important techniques in big data are .Sort and Hash!Naive So
20、rtrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortEach comparison needs to access 2 different memory regions, which makes it hard for CPU cache to pre-f
21、etch data, poor cache locality!Cache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware SortMost of the time, just go through
22、the key-prefixes in alinear fashion, good cache locality!Naive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvalueNaive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylookupNaive Hash Mappointerpointerkeyvaluekeyvaluekeyvaluekeypointerhash(key) %sizeNaive Hash Mappointerpointerkeyvaluekey
23、valuekeyvaluekeypointercompare these 2 keysNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylinear probingNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeycompare these 2keysNaive Hash MapEach lookup needs many pointer dereferences and key comparison when hash collision happens, and jumps between 2 memory regions, bad cache locality!ptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvalueptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvaluekeylookupptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyv
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 耐火土石矿山环境保护与矿山环境保护标准化推进考核试卷
- 自然保护合同履约金条款
- 物理实验设计与数据分析考核试卷
- 住宅建筑与社区卫生状况考核试卷
- 玻璃保温容器市场营销策略与推广技巧实施实践探索考核试卷
- 特种陶瓷国际贸易与市场营销考核试卷
- 汽车金融公司金融产品风险收益平衡策略考核试卷
- 地理课程标准与教材研究
- 会展活动策划书
- 电梯门机系统的故障预测与健康管理技术考核试卷
- GB 21258-2024燃煤发电机组单位产品能源消耗限额
- 新进(转岗)职工三级安全教育培训表
- GB/T 44347-2024乡村绿化技术规程
- (修订版)粮油质量检验员理论考试复习题库-下多选、判断题
- 保险行业客户画像分析与精准营销方案
- 沪教版小学六年级数学应用题150道及答案
- 北师大版四年级下册小数乘法竖式计算练习100题及答案
- 2024年湖南省长沙市中考地理试卷真题(含答案解析)
- 《中国健康成年人身体活动能量消耗参考值》(编制说明)
- 食堂大米采购招标文件
- CJT 216-2013 给水排水用软密封闸阀
评论
0/150
提交评论