下载本文档
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Deep Dive:How Spark Uses MemorySpark内存使用机制分析议程Memory Usage OverviewMemory ContentionTungsten Memory FormatCache-aware ComputationFuture PlansWhere Spark Uses Memorystorage: memory used to cache data that will be used later.(controlled by memory manager)execution: memory used for computation in shuff
2、les, joins, sorts and aggregations. (controlled by memory manager)others: user data structure, internal metadata, objects created by UDF, etc.4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(3)ExecutionMemoryWhat if I want the sorted values again?4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3,
3、 4, 5,6Iteratortake(3)4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(4).4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6IteratorExecutionMemoryCache123456take(3)take(4)take(5)StorageMemoryMemory ContentionHow to arbitrate memory between execution and storage?How to arbitrate memory acr
4、oss tasks running in parallel?How to arbitrate memory across operators running within thesame task?Challenge #1How to arbitrate memory between execution and storage?Easy, static assignment!total available memoryexecutionstorageEasy, static assignment!executionstorageSpill to DiskEasy, static assignm
5、ent!executionstorageEvict LRU blocks to diskInefficient memory usage leads to bad performanceEasy, static assignment!executionstorageExecution can only use a fraction of thememory, even when there is no storage!Easy, static assignment!executionstorageEfficient use of memory required user tuningUnifi
6、ed Memory ManagementexecutionstorageWhat happens if there is alreadystorage?Unified Memory ManagementexecutionstorageEvict LRU blocks to diskUnified Memory ManagementexecutionstorageDesign ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where
7、 as cached data may not.What if the application relies on cache?Unified Memory ManagementexecutionstorageThis is bad!Design ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where as cached data may not.What if the application relies on cache?a
8、llow users to specify a minimum unevictable amount of cacheddata(not a reservation!)Challenge #2How to arbitrate memory across tasks running in parallel?Easy, static assignment!Worker machine has 4 coresEach task gets of the totalmemoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 1The share of each
9、 task depends on thenumber of actively running tasksDynamic AssignmentTask 1The share of each task depends on thenumber of actively running tasksDynamic AssignmentTask 1Now another task comes along, the first task have to spill to free up memoryDynamic AssignmentEach task is now assigned of thetotal
10、 memoryTask 1Task 2Dynamic AssignmentEach task is now assigned of thetotal memoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 3Last remaining task gets all the memoryStatic vs Dynamic AssignmentBoth are fair and starvation freeStatic Assignment is simplerDynamic assignment handles stragglers better
11、Challenge #3How to arbitrate memory across operators running within the same task?SELECT age,AVG(height) FROM students GROUP BY ageORDER BYAVG(height)students.groupBy(age).avg(height).orderBy(avg(height).collect()ScanProjectAggregateSortScanProjectAggregateSortThe task has 6pages of memoryScanProjec
12、tAggregateSortMap / age (total, count)20 (483, 3)21 (935, 5)22 (172, 1)ScanProjectAggregateSortAll 6 pages were used by Aggregate, leaving no memory for Sort!ScanProjectAggregateSortSolution #1Reserve a page for each operatorScanProjectAggregateSortSolution #1Reserve a page for each operatorStarvati
13、on free, but still not fairWhat if there were more operators?ScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingSort forces Aggregate to spill a page to free memoryScanProjectAggregateSo
14、rtSolution #2Cooperative spillingSort needs more memory so it forces Aggregate to spill another page(and so on)ScanProjectAggregateSortSolution #2Cooperative spillingSort finishes with 3 pagesAggregate does not have to spill its remaining pagesRecap: three source of contentionHow to arbitrate memory
15、 between execution and storage?across tasks running in parallel?across operators running with the same task?Instead of statically reserving memory in advance, deal with memory contention when it raises by forcing members to spillHow Spark keep data in memoryPut data as objects on the heap and operat
16、e on theseobjects.Data caching is simply using a list to keep data objects.Data objects? No!It is hard to monitor and control the memory usage whenwe have a lot of objects.Garbage collection will be the killer.Java objects has notable space overhead.High serialization cost when transfer data inside
17、cluster.Keep data as binary and operate onbinarydatasource1010100001010001000100100110100010001111010100100101binary formatCacheManagerMemory Manageroperatorsallocatememorymemory pagesallocatememorymemory pagesJava Objects Based Row FormatBoxedInteger(123)String(“data”)String(“bricks”)RowArray5+ obj
18、ectshigh space overheadslow value accessingexpensive hashCode()Efficient Binary Format“bricks”0 x01233240“data”46null tracking(123, “data”, “bricks”)offset and length ofoffset and length ofdataEfficient Binary Format0 x1220344“data”0 x1220344“data”0 x1062“da”123 JSONfilessubstringHow to process bina
19、ry data more efficient?Understanding CPU CacheMemory is becoming slower and slower than CPU, we should keep the frequently accessed data in CPU cache.Understanding CPU CachePre-fetch data into CPU cache, with cache line boundary.The most 2 important techniques in big data are .Sort and Hash!Naive So
20、rtrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortEach comparison needs to access 2 different memory regions, which makes it hard for CPU cache to pre-f
21、etch data, poor cache locality!Cache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware SortMost of the time, just go through
22、the key-prefixes in alinear fashion, good cache locality!Naive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvalueNaive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylookupNaive Hash Mappointerpointerkeyvaluekeyvaluekeyvaluekeypointerhash(key) %sizeNaive Hash Mappointerpointerkeyvaluekey
23、valuekeyvaluekeypointercompare these 2 keysNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylinear probingNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeycompare these 2keysNaive Hash MapEach lookup needs many pointer dereferences and key comparison when hash collision happens, and jumps between 2 memory regions, bad cache locality!ptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvalueptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvaluekeylookupptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyv
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 大型礼仪庆典活动策划公司审计师述职报告
- 《老年原发性脑淋巴瘤专科护理|化疗管理 + 全套护理措施》
- 《儿童慢性病居家过敏反应急救专科护理》
- 《低血糖症专科护理》
- 跨境基础及电商1Chapter 4-Customer Service of Cross-border E-commerce
- 均衡生产计划催办函5篇范本
- 互联网企业数据分析报告撰写完备流程
- 梦想起航分享会小学主题班会课件
- 家庭与学校共同塑造孩子的未来小学主题班会课件
- 职业规划与目标设定手册
- 燃气管网安全培训课件
- 《诗经》诗经全文
- 初二下八升九-暑期不躺平初三一定赢-暑假前期末家长会课件
- 【曲臂式高空作业台载荷数值的估值与计算过程案例3200字】
- 2025年度食品厂安全应急演练计划
- 采购付款管理办法
- 消防防排烟系统培训课件
- 小学数学非纸笔测评任务的设计原则与实施策略
- 2025广州市小升初数学分班考试押题卷(学生版+解析版)
- 《比看上去更有意思》(2021年上海市中考满分作文33篇附审题指导)
- JG/T 335-2011混凝土结构防护用成膜型涂料
评论
0/150
提交评论