版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Deep Dive:How Spark Uses MemorySpark内存使用机制分析议程Memory Usage OverviewMemory ContentionTungsten Memory FormatCache-aware ComputationFuture PlansWhere Spark Uses Memorystorage: memory used to cache data that will be used later.(controlled by memory manager)execution: memory used for computation in shuff
2、les, joins, sorts and aggregations. (controlled by memory manager)others: user data structure, internal metadata, objects created by UDF, etc.4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(3)ExecutionMemoryWhat if I want the sorted values again?4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3,
3、 4, 5,6Iteratortake(3)4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6Iteratortake(4).4, 3, 5, 1, 6,2IteratorSort1234561, 2, 3, 4, 5,6IteratorExecutionMemoryCache123456take(3)take(4)take(5)StorageMemoryMemory ContentionHow to arbitrate memory between execution and storage?How to arbitrate memory acr
4、oss tasks running in parallel?How to arbitrate memory across operators running within thesame task?Challenge #1How to arbitrate memory between execution and storage?Easy, static assignment!total available memoryexecutionstorageEasy, static assignment!executionstorageSpill to DiskEasy, static assignm
5、ent!executionstorageEvict LRU blocks to diskInefficient memory usage leads to bad performanceEasy, static assignment!executionstorageExecution can only use a fraction of thememory, even when there is no storage!Easy, static assignment!executionstorageEfficient use of memory required user tuningUnifi
6、ed Memory ManagementexecutionstorageWhat happens if there is alreadystorage?Unified Memory ManagementexecutionstorageEvict LRU blocks to diskUnified Memory ManagementexecutionstorageDesign ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where
7、 as cached data may not.What if the application relies on cache?Unified Memory ManagementexecutionstorageThis is bad!Design ConsiderationsWhy evict storage, not execution?Spilled execution data will always be read back from disk, where as cached data may not.What if the application relies on cache?a
8、llow users to specify a minimum unevictable amount of cacheddata(not a reservation!)Challenge #2How to arbitrate memory across tasks running in parallel?Easy, static assignment!Worker machine has 4 coresEach task gets of the totalmemoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 1The share of each
9、 task depends on thenumber of actively running tasksDynamic AssignmentTask 1The share of each task depends on thenumber of actively running tasksDynamic AssignmentTask 1Now another task comes along, the first task have to spill to free up memoryDynamic AssignmentEach task is now assigned of thetotal
10、 memoryTask 1Task 2Dynamic AssignmentEach task is now assigned of thetotal memoryTask1Task 2Task 3Task 4Dynamic AssignmentTask 3Last remaining task gets all the memoryStatic vs Dynamic AssignmentBoth are fair and starvation freeStatic Assignment is simplerDynamic assignment handles stragglers better
11、Challenge #3How to arbitrate memory across operators running within the same task?SELECT age,AVG(height) FROM students GROUP BY ageORDER BYAVG(height)students.groupBy(age).avg(height).orderBy(avg(height).collect()ScanProjectAggregateSortScanProjectAggregateSortThe task has 6pages of memoryScanProjec
12、tAggregateSortMap / age (total, count)20 (483, 3)21 (935, 5)22 (172, 1)ScanProjectAggregateSortAll 6 pages were used by Aggregate, leaving no memory for Sort!ScanProjectAggregateSortSolution #1Reserve a page for each operatorScanProjectAggregateSortSolution #1Reserve a page for each operatorStarvati
13、on free, but still not fairWhat if there were more operators?ScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingScanProjectAggregateSortSolution #2Cooperative spillingSort forces Aggregate to spill a page to free memoryScanProjectAggregateSo
14、rtSolution #2Cooperative spillingSort needs more memory so it forces Aggregate to spill another page(and so on)ScanProjectAggregateSortSolution #2Cooperative spillingSort finishes with 3 pagesAggregate does not have to spill its remaining pagesRecap: three source of contentionHow to arbitrate memory
15、 between execution and storage?across tasks running in parallel?across operators running with the same task?Instead of statically reserving memory in advance, deal with memory contention when it raises by forcing members to spillHow Spark keep data in memoryPut data as objects on the heap and operat
16、e on theseobjects.Data caching is simply using a list to keep data objects.Data objects? No!It is hard to monitor and control the memory usage whenwe have a lot of objects.Garbage collection will be the killer.Java objects has notable space overhead.High serialization cost when transfer data inside
17、cluster.Keep data as binary and operate onbinarydatasource1010100001010001000100100110100010001111010100100101binary formatCacheManagerMemory Manageroperatorsallocatememorymemory pagesallocatememorymemory pagesJava Objects Based Row FormatBoxedInteger(123)String(“data”)String(“bricks”)RowArray5+ obj
18、ectshigh space overheadslow value accessingexpensive hashCode()Efficient Binary Format“bricks”0 x01233240“data”46null tracking(123, “data”, “bricks”)offset and length ofoffset and length ofdataEfficient Binary Format0 x1220344“data”0 x1220344“data”0 x1062“da”123 JSONfilessubstringHow to process bina
19、ry data more efficient?Understanding CPU CacheMemory is becoming slower and slower than CPU, we should keep the frequently accessed data in CPU cache.Understanding CPU CachePre-fetch data into CPU cache, with cache line boundary.The most 2 important techniques in big data are .Sort and Hash!Naive So
20、rtrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortrecordrecordrecordpointerpointerpointerNaive SortEach comparison needs to access 2 different memory regions, which makes it hard for CPU cache to pre-f
21、etch data, poor cache locality!Cache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware Sortrecordrecordrecordptrptrptrkey-prefixkey-prefixkey-prefixCache-aware SortMost of the time, just go through
22、the key-prefixes in alinear fashion, good cache locality!Naive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvalueNaive Hash MappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylookupNaive Hash Mappointerpointerkeyvaluekeyvaluekeyvaluekeypointerhash(key) %sizeNaive Hash Mappointerpointerkeyvaluekey
23、valuekeyvaluekeypointercompare these 2 keysNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeylinear probingNaive Hash Mappointerpointerpointerkeyvaluekeyvaluekeyvaluekeycompare these 2keysNaive Hash MapEach lookup needs many pointer dereferences and key comparison when hash collision happens, and jumps between 2 memory regions, bad cache locality!ptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvalueptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyvaluekeyvaluekeyvaluekeylookupptrptrptrfull hashfull hashfull hashCache-aware Hash Mapkeyv
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 高考物理总复习专题七电场第1讲库仑定律、电场强度、电场线练习含答案
- 专业混凝土分包合同范本
- 《平衡计分卡培训》课件
- 高中数学 2.2 直线的方程 2.2.3.1 两条直线相交、平行与重合的条件教案 新人教B版必修2
- 八年级物理下册 第九章 机械和功 第三节 功教案 (新版)北师大版
- 六年级品德与社会上册 科学技术的另一面教案 泰山版
- 高中数学 1.1.2 空间向量的数量积运算教学设计 新人教A版选择性必修第一册
- 2024年六年级品社下册《让科学技术走进生活》教案1 冀教版
- 厨房管理规章制度
- 租借手机的合同(2篇)
- 我的生涯发展
- 银行存款业务课件
- 2024年扬州市职业大学高职单招(英语/数学/语文)笔试历年参考题库含答案解析
- 2024年全国初中数学竞赛试题及答案
- 前交叉韧带重建术后康复训练
- 安防监控系统技术标投标书范本-图文
- 物业费用测算表
- 仓库卫生和清洁要求
- 上海家政行业报告
- 《咖啡培训课程》课件
- 2024年中粮集团中粮贸易有限公司招聘笔试参考题库含答案解析
评论
0/150
提交评论