




Scheduling in Hadoop
Vivek Ratan

Me
- Amazon: 5 months (Bangalore office)
- Principal in GPS (Global Payments Services)

The IAAU
(Incredible Amazon Acronym Universe)

- A system that breaks your work into parallel pieces and distributes them across many machines
- Map-Reduce implementation + file system
- Runs on thousands of machines, processes petabytes of data
- At Yahoo: 25,000+ machines, 80+ PB of data

My talk: Scheduling in Hadoop
- Three drivers: business requirements, user interaction, and technical challenges
- Sometimes, figuring out what to build is as hard as building it
- Evolution through repeated iteration

Hadoop clusters
(Diagram: File, Blocks)

Interacting with Hadoop
(Diagram: submit, Job, broken into, Tasks)

The scheduling problem
- What task do you run on what machine, and in what order?
(Diagram: Job, Job, Job)

In the early days
- Business: as many users as possible
- User interaction: simple
- Technical: 100s of nodes

The original Hadoop scheduler
(Diagram: "I'm free", Data locality, Job, Scheduler)

Woo Hoo!
- Easy!

But...
- Fairness
- Priorities
- Isolation
- Run-time determinism (SLA)

Time to iterate!

Enter Hadoop On Demand (HOD)
(Diagram: Job, HOD, # machines)

Woo Hoo!
- Isolation
- Some determinism
- Reuse

But...
- Specifying # of nodes for a job
- No data locality
- Bad utilization
- Hard to understand

Time to iterate!

Hmmm

Lessons learned so far
- Task vs. job based scheduling
- Determinism
- Fairness

What do we build next?
- Business: SLAs; parts of clusters funded by different groups
- Users: easy to understand, fair; job priorities
- Technical: support 3-4K machines

What do we build next?
- What do we name it? Yahoo scheduler? Treebeard?

Enter the Capacity Scheduler

Queues
(Diagram: Retail queue, AWS queue, Misc queue; a Job is submitted to the Retail queue; Capacity Scheduler)

Capacities
- Cluster capacity (in slots) = maximum number of tasks that can run in parallel
- N machines, 4 tasks/machine: capacity = 4*N slots

Capacities
(Diagram: Retail, AWS, Misc)

Fairness: user limits
- Each queue has a user limit: the maximum % of the queue's capacity available to a single user
- User limit = 33%
(Diagram: Retail)

Putting it all together
(Diagram: "I'm free"; Retail, AWS, Misc; Capacity Scheduler)

Step 1: Picking a queue
- Retail: running tasks = 30, (#running)/capacity = 30/50 = 0.6
- AWS: running tasks = 28, (#running)/capacity = 28/30 = 0.93
- Misc: running tasks = 15, (#running)/capacity = 15/20 = 0.75

Putting it all together
(Diagram: "I'm free"; Retail, AWS, Misc)

Step 2: Picking a job
(Diagram: jobs in the Retail queue)
- Job 1: User A, Priority High
- Job 2: User A, Priority Med
- Job 3: User B, Priority Med
- Job 4: User C, Priority Low

Putting it all together
(Diagram: "I'm free"; Retail, AWS, Misc)

Step 3: Picking a task
- Data locality

Why this algorithm?
- Simple for users to understand
- Fair

Borrowing capacity
(Diagram: "I'm free"; Retail, AWS, Misc)

Borrowing capacity
- Before: Retail running = 50, run/cap = 50/50 = 1; AWS running = 30, run/cap = 30/30 = 1; Misc running = 0, run/cap = 0/20 = 0
- After borrowing: Retail running = 60, run/cap = 60/50 = 1.2; AWS running = 35, run/cap = 35/30 ≈ 1.2; Misc running = 0, run/cap = 0/20 = 0

Reclaiming capacity
(Diagram: a Job arrives on the Misc queue; Retail, AWS, Misc)

Reclaiming capacity
- Before: Retail running = 60, run/cap = 60/50 = 1.2; AWS running = 35, run/cap = 35/30 ≈ 1.2; Misc running = 0, run/cap = 0/20 = 0
- During: Retail running = 55, run/cap = 55/50 = 1.1; AWS running = 33, run/cap = 33/30 = 1.1; Misc running = 12, run/cap = 12/20 = 0.6
- After: Retail running = 50, run/cap = 50/50 = 1; AWS running = 30, run/cap = 30/30 = 1; Misc running = 20, run/cap = 20/20 = 1
- Killing tasks

Engineering challenges

Real-time decisions
- 4000 machines. If each machine asks for a task every 20 s, the scheduler gets 200 requests/sec.
- Decide within 1-2 ms.

Real-time decisions
- Sorting can hurt
- Memory concerns
- Some staleness is OK

Status
- Deployed across Yahoo grids (25K+ machines)
- Big increases in throughput and utilization
- Constantly refining algorithms
- Deploy, measure, tweak, repeat

Some more lessons learned
- Simplicity is good
- Keep users happy
- Simple but consistent model
- Fair behavior, even if bad
- OK to have errors, as long as users are told

Some more lessons learned
- Design with real user behavior
- Requirements keep changing. Deal with it.
- Nobody knows the right answer. Iterate: deploy, measure, tweak, repeat.
- Ops, QA are also your users.
- Open source challenges.

Recap
- Three drivers: business requirements, user interaction, and technical challenges
- Sometimes, figuring out what to build is as hard as building it
- Evolution through repeated iteration

Thank you

Backup

Longer-term optimizations
- Distributed scheduling: give machines a bunch of tasks, let them schedule locally
- Predictive modeling

Open source challenges

Other lessons learnt
- Upgrades on a shared environment are hard: you will break some of the 1000s of apps
- Distributed performance analysis on a shared environment is hard
- Isolation vs. throughput: you want to mix lots of low-priority work with your high-priority work, to saturate the cluster
- Low-priority research jobs often crash clusters

Other lessons learnt
- Open source development creates some challenges
- When Yahoo! invests in an area, others shift their focus, making collaboration challenging! Focus on our key concerns and let others innovate in other places.
- Most people make lousy contributors! There is a big investment in training someone to be an effective contributor.
- No dilettantes! To become a committer, you are expected to stay in the community and support your work and the work of others!
- KISS: be careful about which new features to add

Network topology
(Diagram: Rack)

Other features
- Me
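The queue and job selection steps from the "Putting it all together" slides can be sketched in Python. This is a minimal illustration under stated assumptions, not the Capacity Scheduler's actual code: it assumes a free slot goes to the queue with the lowest running/capacity ratio, and then to the highest-priority job whose user is still under the queue's user limit. The class and function names, the per-user running counts, and jobs 5 and 6 are hypothetical; the queue capacities, running counts, Retail jobs, and the 33% user limit are taken from the slides. Step 3 (data locality) is not modeled.

```python
from dataclasses import dataclass, field

HIGH, MED, LOW = 0, 1, 2  # assumed encoding: smaller value = higher priority


@dataclass
class Job:
    job_id: int
    user: str
    priority: int


@dataclass
class Queue:
    name: str
    capacity: int            # slots guaranteed to the queue
    running: int             # tasks currently running
    user_limit: float = 1.0  # max fraction of capacity a single user may hold
    jobs: list = field(default_factory=list)            # pending jobs, submission order
    running_by_user: dict = field(default_factory=dict)

    def load(self) -> float:
        return self.running / self.capacity


def pick_queue(queues):
    """Step 1: among queues with pending jobs, take the least-loaded one."""
    candidates = [q for q in queues if q.jobs]
    return min(candidates, key=lambda q: q.load()) if candidates else None


def pick_job(queue):
    """Step 2: highest priority first, skipping users already at their limit."""
    max_per_user = queue.user_limit * queue.capacity
    # sorted() is stable, so submission order is kept within a priority level.
    for job in sorted(queue.jobs, key=lambda j: j.priority):
        if queue.running_by_user.get(job.user, 0) < max_per_user:
            return job
    return None


retail = Queue("Retail", capacity=50, running=30, user_limit=0.33,
               jobs=[Job(1, "A", HIGH), Job(2, "A", MED),
                     Job(3, "B", MED), Job(4, "C", LOW)],
               running_by_user={"A": 20, "B": 10})  # hypothetical split of the 30 tasks
aws = Queue("AWS", capacity=30, running=28, jobs=[Job(5, "D", MED)])    # job 5 is hypothetical
misc = Queue("Misc", capacity=20, running=15, jobs=[Job(6, "E", MED)])  # job 6 is hypothetical

queue = pick_queue([retail, aws, misc])
job = pick_job(queue)
print(queue.name, job.job_id)
```

With these numbers the sketch picks Retail (ratio 0.6) and then Job 3: user A already holds 20 of Retail's 50 slots, which is above the 33% limit, so A's higher-priority Job 1 is skipped.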
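The "Reclaiming capacity" slides show Retail and AWS shrinking from 60 and 35 running tasks back to 50 and 30 so that Misc can use its 20 slots. The arithmetic can be sketched as below. The function name `reclaim_plan` and the greedy, in-order way it chooses which queue gives up tasks are assumptions made for illustration; the slides only show the starting, intermediate, and final task counts.

```python
def reclaim_plan(capacity, running, needy, demand):
    """Return {queue: tasks_to_kill} so `needy` can run `demand` tasks,
    capped at its own capacity. Only over-capacity queues lose tasks."""
    total_free = sum(capacity.values()) - sum(running.values())
    entitled = min(demand, capacity[needy] - running[needy])
    shortfall = max(0, entitled - total_free)  # slots that must be freed by killing
    plan = {}
    for name, cap in capacity.items():
        excess = running[name] - cap  # tasks running on borrowed slots
        if name == needy or excess <= 0 or shortfall == 0:
            continue
        kill = min(excess, shortfall)
        plan[name] = kill
        shortfall -= kill
    return plan


capacity = {"Retail": 50, "AWS": 30, "Misc": 20}
running = {"Retail": 60, "AWS": 35, "Misc": 0}

plan = reclaim_plan(capacity, running, needy="Misc", demand=20)
after = {name: running[name] - plan.get(name, 0) for name in running}
print(plan, after)
```

In this example 5 of the 100 slots are already free, so 15 tasks have to be killed: 10 from Retail and 5 from AWS, which leaves both queues exactly at capacity and 20 slots available to Misc.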
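The request-rate figure on the "Real-time decisions" slide follows from simple arithmetic, reproduced here as a short check. The variable names are illustrative, and the per-request budget assumes the scheduler handles requests one at a time, which the slides do not state.

```python
machines = 4000
heartbeat_interval_s = 20  # each machine asks for a task every 20 s

requests_per_sec = machines / heartbeat_interval_s
# Upper bound on time per decision if requests are handled serially (assumption).
budget_ms = 1000 / requests_per_sec

print(requests_per_sec, budget_ms)
```

Under that assumption the scheduler has at most 5 ms per request, so the 1-2 ms target on the slide leaves headroom for bursts and other work.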