云计算入门PPT课件_第1页
云计算入门PPT课件_第2页
云计算入门PPT课件_第3页
云计算入门PPT课件_第4页
云计算入门PPT课件_第5页
已阅读5页,还剩83页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、云计算入门云计算入门Introduction to Cloud ComputingGESC1001Philippe Fournier-Philippe Fournier-VigerVigerProfessor School of Humanities and Social SFallFall 2019 20191Lecture #3Lecture #32Part 1Introduction and overviewPart 2Distributed and parallel systemsPart 3Cloud infrastructurePart 4Cloud application par

2、adigm (1)PartPart 5 5Cloud application paradigm (2)Cloud application paradigm (2)Part 6Cloud virtualization and resource managementPart 7 & 8Cloud computing storage systems Cloud computing security Final examLast week:Last week: Review ChapterChapter 4 4: Cloud application paradigm (part 1)Today

3、:Today:ChapterChapter 4 4: Cloud applications(part 2) the Map Reduce modelAssignment 13We can discuss immediately after lecturesYou may use the QQ group to contact teaching assistantsMy e-mail: My e-mail: 45We discussed challenges for developing cloud applicationsToday, we will talk about the detail

4、s of how cloud applications are created.To make cloud applicationscloud applications, the MapReduceMapReduce modelmodel is very popular.It is a “programming model” programming model” (编程模型 - a way of developing applications for the cloud).It was proposed by GoogleGoogle in a research paper, publishe

5、d in 2004. MapReduce: Simplified Data Processing on Large Clusters,Jeffrey Dean and Sanjay Ghemawat, OSDI04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 200Why Why MapReduceMapReduce is popular?is popular?Because it is a simple programming mod

6、el. A programmerprogrammer (程序员) can easily write an application application that run on a distributed system (the cloud), without much experience about how distributed systems work.7Cloud Cloud ApplicationApplication(云应用)Programmer (Programmer (程序员程序员) )Cloud Cloud (云)One of the most popular versio

7、n of MapReduceMapReduce is HadoopHadoop.It is an open-source open-source (开放源码) implementation of MapReduceMapReduce.I will explain the main idea.We will also discuss three examplesthree examples.8/The main advantage of the cloud is elasticity elasticity (云的弹性) . Using as many

8、 computers as needed to address the costcost (元) and timingtiming constraints of an application. Sharing the workloadworkload (工作负载) between several computers. It must be divided into sub-taskssub-tasks that can be accomplished in parallel by several computers.But how to do this?9The workload worklo

9、ad should be divided(分配) approximately equally between computers.10WorkloadWorkloadPartitioningPartitioning (分配) the workload the workload is not always easy.Three main typesThree main types ofof workloadsworkloads: modularly divisible modularly divisible (模块化分割 ) workloadworkload: the workload is a

10、lready divided into sub-tasks. arbitrarily divisible arbitrarily divisible (可任意划分) workloadworkload: the workload can be partitioned into an arbitrarily large number of sub-tasks of equal or similar size. Others.11Designed for arbitrarily divisible arbitrarily divisible (可任意划分) workloadsworkloads.It

11、 is used to perform parallel parallel processing processing (并行处理) for data-intensivedata-intensive (数据密集型) applications.It has many applications: e.g. e.g. physics, biology, etc.Once a cloud applications is created using MapReduce, it can run in the cloud on as many computers as needed.12Phase 1 (M

12、ap)Phase 1 (Map)1. Split the data into blocks2. Assign each block to an instance (实例) (e.g. e.g. a computer or virtual machine)3. Run these instances in parallel1313BCADataPhase 1 (Map)Phase 1 (Map)1. Split the data into blocks2. Assign each block to an instance (实例) (e.g. e.g. a computer or virtual

13、 machine)3. Run these instances in parallel1414BCADataDataPart1 DataPart2DataPart3Phase 1 (Map)Phase 1 (Map)1. Split the data into blocks2. Assign each block to an instance (实例) (e.g. e.g. a computer or virtual machine)3. Run these instances in parallel1515BCADataPart1 DataPart2DataPart3DataPart1 Da

14、taPart2DataPart3Phase 1 (Map)Phase 1 (Map)1. Split the data into blocks2. Assign each block to an instance (实例) (e.g. e.g. a computer or virtual machine)3. Run these instances in parallel1616BCADataPart1 DataPart2DataPart3Phase 2 (Reduce)Phase 2 (Reduce)1.Once all the instances have finished their s

15、ub-tasks, they send their results. 2.Results are merged to obtain the final result.17BCADataPart1 DataPart2DataPart3Result 1Result2Result3Phase 2 (Reduce)Phase 2 (Reduce)1.Once all the instances have finished their sub-tasks, they send their results. 2.Results are merged to obtain the final result.1

16、8BCADataPart1 DataPart2DataPart3Result 1Result2Result3Result 1Result2Result3Phase 2 (Reduce)Phase 2 (Reduce)1.Once all the instances have finished their sub-tasks, they send their results. 2.Results are merged to obtain the final result.19BCAResult 1Result2Result3ResultPhase 2 (Reduce)Phase 2 (Reduc

17、e)1.Once all the instances have finished their sub-tasks, they send their results. 2.Results are merged to obtain the final result.20BCAResultA “mastermaster instanceinstance” takes care of splitting the data.MergingMerging the results the results can be done by a set of instances called the “reduci

18、ng reducing instancesinstances”21MasterDataDataPart1 DataPart2DataPart3RI_1Result 1Result2Result3ResultRI_2The input data input data (输入数据) can be any kind of files. But it is converted to a set of pairspairs (键值对). e.g.e.g.: (key= CN, value = Shenzhen) (key= CN, value = Beijing) .22A keykey (键) is

19、some information that is used to group values together. The output data output data ( (输出数据输出数据) ) is a also set of pairs. e.g.e.g.: (key= CN, value = Shenzhen) (key= CN, value = Beijing) .23A keykey (键) is some information that is used to group values together. MapReduceMapReduce is a programming m

20、odel programming model (编程模型) It is inspired by the MapMap and the ReduceReduce operations of the LISPLISP programming language.It is designed to process large large datasets datasets on computing clusters (the cloud cloud 云云).It is often used with the JavaJava language.A programmer has to define ma

21、p() map() and reduce() reduce() functions24Consider that we want to count how many times each word appear in a very large text document.25TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”The master instance master instance first splits the

22、 data into M M data blocks.26TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”Part1 Part2“Hello world, bye world”“Hello world, bye world”“Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”CMaster instanceMaster instanceThen, it starts

23、 M M mapping instances and gives a data block to each instance.27TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”Part1 Part2“Hello world, bye world”“Hello world, bye world”“Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”ABCMaster

24、instanceMaster instanceMapping instancesMapping instancesAll instances work in parallel.All instances work in parallel.Consider the Consider the first instancefirst instance. . It reads its data. It creates a pair (键值对) for each word that it reads. A key (键) is a word and the corresponding value (值)

25、 is the number 1.Part1 “Hello world, bye world”“Hello world, bye world”ASome words like “World” appear multiple times in the result.All values that have the same key are grouped together. Part1 “Hello world, bye world”“Hello world, bye world”AConsider the Consider the second instancesecond instance.

26、 . The second instance reads its data. It creates a pair for each word that it reads. A key is a word and the corresponding value is the number 1.Part1 “Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”BThen, the second instance second instance groups all values that have the same key together

27、. Part1 “Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”BSo until now, we have: 32BANext, the reducereduce phase phase will combine the local results found by all instances. The master instance master instance will start R R reducing instances for combining results of mapping instances.In th

28、is example, only one reducing instance is used (instance D)33BADReducinginstanceReducinginstance34BADReducinginstanceReducinginstanceThis is the This is the final result!final result!This is the code for this example:35Combine local results36(1) An application starts a master instance and M worker i

29、nstances for the Map phase and, later, R worker instances for the Reduce phase. The The MapReduceMapReduce ProcessProcess37(2) The master split (分配) the input data in M segments (parts).The The MapReduceMapReduce ProcessProcess38(3) Each Map instance reads its input data segment and processes the da

30、taThe The MapReduceMapReduce ProcessProcess39(4) The local results are stored on the local disks of the computers where the Map instances are executed.The The MapReduceMapReduce ProcessProcess40(5) The R reduce instances read the local results and merge the results. The The MapReduceMapReduce Proces

31、sProcess41(6) The final results are written by the Reduce instances to a shared storage (共享存储)The The MapReduceMapReduce ProcessProcess42(7) The master instance monitors the Reduce instances and. When all of them have finished, it is the ENDEND.The The MapReduceMapReduce ProcessProcessThe data is us

32、ually split in blocks of 16 MB to 64 MB (megabytes - 兆字节) ).The number of instances can be a few to hundreds, or thousands of instances.What if some instances crashes? 43Fault-tolerance (Fault-tolerance (容错容错) ): to ensure that a task is accomplished properly even if some machines stop working.The m

33、aster instance master instance asks each worker machine about their statestate (idle 空闲状态, in-progress 正在进行, or completed 完成任务) and identityidentity. If the worker machine does not respond, the master instance considers that this machines sub-task has failed.44MasterWhat is your state?A tasktask in

34、progress in progress (正在进行) on a failed worker failed worker is set to idle idle (空闲状态).The task can then be given to another worker (computer).The master writes takes of note of the tasks that have been completed. The datadata is stored using the GFSGFS (Google File System).45According to the bookA

35、ccording to the book, in 20122012, a typical computer for experimenting with MapReduce has the following characteristics:dual-processor x86 running Linux, 24 GB of memory,Network card: 1001,000 Mbps. Data is stored on IDE 7 disks attached directly to individual machines. The file system uses replica

36、tion (复制)46A cluster consists of hundreds or thousands of machines.It provides availabilityavailability (可利用性) and reliabilityreliability (可靠) using unreliableunreliable hardwarehardware. The inputinput datadata is stored on the local disk local disk of each instance to reduce communication between

37、computers.47TaskTask: analyze a text to count how many words with 1 letters, with 2 letters, with 3 letters, with 4 letters 48TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”The master instance master instance first splits the data into M

38、 M data blocks. Here M = 2.49TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”Part1 Part2“Hello world, bye world”“Hello world, bye world”“Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”CMaster instanceMaster instanceThen, it starts

39、 M M instances and gives a data block to each instance.50TextText“Hello world, bye “Hello world, bye world,world,Hello cloud, Hello cloud, goodbye cloud”goodbye cloud”Part1 Part2“Hello world, bye world”“Hello world, bye world”“Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”ABCMaster instance

40、Master instanceMapping instancesMapping instancesAll instances work in parallel.All instances work in parallel.Consider the Consider the first instancefirst instance. . The first instance reads its data. It creates a pair for each word that it reads. A key is the number of letters in the word and th

41、e value is the word.Part1 “Hello world, bye world”“Hello world, bye world”ASome words like “World” appear multiple times in the result.All values that have the same keykey are grouped together. Part1 “Hello world, bye world”“Hello world, bye world”A NoteNote: value having the same key are automatica

42、lly grouped Consider the Consider the second instancesecond instance. . The second instance reads its data. It creates a pair for each word that it reads, where a key is a number of letters and the corresponding value is a word.Part1 “Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”BThen, all

43、 values that have the same keykey are grouped together. Part1 “Hello cloud, goodbye cloud”“Hello cloud, goodbye cloud”B NoteNote: value having the same key are automatically grouped So until now, we have: 55BANow, the reduce phase reduce phase will take place to combine the local results found by ea

44、ch instance The master instance master instance starts R R reducing instances for combining results of mapping instances.In this example, only one reducing instance is used (instance D)56BADReducinginstanceReducinginstance 57BADReducinginstanceReducinginstanceThis is the This is the final result!fin

45、al result! The result is shown below. It means that there is one The result is shown below. It means that there is one word containing three letters, six words containing word containing three letters, six words containing five letters, and one word containing seven letters.five letters, and one wor

46、d containing seven letters.Consider a social network social network (社会网络) like like WechatWechat, QQ, LinkedIn , QQ, LinkedIn where you can be friend with other people.If you are a LinkedIn user LinkedIn user and you view the LinkedIn page of a friend, the page will indicates how many how many frie

47、nds you have in common. friends you have in common. Illustration Illustration 5859This is the profile page of one of my former Master This is the profile page of one of my former Master degree students. When I click on his page, I see that we degree students. When I click on his page, I see that we

48、have 10 “friends” in common.have 10 “friends” in common.Suppose that we have a social network with five users: A,B, C, D, E A,B, C, D, E We assume that friendshipfriendship (友谊 ) is a bidirectionalbidirectional relationshiprelationship (双向关系).In other words, if you are a friend of someone, s/he is a

49、lso your friend.Assume that this is the friendship graph:60A AB BC CE ED DAssume that data about friendship between users is stored in a text file as follows:A - B C DB - A C D EC - A B D ED - A B C EE - B C D61A AB BC CE ED DThe data file will be split and sent to various mapping instancesmapping i

50、nstances.Mapping instances will process each line that they receive as follows: 62The first line A - B C D is transformed as:by combining A with each of his friend.63(A B) - B C D(A C) - B C D(A D) - B C DKey Key ValueValueThe second line B - A C D E is transformed as:64(A B) - A C D E(B C) - A C D

51、E(B D) - A C D E(B E) - A C D EKey Key ValueValueThe third line C - A B D E is transformed as:65(A C) - A B D E(B C) - A B D E(C D) - A B D E(C E) - A B D EKey Key ValueValueThe fourth line D - A B C E is transformed as:66(A D) - A B C E(B D) - A B C E(C D) - A B C E(D E) - A B C EKey Key ValueValue

52、The fifth line E - B C D is transformed as:67(B E) - B C D(C E) - B C D(D E) - B C DKey Key ValueValueThe values are then grouped by their key:(A B) - (A C D E) (B C D)(A C) - (A B D E) (B C D)(A D) - (A B C E) (B C D)(B C) - (A B D E) (A C D E)(B D) - (A B C E) (A C D E)(B E) - (A C D E) (B C D)(C

53、D) - (A B C E) (A B D E)(C E) - (A B D E) (B C D)(D E) - (A B C E) (B C D)Furthermore, they are sorted (as above)68This data is then split and sent to reducers(A B) - (A C D E) (B C D)(A C) - (A B D E) (B C D)(A D) - (A B C E) (B C D)(B C) - (A B D E) (A C D E)(B D) - (A B C E) (A C D E)(B E) - (A C

54、 D E) (B C D)(C D) - (A B C E) (A B D E)(C E) - (A B D E) (B C D)(D E) - (A B C E) (B C D)69Each reducer will intersect the reducer will intersect the list of value on each line:list of value on each line:The first linefirst line:(A B) - (A C D E) (B C D)will thus become:(A B) - (C D)70Each reducer

55、will intersect the reducer will intersect the list of value on each line:list of value on each line:The second linesecond line:(A C) - (A B D E) (B C D)will thus become:(A C) - (B D) and and so on.so on.71The final result is:(A B) - (C D)(A C) - (B D)(A D) - (B C)(B C) - (A D E)(B D) - (A C E)(B E)

56、- (C D)(C D) - (A B E)(C E) - (B D)(D E) - (B C)72Key Key ValueValueThe final result is:(A B) - (C D)(A C) - (B D)(A D) - (B C)(B C) - (A D E)(B D) - (A C E)(B E) - (C D)(C D) - (A B E)(C E) - (B D)(D E) - (B C)73Having calculated this information, we know the friends in common between any pairs of

57、persons.For exampleFor example: A and D have the friends B and C in commonKey Key ValueValueIn this example, we have explained how the MapReduceMapReduce framework can be used to calculate common friends in a social network.Why doing this?Why doing this? Big social networks such as LinkedInLinkedIn

58、have a lot of money. By precaculating (预先计算) ) information about common friends, a social network can provide the information more quickly to users. This can be recalculated every day.7475In the last 2000 yearslast 2000 years, science was mostly empirical.In recent decades, recent decades, computati

59、onal science (计算科学 ) has emerged where computerscomputers are used to simulate complex phenomenasimulate complex phenomena.Science may now combine: theory, experiment, and simulation (仿真)764.84.8Generic problems involving data, in Generic problems involving data, in science:science: Collecting exper

60、imental data. Managing very large volumes of data. Building and executing models. Integrating data and literature. Documenting experiments. Sharing the data with others; data preservation for long periods of time.77All these activities require powerful computing systems.All these activities require powe

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论