
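Cloud Computing and Cloud Data Management
Lu Jiaheng (陆嘉恒), Renmin University of China
"Advanced Data Management" Frontier Workshop
2023/12/6

Outline
  Cloud computing overview
  Google cloud computing technologies: GFS, Bigtable and MapReduce
  Yahoo cloud computing technologies and Hadoop
  Challenges of cloud data management

New course at Renmin University: "Distributed Systems and Cloud Computing"
  Overview of distributed systems
  Survey of distributed cloud computing technologies
  Distributed cloud computing platforms
  Developing distributed cloud computing programs

Part 1: Overview of Distributed Systems
  Chapter 1: Introduction to distributed systems
  Chapter 2: Client-server architecture
  Chapter 3: Distributed objects
  Chapter 4: Common Object Request Broker Architecture (CORBA)

Part 2: Survey of Cloud Computing
  Chapter 5: Introduction to cloud computing
  Chapter 6: Cloud services
  Chapter 7: Comparison of cloud-related technologies
    7.1 Grid computing and cloud computing
    7.2 Utility computing and cloud computing
    7.3 Parallel and distributed computing and cloud computing
    7.4 Cluster computing and cloud computing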

Part 3: Cloud Computing Platforms
  Chapter 8: The three core technologies of the Google cloud platform
  Chapter 9: Technologies of the Yahoo cloud platform
  Chapter 10: Technologies of the Aneka cloud platform
  Chapter 11: Technologies of the Greenplum cloud platform
  Chapter 12: Technologies of the Amazon Dynamo cloud platform

Part 4: Cloud Computing Platform Development
  Chapter 13: Development on Hadoop
  Chapter 14: Development on HBase
  Chapter 15: Development on Google Apps
  Chapter 16: Development on MS Azure
  Chapter 17: Development on Amazon EC2

Cloud computing

Why do we use cloud computing?
  Case 1: Write a file and save it.
    If the computer goes down, the file is lost.
    Files stored in the cloud are not lost when the local computer fails.
  Case 2:
    Use IE: download, install, use.
    Use QQ: download, install, use.
    Use C++: download, install, use.
    ...
    Instead, get the service from the cloud.

What is cloud and cloud computing?
  Cloud: on-demand resources or services over the Internet, with the scale and reliability of a data center.

  Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Characteristics of cloud computing
  Virtual: software, databases, Web servers, operating systems, storage and networking are provided as virtual servers.
  On demand: processors, memory, network bandwidth and storage can be added or removed as needed.

Types of cloud service
  IaaS: Infrastructure as a Service
  PaaS: Platform as a Service
  SaaS: Software as a Service

SaaS
  Software delivery model
  No hardware or software to manage
  Service delivered through a browser
  Customers use the service on demand
  Instant scalability

SaaS: Examples
  Your current CRM package is not managing the load, or you simply don't want to host it in-house. Use a SaaS provider such as S...
  Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS
  Platform delivery model
  Platforms are built upon infrastructure, which is expensive
  Estimating demand is not a science!
  Platform management is not fun!

PaaS: Examples
  You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
  You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS
  Computer infrastructure delivery model
  A platform virtualization environment
  Computing resources, such as storage and processing capacity
  Virtualization taken a step further

IaaS: Examples
  You want to run a batch job but you don't have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
  You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision of Computing
  Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project, which seeded the Internet, said:
  "As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of 'computer utilities' which, like present electric and telephone utilities, will service individual homes and offices across the country."

  Sun Microsystems co-founder Bill Joy also indicated:
  "It would take time until these markets mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet."

Definitions: Cloud, Grid, Cluster, Utility
  Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
  A computer cluster is a group of linked computers working together closely so that in many respects they form a single computer.
  Grid computing is the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
  Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing & Cloud Computing
  They share a lot in common: intention, architecture and technology.
  They differ in programming model, business model, compute model, applications, and virtualization.

  The problems are mostly the same:
    manage large facilities;
    define methods by which consumers discover, request and use resources provided by the central facilities;
    implement the often highly parallel computations that execute on those resources.

  Virtualization
    Grid: does not rely on virtualization as much as Clouds do; each individual organization maintains full control of its resources.
    Cloud: virtualization is an indispensable ingredient for almost every Cloud.

Any questions or comments?

Outline

  Cloud computing overview
  Google cloud computing technologies: GFS, Bigtable and MapReduce
  Yahoo cloud computing technologies and Hadoop
  Challenges of cloud data management

Google Cloud computing techniques

The Google File System (GFS)
  A scalable distributed file system for large, distributed, data-intensive applications.
  Multiple GFS clusters are currently deployed. The largest ones have:
    1000+ storage nodes
    300+ terabytes of disk storage
    heavy access by hundreds of clients on distinct machines

Introduction
  Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
  The GFS design has been driven by four key observations of Google's application workloads and technological environment.

Intro: Observations (1)
  1. Component failures are the norm.
     Constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system.
  2. Files are huge by traditional standards.
     Multi-GB files are common.
     I/O operations and block sizes must be revisited.

Intro: Observations (2)
  3. Most files are mutated by appending new data.
     Appending is the focus of performance optimization and atomicity guarantees.
  4. Co-designing the applications and the file system API benefits the overall system by increasing flexibility.

The Design
  A cluster consists of a single master and multiple chunkservers and is accessed by multiple clients.

The Master
  Maintains all file system metadata: namespace, access control information, file-to-chunk mappings, chunk (including replica) locations, etc.
  Periodically communicates with chunkservers through HeartBeat messages to give instructions and check state.
  Makes sophisticated chunk placement and replication decisions using global knowledge.
  For reading and writing, a client contacts the master to get chunk locations, then deals directly with chunkservers.
  The master is therefore not a bottleneck for reads and writes.

Chunkservers
  Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle.
  The handle is assigned by the master at chunk creation.
  Chunk size is 64 MB.
  Each chunk is replicated on 3 servers by default.

Clients
  Linked into applications through the file system API.
  Communicate with the master and chunkservers for reading and writing:
    master interactions only for metadata;
    chunkserver interactions for data.
  Cache only metadata; file data is too large to cache.

Chunk Locations
  The master does not keep a persistent record of the locations of chunks and replicas.
  It polls chunkservers for this information at startup and when chunkservers join or leave.
  It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers).

Operation Log
  Record of all critical metadata changes.
  Stored on the master and replicated on other machines.
  Defines the order of concurrent operations.
  Also used to recover the file system state.
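The read path described on the Master and Clients slides (ask the master for chunk locations, then fetch data directly from a chunkserver) can be sketched as follows. This is a minimal in-memory illustration, not GFS code: the `Master`, `Chunkserver` and `Client` classes and the `locate` helper are hypothetical names, and only the 64 MB chunk size comes from the slides. Reads that cross a chunk boundary are deliberately not handled.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the Chunkservers slide


@dataclass
class Master:
    """Hypothetical master: holds metadata only, never file data."""
    # (file name, chunk index) -> (chunk handle, list of replica server names)
    chunk_table: dict = field(default_factory=dict)

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]


@dataclass
class Chunkserver:
    """Hypothetical chunkserver: maps chunk handle -> chunk bytes."""
    chunks: dict = field(default_factory=dict)

    def read(self, handle, offset_in_chunk, length):
        return self.chunks[handle][offset_in_chunk:offset_in_chunk + length]


def locate(offset):
    """Translate a byte offset in a file into (chunk index, offset in chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE


class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # server name -> Chunkserver
        self.location_cache = {}          # clients cache metadata only

    def read(self, filename, offset, length):
        # Assumes the requested range lies inside a single chunk.
        chunk_index, offset_in_chunk = locate(offset)
        key = (filename, chunk_index)
        if key not in self.location_cache:  # one metadata round trip
            self.location_cache[key] = self.master.lookup(filename, chunk_index)
        handle, replicas = self.location_cache[key]
        # Data flows directly from a chunkserver, not through the master.
        return self.chunkservers[replicas[0]].read(handle, offset_in_chunk, length)
```

Because the client caches the chunk location, repeated reads of the same chunk involve the master only once, which is one reason the master is not a bottleneck for data traffic.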

System Interactions: Leases and Mutation Order
  Leases maintain a consistent mutation order across all chunk replicas.
  The master grants a lease to one replica, called the primary.
  The primary chooses the serial mutation order, and all replicas follow this order.
  This minimizes management overhead for the master.

Atomic Record Append
  The client specifies only the data to write; GFS chooses the offset, appends the data to each replica at least once, and returns the offset to the client.
  Heavily used by Google's distributed applications.
  No need for a distributed lock manager, because GFS chooses the offset, not the client.

Atomic Record Append: How?
  Follows a similar control flow to other mutations.
  The primary tells the secondary replicas to append at the same offset as the primary.
  If the append fails at any replica, the client retries it.
  As a result, replicas of the same chunk may contain different data, including duplicates, whole or in part, of the same record.
  GFS does not guarantee that all replicas are bitwise identical.
  It guarantees only that the data is written at least once as an atomic unit.
  Data must be written at the same offset on all chunk replicas for success to be reported.

Detecting Stale Replicas
  The master keeps a chunk version number to distinguish up-to-date replicas from stale ones.
  The version is increased when the master grants a lease.
  If a replica is not available, its version is not increased.
  The master detects stale replicas when chunkservers report their chunks and versions.
  Stale replicas are removed during garbage collection.

Garbage Collection
  When a client deletes a file, the master logs the deletion like other changes and renames the file to a hidden name.
  While scanning the file system namespace, the master removes files that have been hidden for longer than 3 days; their metadata is also erased.
  In HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which of those chunks no longer have metadata.
  The chunkserver then removes these chunks on its own.
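The version-number rule from the Detecting Stale Replicas slide can be illustrated with a small sketch. The `Master` and `Replica` classes below are hypothetical bookkeeping invented for this example; the only behaviour taken from the slides is that the version is increased when a lease is granted, that unavailable replicas miss the increase, and that the master compares reported versions against its own.

```python
class Replica:
    """Hypothetical chunkserver-side record of chunk versions."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # chunk handle -> version stored on this server


class Master:
    """Hypothetical master-side bookkeeping of chunk version numbers."""

    def __init__(self):
        self.version = {}  # chunk handle -> latest version known to the master

    def grant_lease(self, handle, live_replicas):
        """Increase the version and inform only the replicas that are reachable."""
        new_version = self.version.get(handle, 0) + 1
        self.version[handle] = new_version
        for replica in live_replicas:
            replica.versions[handle] = new_version
        return new_version

    def stale_replicas(self, handle, reports):
        """reports maps a chunkserver name to the version it reported."""
        latest = self.version[handle]
        return sorted(name for name, v in reports.items() if v < latest)
```

A replica that was down during a lease grant keeps its old version number, so it shows up as stale the next time it reports, and can then be dropped during garbage collection.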

Fault Tolerance: High Availability
  Fast recovery
    The master and chunkservers can restart in seconds.
  Chunk replication
  Master replication
    "Shadow" masters provide read-only access when the primary master is down.
    A mutation is not considered done until it has been recorded on all master replicas.

Fault Tolerance: Data Integrity
  Chunkservers use checksums to detect corrupt data.
  Because replicas are not bitwise identical, each chunkserver maintains its own checksums.
  For reads, the chunkserver verifies the checksum before sending the chunk data.
  Checksums are updated during writes.
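The Data Integrity rules (each chunkserver keeps its own checksums, verifies them before serving a read, and updates them on writes) might look like the sketch below. The 64 KB block size, the use of CRC32 via `zlib.crc32`, and the `ChecksummedChunk` class are assumptions made for illustration; the slides do not specify the checksum algorithm or granularity.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed checksum granularity, not stated in the slides


def block_checksums(data):
    """Compute one CRC32 per fixed-size block of a chunk."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


class ChecksummedChunk:
    """Hypothetical chunkserver-side chunk that keeps its own checksums."""

    def __init__(self, data=b""):
        self.data = bytearray(data)
        self.checksums = block_checksums(bytes(self.data))

    def append(self, more):
        """Writes update the stored checksums (recomputed in full for simplicity)."""
        self.data.extend(more)
        self.checksums = block_checksums(bytes(self.data))

    def read(self, offset, length):
        """Verify every block overlapping the requested range before returning data."""
        if length <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, min(last, len(self.checksums) - 1) + 1):
            block = bytes(self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
            if zlib.crc32(block) != self.checksums[b]:
                raise IOError(f"checksum mismatch in block {b}")
        return bytes(self.data[offset:offset + length])
```

Only the blocks touched by a read are verified, so corruption in one block does not prevent reads from other, intact blocks of the same chunk.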

Introduction to MapReduce

MapReduce: Insight
  "Consider the problem of counting the number of occurrences of each word in a large collection of documents."
  How would you do it in parallel?

MapReduce Programming Model
  Inspired by the map and reduce operations commonly used in functional programming languages such as Lisp.
  Users implement an interface of two primary methods:
    1. Map: (key1, val1) → (key2, val2)
    2. Reduce: (key2, [val2]) → [val3]

Map operation
  Map, a pure function written by the user, takes an input key/value pair, e.g. (doc-id, doc-content), and produces a set of intermediate key/value pairs.
  Drawing an analogy to SQL, map can be visualized as the group-by clause of an aggregate query.

Reduce operation
  On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer.
  It can be visualized as an aggregate function (e.g., average) computed over all the rows with the same group-by attribute.

Pseudo-code
  map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
      EmitIntermediate(w, "1");

  reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
      result += ParseInt(v);
    Emit(AsString(result));

MapReduce: Execution overview (figure)

MapReduce: Example (figure)

MapReduce in Parallel: Example (figure)
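The word-count pseudo-code from the slides can be made runnable with a small sequential stand-in for the framework. This is only a sketch: `run_mapreduce`, `map_fn` and `reduce_fn` are names chosen for this example, everything runs in one process, and the grouping dictionary plays the role of the shuffle step that a real MapReduce runtime distributes across machines.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: emit (word, 1) for every word in one document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in documents.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    return {key: reducer(key, values) for key, values in groups.items()}
```

Since `map_fn` looks at one document at a time and `reduce_fn` at one word at a time, both phases can be split across workers without changing the result, which is the answer to the "how would you do it in parallel?" question.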

MapReduce: Fault Tolerance
  Handled via re-execution of tasks; task completion is committed through the master.
  What happens if a mapper fails?
    Re-execute its completed and in-progress map tasks.
  What happens if a reducer fails?
    Re-execute its in-progress reduce tasks.
  What happens if the master fails?
    Potential trouble!

MapReduce: Walkthrough of One More Application

MapReduce: PageRank
  PageRank models the behavior of a "random surfer".
  C(t) is the out-degree of t, and (1-d) is a damping factor (random jump).
  The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
  Each page distributes its PageRank equally among all the pages it links to.
  The damping factor models the surfer "getting bored" and typing an arbitrary URL.

PageRank: Key Insights
  The effect of each iteration is local: iteration i+1 depends only on iteration i.
  At iteration i, the PageRank of individual nodes can be computed independently.

PageRank using MapReduce
  Use a sparse matrix representation (M).
  Map each row of M to a list of PageRank "credit" to assign to its out-link neighbours.
  These prestige scores are reduced to a single PageRank value for a page by aggregating over them.

  Map: distribute PageRank "credit" to link targets.
  Reduce: gather up PageRank "credit" from multiple sources to compute the new PageRank value.
  Iterate until convergence.
  Source of image: Lin 2008
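The map/reduce formulation of PageRank can be sketched in a few lines. Several things here are assumptions rather than content of the slides: the update rule PR(x) = (1-d)/N + d * sum of PR(t)/C(t) over pages t linking to x (the commonly used form), the value d = 0.85, the function names, and the simplification that every page has at least one out-link and every link target appears in the graph. The driver loop runs sequentially in one process.

```python
from collections import defaultdict


def pagerank_map(url, state):
    """Map: pass the link list through and send rank / out-degree to each target."""
    rank, out_links = state
    yield url, ("links", out_links)
    for target in out_links:
        yield target, ("credit", rank / len(out_links))


def pagerank_reduce(url, values, d, n_pages):
    """Reduce: sum the incoming credit and apply the random-jump term."""
    out_links, credit = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            credit += payload
    return (1 - d) / n_pages + d * credit, out_links


def pagerank(graph, d=0.85, tol=1e-9, max_iter=100):
    """Repeat map + reduce until the ranks stop changing (sequential driver)."""
    n = len(graph)
    state = {url: (1.0 / n, links) for url, links in graph.items()}
    for _ in range(max_iter):
        groups = defaultdict(list)
        for url, s in state.items():
            for key, value in pagerank_map(url, s):
                groups[key].append(value)
        new_state = {u: pagerank_reduce(u, vals, d, n) for u, vals in groups.items()}
        delta = sum(abs(new_state[u][0] - state[u][0]) for u in state)
        state = new_state
        if delta < tol:  # the non-parallel convergence check
            break
    return {url: rank for url, (rank, _) in state.items()}
```

Each iteration reads only the previous iteration's ranks, so every `pagerank_map` and `pagerank_reduce` call is independent of the others, matching the "key insights" above.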

Phase 1: Process HTML
  The map task takes (URL, content) pairs and maps them to (URL, (PRinit, list-of-urls)).
    PRinit is the "seed" PageRank for the URL.
    list-of-urls contains all pages pointed to by the URL.
  The reduce task is just the identity function.

Phase 2: PageRank Distribution
  The reduce task gets (URL, url_list) and many (URL, val) values.
    Sum the vals and fix up with d to get the new PR.
    Emit (URL, (new_rank, url_list)).
  Check for convergence using a non-parallel component.

MapReduce: Some More Apps
  Distributed grep
  Count of URL access frequency
  Clustering (K-means)
  Graph algorithms
  Indexing systems
  MapReduce programs in the Google source tree (figure)

MapReduce: Extensions and similar apps

  PIG (Yahoo)
  Hadoop (Apache)
  DryadLINQ (Microsoft)

Large Scale Systems Architecture using MapReduce
  User App
  MapReduce
  Distributed File Systems (GFS)

BigTable: A Distributed Storage System for Structured Data

Introduction
  BigTable is a distributed storage system for managing structured data.
  Designed to scale to a very large size: petabytes of data across thousands of servers.
  Used for many Google projects: Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, ...
  Flexible, high-performance solution for all of Google's products.

Motivation
  Lots of (semi-)structured data at Google
    URLs: contents, crawl metadata, links, anchors, pagerank, ...
    Per-user data: user preference settings, recent queries/search results, ...
    Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, ...
  Scale is large
    Billions of URLs, many versions per page (~20K/version)
    Hundreds of millions of users, thousands of queries/sec
    100 TB+ of satellite image data

Why not just use a commercial DB?
  Scale is too large for most commercial databases.
  Even if it weren't, the cost would be very high.
    Building internally means the system can be applied across many projects for low incremental cost.
  Low-level storage optimizations help performance significantly.
    These are much harder to do when running on top of a database layer.

Goals
  Want asynchronous processes to be continuously updating different pieces of data.
    Want access to the most current data at any time.
  Need to support:
    Very high read/write rates (millions of ops per second)
    Efficient scans over all or interesting subsets of data
    Efficient joins of large one-to-one and one-to-many datasets
  Often want to examine data changes over time.
    E.g. contents of a web page over multiple crawls

BigTable
  Distributed multi-level map
  Fault-tolerant, persistent
  Scalable
    Thousands of servers
    Terabytes of in-memory data
    Petabytes of disk-based data
    Millions of reads/writes per second, efficient scans
  Self-managing
    Servers can be added/removed dynamically
    Servers adjust to load imbalance

Building Blocks
  Building blocks:
    Google File System (GFS): raw storage
    Scheduler: schedules jobs onto machines
    Lock service: distributed lock manager
    MapReduce: simplified large-scale data processing
  BigTable's use of the building blocks:
    GFS: stores persistent data (SSTable file format for storage of data)
    Scheduler: schedules jobs involved in BigTable serving
    Lock service: master election, location bootstrapping
    MapReduce: often used to read/write BigTable data

Basic Data Model
  A BigTable is a sparse, distributed, persistent multi-dimensional sorted map:
    (row, column, timestamp) -> cell contents
  Good match for most Google applications.

WebTable Example
  Want to keep a copy of a large collection of web pages and related information.
  Use URLs as row keys.
  Various aspects of a web page as column names.
  Store the contents of web pages in the contents: column under the timestamps when they were fetched.

Rows
  Name is an arbitrary string.
    Access to data in a row is atomic.
    Row creation is implicit upon storing data.
  Rows are ordered lexicographically.
    Rows close together lexicographically are usually on one or a small number of machines.

Rows (cont.)
  Reads of short row ranges are efficient and typically require communication with a small number of machines.
  Row keys can be selected to exploit this property and get good locality for data access.
  Example: [four row keys missing in the source] vs. edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys

Columns
  Columns have a two-level name structure: family:optional_qualifier
  Column family
    Unit of access control
    Has associated type information
  Qualifier gives unbounded columns
    Additional level of indexing, if desired

Timestamps
  Used to store different versions of data in a cell.
    New writes default to the current time, but timestamps for writes can also be set explicitly by clients.
  Lookup options:
    "Return most recent K values"
    "Return all values in timestamp range (or all values)"
  Column families can be marked with attributes:
    "Only retain most recent K values in a cell"
    "Keep values until they are older than K seconds"

Implementation: Three Major Components
  Library linked into every client
  One master server, responsible for:
    Assigning tablets to tablet servers
    Detecting addition and expiration of tablet servers
    Balancing tablet-server load
    Garbage collection
  Many tablet servers
    Each tablet server handles read and write requests to its tablets
    Splits tablets that have grown too large

Implementation (cont.)
  Client data doesn't move through the master server. Clients communicate directly with tablet servers for reads and writes.
  Most clients never communicate with the master server, leaving it lightly loaded in practice.

Tablets
  Large tables are broken into tablets at row boundaries.
    A tablet holds a contiguous range of rows.
    Clients can often choose row keys to achieve locality.
    Aim for ~100 MB to 200 MB of data per tablet.
  Each serving machine is responsible for ~100 tablets.
    Fast recovery: 100 machines each pick up 1 tablet from a failed machine.
    Fine-grained load balancing:
      Migrate tablets away from an overloaded machine.
      The master makes load-balancing decisions.

Tablet Location
  Since tablets move around from server to server, given a row, how do clients find the right machine?
    Need to find the tablet whose row range covers the target row.

Tablet Assignment
  Each tablet is assigned to one tablet server at a time.
  The master server keeps track of the set of live tablet servers and the current assignment of tablets to servers. It also keeps track of unassigned tablets.
  When a tablet is unassigned, the master assigns it to a tablet server with sufficient room.

API
  Metadata operations
    Create/delete tables and column families, change metadata
  Writes (atomic)
    Set(): write cells in a row
    DeleteCells(): delete cells in a row
    DeleteRow(): delete all cells in a row
  Reads
    Scanner: read arbitrary cells in a bigtable
      Each row read is atomic
      Can restrict returned rows to a particular range
      Can ask for just data from 1 row, all rows, etc.
      Can ask for all columns, just certain column families, or specific columns

Refinements: Compression
  Many opportunities for compression
    Similar values in the same row/column at different timestamps
    Similar values in different columns
    Similar values across adjacent rows
  Two-pass custom compression scheme
    First pass: compress long common strings across a large window
    Second pass: look for repetitions in a small window
  Speed emphasized, but good space reduction (10-to-1)

Refinements: Bloom Filters
  A read operation has to read from disk when the desired SSTable isn't in memory.
  Reduce the number of accesses by specifying a Bloom filter.
    It allows us to ask whether an SSTable might contain data for a specified row/column pair.
    A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations.
    In practice, this means most lookups for non-existent rows or columns do not need to touch disk.
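The Bloom filter refinement can be illustrated with a small sketch of a filter that guards lookups into an SSTable. Everything concrete here is an assumption for the sake of the example: the `BloomFilter` and `SSTable` classes, the filter size, the number of hash functions, and the use of salted SHA-256 are not taken from Bigtable; the `disk_reads` counter merely stands in for a disk seek.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Hypothetical on-disk table guarded by an in-memory Bloom filter."""

    def __init__(self, cells):
        self.cells = dict(cells)  # (row, column) -> value
        self.filter = BloomFilter()
        for row, column in self.cells:
            self.filter.add(f"{row}/{column}")
        self.disk_reads = 0

    def get(self, row, column):
        if not self.filter.might_contain(f"{row}/{column}"):
            return None  # answered from memory, no disk access
        self.disk_reads += 1  # stands in for a disk seek
        return self.cells.get((row, column))
```

A lookup for a row/column pair that was never stored is usually rejected by the filter without touching the table; a false positive only costs one unnecessary read and never returns wrong data.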

Outline
  Cloud computing overview
  Google cloud computing technologies: GFS, Bigtable and MapReduce
  Yahoo cloud computing technologies and Hadoop
  Challenges of cloud data management

Yahoo! Cloud computing

Search Results of the Future (figure)
  babycenter, epicurious, LinkedIn, webmd, Gawker, New York Times

What's in the Horizontal Cloud?
  Common approaches to QA, production engineering, performance engineering, datacenter management, and optimization
  ID & Account Management
  Monitoring & QoS
  Shared Infrastructure
  Metering, Billing, Accounting
  Horizontal Cloud Services
    Edge Content Services, e.g., YCS, YCPI
    Provisioning & Virtualization, e.g., EC2
    Batch Storage & Processing, e.g., Hadoop & Pig
    Operational Storage, e.g., S3, MObStor, Sherpa
    Other Services: Messaging, Workflow, virtual DBs & Webserving
  Security
  Simple Web Service APIs

Yahoo! Cloud Stack (diagram)
  Provisioning (self-serve); Monitoring/Metering/Security; Data Highway
  Horizontal cloud services by layer:
    EDGE: YCS, YCPI, Brooklyn, ...
    BATCH: Hadoop, ...
    STORAGE: Sherpa, MObStor, ...
    APP: VM/OS, ...
    WEB: VM/OS, yApache
  Also shown: Serving Grid, PHP, App Engine

Web Data Management
  Large data analysis (Hadoop)
    Scan-oriented workloads
    Focus on sequential disk I/O
    $ per CPU cycle
  Structured record storage (PNUTS/Sherpa)
    CRUD
    Point lookups and short scans
    Index-organized table and random I/Os
    $ per latency
  Blob storage (SAN/NAS)
    Object retrieval and streaming
    Scalable file storage
    $ per GB

The World Has Changed
  Web serving applications need:
    Scalability! Preferably elastic
    Flexible schemas
    Geographic distribution
    High availability
    Reliable storage
  Web serving applications can do without:
    Complicated queries
    Strong transactions

PNUTS/SHERPA: To Help You Scale Your Mountains of Data

Yahoo! Serving Storage Problem
  Small records: 100 KB or less
  Structured records: lots of fields, evolving
  Extreme data scale: tens of TB
  Extreme request scale: tens of thousands of requests/sec
  Low latency globally: 20+ datacenters worldwide
  High availability: outages cost $ millions
  Variable usage patterns: as applications and users change

The PNUTS/Sherpa Solution
  The next-generation global-scale record store
    Record-orientation: routing and data storage optimized for low-latency record access
    Scale out: add machines to scale throughput (while keeping latency low)
    Asynchrony: pub-sub replication to far-flung datacenters to mask propagation delay
    Consistency model: reduce the complexity of asynchrony for the application programmer
    Cloud deployment model: hosted, managed service to reduce app time-to-market and enable on-demand scale and elasticity

What is PNUTS/Sherpa?
  CREATE TABLE Parts (
    ID VARCHAR,
    StockNumber INT,
    Status VARCHAR
    …
  )
  Parallel database
  Geographic replication
  Structured, flexible schema
  Hosted, managed infrastructure
  (Diagram: sample records E 75656 C, A 42342 E, B 42521 W, C 66354 W, D 12352 E, F 15677 E, shown in several replicas)

What Will It Become?
  (Diagram: the same sample records, shown in several replicas)
  CREATE TABLE Parts (
    ID VARCHAR,
    StockNumber INT,
    Status VARCHA
