




Introduction to cluster computing resources for NCN

Xufeng Wang
Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47906

Introduction

Welcome! This presentation is designed to help people get familiar with NCN computational cluster resources. You will learn what a cluster is, what its components are, and more.

Table of contents

- Prelude: understand cluster computing from human thinking
- Cluster component #1: cluster computing nodes
- Cluster component #2: Portable Batch System (PBS)
- Cluster component #3: front-end machines
- NCN resources overview
- References

A simple problem

Problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Critical elements of thinking

1. Describe the abstract problem with a model or tool that my brain can handle, for example a mathematical expression.
2. Write the problem on a piece of paper: "3*10+4*2=?". The problem is now stored on the paper.
3. My eyes read the problem. "3*10+4*2=?" is stored, or buffered, in my brain, ready to be computed.
4. My brain begins to compute: 3*10+4*2=38.
5. I got the answer! The result "38" is buffered in my brain.
6. My brain signals my hand to write down the result. The result is now stored on the paper.
7. I can forget about the buffered result "38" in my brain now, as it is written down on the paper.

[Figure: the seven steps above drawn as a diagram connecting five elements: problem, mathematical expression, paper, memory power of brain, computing power of brain.]
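The steps above can be sketched as a small program. This is only an illustration under stated assumptions: the "paper" is modelled as a text file, the "brain" as a Python function, and the file name `problem.txt` and the function name `solve_from_paper` are hypothetical names chosen for this example, not anything from the original slides.

```python
import tempfile
from pathlib import Path


def solve_from_paper(paper: Path) -> int:
    """Read an 'a*b+c*d' style problem from a file, compute it, write the answer back."""
    # Steps 2-3: the problem sits on "paper" (a file); reading it buffers it in memory.
    problem = paper.read_text().strip()

    # Steps 4-5: the "brain" computes; the result exists only in memory for now.
    result = 0
    for term in problem.split("+"):
        left, right = term.split("*")
        result += int(left) * int(right)

    # Step 6: write the result down so it survives once memory is cleared.
    with paper.open("a") as handle:
        handle.write(f"\n= {result}\n")

    # Step 7: after returning, the in-memory copy can be forgotten.
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        paper = Path(folder) / "problem.txt"
        paper.write_text("3*10+4*2")    # step 2: store the problem on "paper"
        print(solve_from_paper(paper))  # prints 38
        print(paper.read_text())
```

The file plays the role of slow but permanent storage, while the local variables play the role of fast but temporary memory.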
Critical elements of computer's thinking

[Figure: the same diagram redrawn for a computer: problem, MATLAB script, file stored on hard drive, memory power of computer, computing power of computer. The first step is unchanged: describe the abstract problem with a model or tool, for example a mathematical expression.]

Key characteristics

- Mathematical expression / MATLAB script [computer language]: both are intermediates that translate a human's abstract thinking into a language that is convenient for computation and readable by others.
- Paper / file stored on hard drive [file storage system]: both are physical items that can record information.
- Memory power of brain / computer [random access memory]: both are also physical items that can record information, but they are much faster and more precious.
- Computing power of brain / computer [CPU]: both can compute, that is, process information. However, each can only process information held in a certain physical memory.

Components on a modern ASUS motherboard

[Figure: annotated motherboard photo. Labels: hard drive connector, RAM sockets (yellow and black), CPU mounted inside, NB, SB, USB.]

Need for computer clusters

Here at NCN, we need computing resources that can:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

Based on our understanding of a single-core computer, how do we expand it to suit our needs? The obvious answer: if we simply get N single-core computer systems, we can allow up to N users to solve N problems at the same time.

Let's look at a scenario in which 2 users are trying to solve 3 problems simultaneously.

2 users with 3 problems

Based on our previous idea, we now have three independent and identical computers solving 3 problems from 2 users. But is it efficient?

[Figure: three separate computers, each with its own hard drive, CPU, and RAM. Problem 1 (P_1.m) and Problem 2 (P_2.m) sit on hard drives for User A; Problem 3 (P_3.m) sits on a hard drive for User B.]

Hard drive storage explained

"Hard drive" and "random access memory" (RAM) both have the capability to store information. Why do we need two memory units? What is the difference between them?

Property | Hard drive | RAM
Usual size | On the order of GB or TB | 8 GB to 128 GB
Read/write speed | Slow | Fast
Structure | Platter with arm "needle" | Solid-state transistors
Volatile? | No | Yes
Price | Low | High

A hard drive is thus ideal for storing:

- Large amounts of data (large size, low cost)
- Data with low read/write demand (slow I/O rate)
- Long-term data (non-volatile)

RAM storage explained

However, when doing intensive computation:

- Communication between memory and CPU must be rapid, so very fast I/O is needed.
- Only the variables in use are stored in memory, so the memory does not have to be large.
- The data is temporary, so volatile memory is acceptable.

RAM is thus ideal for this situation, and that is why we have two forms of memory storage in a computer.

E Pluribus Unum

Memory storage can be shared among users, as long as the information is well managed so that users' files won't get mixed up.

[Figure: Problems 1 to 3, each on its own CPU, sharing one hard drive (1 MB of 500 GB used) and one RAM (4 GB of 8 GB used).]

Adding problems without increasing the cost?

[Figure: Problems 1 to 4 and three CPUs sharing one hard drive (1.5 MB of 500 GB used) and one RAM (6 GB of 8 GB used).]

4 problems cannot be efficiently solved on 3 CPUs simultaneously. We can, however, solve 3 problems first and then the remaining one whenever a CPU becomes free. It's like dining at a busy restaurant: you place your order and wait to be seated.

When a single CPU takes multiple jobs
If a single CPU has multiple tasks at the same time (a common scenario on desktop computers), it simply processes one task for a very short moment, stops, processes the next task for a very short moment, and so on. This rapid processing of all tasks in succession gives the user the illusion that all tasks are being processed at the same time.

As the number of jobs increases, more time is spent on CPU I/O communication. Jobs become slower because each one waits longer to be served by the CPU and because of the higher number of I/O requests.

[Figure: one CPU cycling through Process #1 to Process #5.]

Solving 4 problems with 3 CPUs

[Figure: Problems 1 to 3 running on three CPUs while Problem 4 waits; 1.5 MB of 500 GB hard drive used, 6 GB of 8 GB RAM used. PBS manages which job is submitted to the CPUs.]

Scientific computation requires dedicating CPU(s) to one process. Thus, a management system is needed to ensure the proper assignment of CPUs to each task. This is the concept of the Portable Batch System (PBS).

Cluster components

- Front-end machine: users write, edit, and manage files; store large amounts of files; prepare scripts for running.
- PBS: manages users' requests (number of CPUs, RAM size, etc.) and coordinates tasks with computational resources.
- Clusters: provide raw computation power.

Clusters explained

"Compute! Compute! Compute!"

In our definition, "clusters" are groups of RAM and CPUs, with their supporting components, that provide raw computational power.

Our simple example here, 3 CPUs sharing 1 RAM, is far from enough to be a computational powerhouse. How do we expand it into a huge cluster that can accommodate a large number of computational jobs?

A cluster node

RAM is capped at 8 GB max for our CPUs. The more CPUs attached to a RAM, the smaller the share of memory each CPU has on average. In addition, CPU manufacturers usually pack 2 (dual core) or 4 (quad core) CPUs per socket, with 1 to 2 sockets sharing 1 RAM.

[Figure: a (Steele) cluster node: Quad Core #1 and Quad Core #2, 8 CPUs in total, sharing 16 GB of RAM and connected to PBS.]

Forming a simple cluster with nodes

Our original goal:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

We achieved the goal by coupling CPUs with RAM to form nodes and by expanding the number of nodes in service. In this small model cluster, we have 6 nodes with 8 CPUs per node = 48 total CPUs in service, averaging 16 GB / 8 = 2 GB of RAM per CPU at each node. Roughly, 48 problems can be solved at the same time.

[Figure: six nodes, each connected to PBS.]

Exploiting the computational resources, in a good way

"OK, clusters seem to me to be just a bunch of computers sitting together. How can that give them a computational advantage over single-core computers?"

Answer: the real power of clusters comes from the coupling of CPUs within a node and among the nodes themselves.

Our original problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Solve: 3*10+4*2=?

Solve 3*10+4*2=?

[Figure: the (Steele) cluster node with CPU #1 given "3*10+4*2=?". Timeline on CPU #1: communications, 3*10=30, 4*2=8, 30+8=38.]

Solve 3*10+4*2=?

- Uncoupled calculations can be done simultaneously to save time.
- This exploits parallelism, but not down to the machine level, i.e. human post-processing is needed.
- This is the "embarrassingly parallel" scheme.

[Figure: the (Steele) cluster node with Task #1 "3*10=?" and Task #2 "4*2=?" on separate CPUs, and Task #3 "30+8=?" processed manually. Timelines: communications then 3*10=30; communications then 4*2=8; wait for CPU #1, post-process, communications then 30+8=38.]

Solve 3*10+4*2=?

Parallel programming: master and slave configuration

[Figure: master CPU #0 tells slave CPU #1 to do 3*10=? and slave CPU #2 to do 4*2=?, then tells CPU #1 to do 30+8=?. Timelines show the send, receive, and wait-for-CPU-#1 steps around the three calculations.]

Those "actions of collaboration" between CPUs cannot be achieved by traditional programming languages such as C, C++, or MATLAB on their own.

Message Passing Interface (MPI)

The Message Passing Interface, commonly known as MPI, is provided as additional libraries for several popular existing computer languages (C, C++, Fortran) to achieve script-level parallel programming.

MPI allows the code writer to control the communication between CPUs. The "actions" mentioned previously can be achieved by writing specific MPI statements within the program.

Examples:

- "Send this variable from CPU #0 to CPU #1": MPI_Send
- "Add the results obtained from CPU #1 and CPU #2": MPI_Reduce with the MPI_SUM operation

Modern scientific codes with MPI can use large numbers of CPUs and many hours to solve complicated problems (OMEN, for example).

How can 10,000 CPUs work for 1 program?

- Nodes need to communicate with each other so that CPUs from several nodes can talk via MPI. Physical connections are needed.
- Not every node needs to communicate with all the others, so a certain network configuration is needed.
- Interconnects are made with cables, and different types of cable network yield different performance.

[Figure: six nodes connected to PBS and to each other through a node interconnect network (Gigabit Ethernet, InfiniBand, etc.).]

Interconnect network performance

Major factors in evaluating the performance of interconnect cables:

- Transfer rate: how much data can the cable transfer per second?
- Latency: how much delay does each transfer over the cable have?

Three kinds of cables are deployed on Purdue clusters:

- Gigabit Ethernet: 1 Gb/sec with low latency (Steele, Pete, etc.)
- InfiniBand: 10 Gb/sec with ultra-low latency (Steele, non-NCN)
- 10 Gigabit Ethernet: 10 Gb/sec with ultra-low latency (Coates)

Things worth mentioning:

- Serial programs do not benefit from these interconnect cables; MPI programs that need lots of I/O between CPUs do.
- Using InfiniBand may require an extra library at compile time.

Clusters summary

User type | Typical programs | Solve problems via office desktop/laptop | Solve problems via clusters
Casual users | Short serial programs | Slows down your computer. Unreliable. | Fast processors and large memory. Does not slow down your computer.
Intermediate users | Multiple, long-running serial programs | Programs run one by one. Significantly slows down your computer. | Run your jobs in an embarrassingly parallel fashion. Fast, and does not slow your PC down.
Advanced users | Multiple, long-running, MPI-based parallel programs | Cannot do parallel runs on single-core computers. | Programs are designed to run on clusters with many CPUs.

The Steele cluster

Clusters have to meet the needs of various users, so they can be built with different kinds of nodes.

NCN-owned nodes are all located in the sub-cluster "Steele-A". NCN also owns nodes on other clusters such as "Pete" and "Coates". Details will be discussed later.

References and recommendations

Interlude

A more complete picture of the entire system

Front-end machine explained

The front-end machine is the gateway for all users. It provides storage and allows users to compose, compile, and manage their files. It is a rather complete computer in itself, with its own CPUs and RAM. It is designed to serve a great number of users and store an extremely high volume of files.

[Figure: Problems 1 to 3 stored on the front-end machine, which has its own front-end RAM and front-end CPU.]

Steele's front-end machine

Comparing front-end machine to clusters

Property | Front-end machine | Clusters
CPU and RAM character | Same as clusters | Same as front-end machine
CPU and RAM number | Few | Abundant
User control | No control over CPU assignment or RAM size. | Total control over CPU assignment and RAM size via PBS.
Parallel computing | Single-core programs only. Can compile but should not run MPI programs. | MPI programs can be compiled and run here.
Purpose | Light-duty file editing, management, and compiling | Heavy-duty computation

Thus, do NOT run computational programs (e.g. MATLAB) on the front-end machine for heavy calculations. This even includes data post-processing. For serial jobs, allocate a single CPU from the clusters via PBS.

File storage solutions

Our model "shared hard drive" is in reality a "shared network storage" offered via the BlueArc system. It has two tiers of storage offering 320 TB of space.

[Figure: shared network storage. New and recent files live on Fibre Channel disk (fast and expensive); old files live on SATA disk (slow and cheap). A file moves to SATA if it gets old and unused, and back to Fibre Channel if it is called for use.]

Fortress DXUL System

The Fortress DXUL system provides a solution for long-term storage of large files.

- No active files should be stored here.
- No large collections of small files should be stored here. Compress them (via tarball or zip) first and then store.

[Figure: shared network storage feeding the Fortress DXUL System. Labels: low-cost disks, tape/optical disks, two tape cartridges, primary copy, secondary copy, "for files smaller than 0.5 MB", "for files larger than 0.5 MB".]

Front-end machines summary

Property | Regular office workstation | Front-end machine with BlueArc storage | Fortress DXUL System
Primary storage size | Depends (usually 100 GB to 500 GB) | Large in total, but can be limited per person (1 to 10 GB) | Huge, up to 5 TB per person
Primary backup? | Usually no | Yes | Yes
Secondary storage size | Depends (usually no second hard drive) | Scratch drives (250 GB). Large. | (blank)
Secondary backup? | Usually no | Yes | (blank)
Access speed | Slow (SATA drive) | Fast (Fibre disk) | Very slow
Software availability | Limited | Abundant | Very few
Purpose | Daily usage | Gateway to clusters | Long-term storage

References and recommendations
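As one practical recommendation, the Fortress guidance above (compress a collection of small files into a tarball before storing it) can be sketched in a few lines. This is a minimal sketch using Python's standard `tarfile` module; the function name `bundle_directory` and the directory and file names are hypothetical, and the actual transfer of the archive to Fortress is not shown because the slides do not describe it.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(source_dir: Path, archive_path: Path) -> int:
    """Pack every file under source_dir into one gzip-compressed tarball.

    Returns the number of files added.
    """
    # Collect the file list first, so the archive itself is never picked up.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            # Store paths relative to source_dir so the archive unpacks cleanly.
            archive.add(path, arcname=str(path.relative_to(source_dir)))
    return len(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        results = Path(folder) / "results"  # hypothetical directory of small output files
        results.mkdir()
        for i in range(3):
            (results / f"run_{i}.txt").write_text(f"output {i}\n")
        archive = Path(folder) / "results.tar.gz"
        print(bundle_directory(results, archive))  # prints 3
```

Storing one archive instead of many small files keeps the file count on long-term storage low, which is the point of the guidance.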