Introduction to cluster computing resources for NCN

Xufeng Wang
Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47906

Introduction

Welcome! This presentation is designed to help people get familiar with NCN computational cluster resources. You will learn what a cluster is, what its components are, and more.

Table of contents

- Prelude: understand cluster computing from human thinking
- Cluster component #1: cluster computing nodes
- Cluster component #2: Portable Batch System (PBS)
- Cluster component #3: front-end machines
- NCN resources overview
- References

A simple problem

Problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Critical elements of thinking

1. Describe the abstract problem with a certain model/tool that my brain can handle. For example, mathematical expressions.
2. Write the problem on a piece of paper: "3*10+4*2=?". The problem is thus stored on the paper.
3. My eyes read the problem. "3*10+4*2=?" is stored, or buffered, in my brain, ready to be computed.
4. My brain begins to compute: 3*10+4*2=38
5. I got the answer! The result "38" is buffered in my brain.
6. My brain signals my hand to write down the result. The result is thus stored on the paper.
7. I can forget about the buffered result "38" in my brain now, as it is written down on the paper.

[Figure: the seven steps above mapped onto five elements: problem, mathematical expression, paper, memory power of brain, computing power of brain.]

Critical elements of computer's thinking

[Figure: the same flow for a computer: problem, MATLAB script, file stored in hard drive, memory power of computer, computing power of computer.]

Key characteristics

- Mathematical expression / MATLAB script [Computer Language]: Both are intermediaries that translate a human's abstract thinking into a language that is convenient for computation and readable by others.
- Paper / File stored on hard drive [File storage system]: Both are physical items that can record information.
- Memory power of brain / computer [Random Access Memory]: Both are also physical items that can record information, but they are much faster and more precious.
- Computing power of brain / computer [CPU]: Both can compute, that is, process information. However, each can only process information held in a certain physical memory.

Components on a modern ASUS motherboard

[Figure: motherboard photo with labels: hard drive connector, RAM sockets (yellow & black), mounted CPU inside, NB, SB, USB. The problem and the MATLAB script are shown alongside.]

Need for computer clusters

Here at NCN, we need computing resources that can:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

Based on our understanding of a single-core computer, how do we expand it to suit our needs? The obvious answer is: if we simply get N single-core computer systems, we can allow up to N users to solve N problems at the same time! Let's look at a scenario in which 2 users are trying to solve 3 problems simultaneously.

2 users with 3 problems

Based on our previous idea, we now have three independent and identical computers solving 3 problems from 2 users. But is it efficient?

[Figure: three separate computers, each with its own hard drive, CPU, and RAM. Problem 1 (P_1.m) and Problem 2 (P_2.m) sit on hard drives for User A; Problem 3 (P_3.m) sits on a hard drive for User B.]

Hard drive storage explained

"Hard drive" and "Random Access Memory" (RAM) both have the capability to store information. Why do we need two memory units? What is the difference between them?

|                  | Hard drive                | RAM                     |
| Usual size       | On the order of GB or TB  | 8 GB to 128 GB          |
| Read/write speed | Slow                      | Fast                    |
| Structure        | Platter with arm "needle" | Solid state transistors |
| Volatile?        | No                        | Yes                     |
| Price            | Low                       | High                    |

A hard drive is thus ideal for storing:

- Large amounts of data (large size, low cost)
- Data with low read-write demand (slow I/O rate)
- Long-term data (non-volatile)

RAM storage explained

However, when doing intensive computation:

- Communication between memory and the CPU must be rapid, so very fast I/O is needed.
- Only the variables in use are stored in memory, so the memory does not have to be large.
- The memory is temporary, so volatile memory is acceptable.

RAM is thus ideal for such a situation, and that is why we have two forms of memory storage in a computer.

E Pluribus Unum

Memory storage can be shared among users, as long as the information is well managed so that users' files do not get mixed up.

[Figure: Problems 1, 2, and 3 each have their own CPU but share storage. Labels: 1 MB of 500 GB used; 4 GB of 8 GB used.]

Additional problems without increasing the cost?

[Figure: the same three CPUs with a fourth problem added. Labels: 1.5 MB of 500 GB used; 6 GB of 8 GB used.]

4 problems cannot be solved efficiently on 3 CPUs simultaneously. We can, however, solve 3 problems first and then the remaining one whenever a CPU becomes free. It's like dining at a busy restaurant: you need to get in line and wait to be seated.

When a single CPU takes multiple jobs
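
A minimal sketch of the time slicing this slide is about, under simplifying assumptions: one CPU, a fixed time slice, and an optional fixed cost every time the CPU switches tasks. The function names, the `switch_cost` parameter, and the job lengths are hypothetical and chosen only for illustration; a real operating system scheduler is far more elaborate.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate one CPU taking turns among several tasks.

    `work` maps a task name to the CPU time it still needs (arbitrary units).
    Returns the list of (task, units_run) slices in execution order.
    """
    pending = deque(work.items())
    schedule = []
    while pending:
        name, remaining = pending.popleft()
        run = min(time_slice, remaining)
        schedule.append((name, run))
        if remaining > run:
            # Not finished yet: go to the back of the line.
            pending.append((name, remaining - run))
    return schedule


def finish_times(schedule, switch_cost=0):
    """Return the clock time at which each task completes its last slice."""
    clock = 0
    finished = {}
    for name, run in schedule:
        clock += run
        finished[name] = clock  # overwritten until the task's final slice
        clock += switch_cost    # hypothetical overhead for changing tasks
    return finished


jobs = {"Process #1": 3, "Process #2": 3, "Process #3": 3}
schedule = round_robin(jobs, time_slice=1)
print(finish_times(schedule))                 # no switching overhead
print(finish_times(schedule, switch_cost=1))  # with switching overhead
```

Each job needs only 3 units of CPU time on its own, yet in this model none of them finishes before 7 units have passed, and the switching overhead pushes the finish times out further.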

If a single CPU has multiple tasks at the same time (a common scenario on desktop computers), it simply processes one task for a very short moment, stops, processes the next task for a very short moment, and so on. This rapid processing of all tasks in succession gives the user the illusion that all tasks are being processed at the same time. As the number of jobs increases, more time is spent on CPU I/O communication. Jobs become slower because they wait longer to be served by the CPU and because there are more I/O requests.

[Figure: one CPU cycling through Process #1 to Process #5.]

Solving 4 problems with 3 CPUs

[Figure: Problems 1 to 4 pass through PBS, which manages which job is submitted to the 3 CPUs. Labels: 1.5 MB of 500 GB used; 6 GB of 8 GB used.]

Scientific computation requires CPU(s) dedicated to one process. Thus, a management system is needed to ensure that a CPU is properly assigned to each task. This is the concept behind the Portable Batch System (PBS).

Cluster components

- Front-end machine: users write, edit, and manage files; store large numbers of files; prepare scripts for running.
- PBS: manages users' requests (number of CPUs, RAM size, etc.) and coordinates tasks with computational resources.
- Clusters: provide raw computation power.

Clusters explained

"Compute! Compute! Compute!" In our definition, "clusters" are groups of RAM and CPUs, with their supporting components, that provide raw computational power.

Our simple example here, 3 CPUs sharing 1 RAM, is far from enough to be a computational powerhouse. How do we expand it into a huge cluster that can accommodate a large number of computational jobs?

A cluster node

RAM is capped at 8 GB max for our CPUs. The more CPUs attached to a RAM, the smaller the share of memory each CPU has on average. In addition, CPU manufacturers usually pack 2 (dual core) or 4 (quad core) CPUs per socket, with 1 to 2 sockets sharing 1 RAM.

[Figure: a (Steele) cluster node: Quad Core #1 and Quad Core #2, 8 CPUs in total, sharing 16 GB of RAM and connected to PBS.]

Forming a simple cluster with nodes

Our original goal:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

We achieved the goal by coupling CPUs with RAM to form nodes, and by expanding the number of nodes in service. In this small model cluster, we have 6 nodes with 8 CPUs per node = 48 CPUs in service in total, averaging 16 GB / 8 = 2 GB of RAM per CPU at each node. Roughly, 48 problems can be solved at the same time.

[Figure: six nodes connected to PBS.]

Exploiting the computational resources, in a good way

"OK, clusters seem to me to be just a bunch of computers sitting together. How can that give them a computational advantage over single-core computers?"

Answer: the real power of clusters comes from the coupling of CPUs within a node and among the nodes themselves.

Our original problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Solve: 3*10+4*2=?

Solve 3*10+4*2=? (one CPU)

[Figure: on a (Steele) cluster node, CPU #1 receives 3*10+4*2=? and computes 3*10=30, 4*2=8, and 30+8=38 one after another.]

Solve 3*10+4*2=? (independent tasks)

Uncoupled calculations can be done simultaneously to save time. This exploits parallelism, but not down to the machine level, i.e. human post-processing is needed. This is the "embarrassingly parallel" scheme.

[Figure: Task #1 (3*10=?) and Task #2 (4*2=?) run on separate CPUs of the node; Task #3 (30+8=?) is processed manually afterwards.]

Solve 3*10+4*2=? (parallel programming: master and slave configuration)

[Figure: CPU #0, the master, sends 3*10=? to CPU #1 and 4*2=? to CPU #2, the slaves. Each slave sends its result back; the master waits to receive both and then computes 30+8=38.]

Those "actions of collaboration" between CPUs cannot be achieved with traditional programming languages such as C, C++, and MATLAB on their own.

Message Passing Interface (MPI)

The Message Passing Interface, commonly known as MPI, is provided as additional libraries for several popular existing computer languages (C, C++, FORTRAN) to achieve script-level parallel programming. MPI allows the code writer to control the communication between CPUs. The "actions" mentioned previously can be achieved by writing specific MPI statements within the program.

Examples:

- "send this variable from CPU #0 to CPU #1": MPI_Send
- "add the results got from CPU #1 and CPU #2": MPI_Reduce (with the MPI_SUM operation)

Modern scientific codes with MPI can consume a large number of CPUs and hours to solve complicated problems (OMEN, for example).

How can 10,000 CPUs work for 1 program?

Nodes need to communicate with each other so that CPUs from several nodes can talk via MPI. Physical connections are needed. Not every node needs to communicate with all the others, so a certain network configuration is needed. Interconnects are made with cables, and different types of cable network yield different performance.

[Figure: six nodes connected to PBS and to each other through the nodes interconnect network (Gigabit Ethernet, InfiniBand, etc.).]

Interconnect network performance

Major factors in evaluating the performance of interconnect cables:

- Transfer rate: how much data can the cable transfer per second?
- Latency: how much delay does each transfer over the cable have?

Three kinds of cables are deployed on Purdue clusters:

- Gigabit Ethernet: 1 Gb/sec with low latency. (Steele, Pete, etc.)
- InfiniBand: 10 Gb/sec with ultra low latency. (Steele, non-NCN)
- 10 Gigabit Ethernet: 10 Gb/sec with ultra low latency. (Coates)

Things worth mentioning:

- Serial programs do not benefit from these interconnect cables; MPI programs that need a lot of I/O between CPUs do.
- Using InfiniBand may require an extra library at compile time.

Clusters summary

| User type | Programs | Solve problems via office desktop/laptop | Solve problems via clusters |
| Casual users | Short serial programs | Slows down your computer. Unreliable. | Fast processors and large memory. Does not slow down your computer. |
| Intermediate users | Multiple, long-run serial programs | Programs run one by one. Significantly slows down your computer. | Run your jobs in an embarrassingly parallel fashion. Fast, and does not slow your PC down. |
| Advanced users | Multiple, long-run, MPI-based parallel programs | Cannot do parallel runs on single-core computers. | The program is designed to run on clusters with many CPUs. |

The Steele cluster

Clusters have to meet the needs of various users, so they can be built with different kinds of nodes.

NCN-owned nodes are all located in the sub-cluster "Steele-A". NCN also owns nodes on other clusters such as "Pete" and "Coates". Details will be discussed later.

References and recommendations

Interlude

More complete picture of the entire system

Front-end machine explained

The front-end machine is the gateway for all users. It provides storage and allows users to compose, compile, and manage their files. It is a rather complete computer in itself, with its own CPUs and RAM. It is designed to serve a great number of users and to store an extremely high volume of files.

[Figure: Problems 1, 2, and 3 stored on the front-end machine, which has its own front-end RAM and front-end CPU.]

[Figure: Steele's front-end machine.]

Comparing front-end machine to clusters

|                         | Front-end machine | Clusters |
| Character (CPU and RAM) | Same as clusters | Same as front-end machine |
| Number                  | Few | Abundant |
| User control            | No control over CPU assignment or RAM size. | Total control over CPU assignment and RAM size via PBS. |
| Parallel computing      | Single-core programs only. Can compile but should not run MPI programs. | MPI programs can be compiled and run here. |
| Purpose                 | Light-duty file editing, management, and compiling | Heavy-duty computation |

Thus, do NOT run computational programs (e.g. MATLAB) on the front-end machine for heavy calculations. This even includes data post-processing. For serial jobs, allocate a single CPU from the clusters via PBS.

File storage solutions

Our model "shared hard drive" is in reality a "shared network storage" offered via the BlueArc system. It has two tiers of storage offering 320 TB of space.

[Figure: shared network storage. New and recent files live on Fibre Channel disk (fast & expensive); old files live on SATA disk (slow & cheap). A file moves to SATA if it gets old and unused, and back to Fibre Channel if it is called to be used.]

Fortress DXUL System

The Fortress DXUL system provides a solution for long-term storage of large files.

- No active files shall be stored here.
- No large collections of small files shall be stored here. Compress them (via tarball or zip) first and then store.

[Figure: files move from the shared network storage to the Fortress DXUL System. Labels: low-cost disks; tape/optical disks; tape cartridge (primary copy); tape cartridge (secondary copy); for files smaller than 0.5 MB; for files larger than 0.5 MB.]

Front-end machines summary

|                        | Regular office workstation | Front-end machine with BlueArc storage | Fortress DXUL System |
| Primary storage size   | Depends (usually 100 GB to 500 GB) | Large in total, but can be limited per person (1 to 10 GB) | Huge, up to 5 TB per person. |
| Primary backup?        | Usually no | Yes | Yes |
| Secondary storage size | Depends (usually no second hard drive) | Scratch drives (250 GB). Large. | |
| Secondary backup?      | Usually no | Yes | |
| Access speed           | Slow (SATA drive) | Fast (Fibre disk) | Very slow |
| Software availability  | Limited | Abundant | Very few |
| Purpose                | Daily usage | Gateway to clusters | Long-term storage |
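
The Fortress advice to compress a large collection of small files before storing it can be sketched with Python's standard `tarfile` module. This is only an illustration under stated assumptions: the helper name `bundle_small_files`, the `results` directory, and the `run_*.txt` files are hypothetical, and the step that actually transfers the archive to Fortress is not shown because the slides do not describe it.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_small_files(source_dir, archive_path):
    """Pack every regular file under source_dir into one gzip-compressed tarball.

    Returns the number of files packed. The archive should live outside
    source_dir so that it is not packed into itself.
    """
    source_dir = Path(source_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir to keep the archive tidy.
                archive.add(path, arcname=path.relative_to(source_dir).as_posix())
                count += 1
    return count


# Demonstration in a throwaway directory with five hypothetical result files.
with tempfile.TemporaryDirectory() as workspace:
    results = Path(workspace) / "results"
    results.mkdir()
    for i in range(5):
        (results / f"run_{i}.txt").write_text(f"output of run {i}\n")

    archive_path = Path(workspace) / "results.tar.gz"
    packed = bundle_small_files(results, archive_path)

    # Read the archive back to confirm what went in.
    with tarfile.open(archive_path) as archive:
        names = sorted(archive.getnames())

print(packed, names)
```

The single `results.tar.gz` file, rather than the five small files, is what would then be placed in long-term storage.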

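
Returning to the pen-counting example from the parallel programming slides, the master and slave exchange can be imitated in a few lines. This sketch is not MPI: it uses Python threads and queues inside one process as stand-ins for CPUs and for send/receive, and the names `master` and `worker` are hypothetical. It only shows the shape of the collaboration: the master sends one task to each slave, waits to receive both partial results, and adds them.

```python
import queue
import threading


def worker(inbox, outbox):
    """Slave: receive one multiplication task and send back its product."""
    rank, a, b = inbox.get()      # "receive" the task from the master
    outbox.put((rank, a * b))     # "send" the partial result to the master


def master(tasks):
    """Master: give one task to each worker, collect all results, add them."""
    results = queue.Queue()
    threads = []
    for rank, (a, b) in enumerate(tasks, start=1):
        inbox = queue.Queue()
        thread = threading.Thread(target=worker, args=(inbox, results))
        thread.start()
        inbox.put((rank, a, b))   # "send" the task to worker number `rank`
        threads.append(thread)

    # Block until every worker has replied, whatever order they finish in.
    partial = dict(results.get() for _ in tasks)
    for thread in threads:
        thread.join()
    return partial, sum(partial.values())


# 3 red boxes with 10 pens each, 4 black boxes with 2 pens each.
partial, total = master([(3, 10), (4, 2)])
print(partial, total)
```

In a real MPI program the two workers would be separate processes, possibly on different nodes, and the queue operations would be replaced by MPI send and receive calls.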