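IBM Platform LSF Family: Installation and Configuration Overview
Version 1.0, Ma Xuejie

Contents
1 Cluster Structure
    Pure LSF environment (command-line submission)
    LSF + PAC environment (web submission)
    LSF + PM environment (PM submission)
2 LSF Installation and Basic Configuration Examples
    2.1.7 Starting/stopping the LSF daemons (three methods)
    2.6.4 Setting general limits
3 LSF Command-Line Application Integration Examples
    CFD++ integration (spooling file)
        CFD++ installation and licensing
        Adding the CFD++ job starter
        Adding the CFD application profile
        CFD++ command-line submission script example
    GAUSSIAN integration method (spooling file)
    Abaqus script integration (bsub command)
    Platform MPI jobs
    Intel MPI jobs
4 Installing PAC
5 Using PAC for Application Integration
    5.2 CFD++ interface and back-end script after integration
6 Installing License Scheduler
    6.2.2 Mapping license features
7 Common Problems
8 Using man Pages
9 After-Sales Technical Support

1 Cluster Structure
Larger clusters are usually designed with a separate login node. Users can ssh only to the login node. They cannot ssh directly to any master or compute node of the cluster. Passwordless ssh between the compute nodes is also configured for users, because parallel jobs need it. LSF is installed on the login node as well. It is configured either as an LSF static client or with MXJ set to 0, so the login node is a client that does not run jobs. The cluster's web node is on the same subnet as the office LAN. If floating clients are used, the master node's network interface needs …

Pure LSF environment (command-line submission)
[Figure: LSF master nodes (expandable to 3); login node (LSF static client); desktop LSF floating clients; job submission scripts and workflow scripts that issue bsub jobs; user-isolated compute resources. The bsub jobs issued by the scripts and workflows are distributed to the cluster compute nodes.]

LSF + PAC environment (web submission)
Users submit jobs through the portal.
[Figure: portal on the management network; Linux compute nodes on the cluster compute network.]

LSF + PM environment (PM submission)
[Figure: LSF master node running the Process Manager Server; login node (web portal); several Process Manager Clients.]

2 LSF Installation and Basic Configuration Examples
Preparation before installation
    NIS ready; NFS/GPFS ready.

LSF installation steps
Install as root. Make sure NIS and NFS/GPFS are ready first.
Obtain the LSF and PAC installation packages:
    lsf8.3_linux2.6-glibc2.3-x86_64.tar.Z
    lsf8.3_lsfinstall_linux_x86_64.tar.Z
    pac8.3_standard_linux-x64.tar.Z
Extract the lsfinstall installation script package. Put the packages under /root/lsf.

2.2.3 First add the cluster administrator lsfadmin. Then set the installation parameters in install.config:
    # installation directory
    LSF_TOP="/opt/lsf"
    # create the lsfadmin user first
    LSF_ADMINS="lsfadmin"
    # cluster name, any name you like
    LSF_CLUSTER_NAME="platform"
    # LSF master nodes
    LSF_MASTER_LIST="s2 s3"
    # location of the entitlement (license) file shipped with the installation packages
    LSF_ENTITLEMENT_FILE="/root/lsf/platform_hpc_std_entitlement.dat"
    # location of the installation packages
    LSF_TARDIR="/root/lsf/"

Run the installation (./lsfinstall -f install.config).
Configure LSF to start at boot: hostsetup, rhostsetup.

Testing the installation
Work in the conf directory under the installation directory:
    Add "source profile.lsf" to /etc/profile.
    Set LSF_RSH="ssh" in lsf.conf.

2.1.7 Starting/stopping the LSF daemons (three methods)
    [root@S2 conf]# lsfstartup / lsfshutdown
or
    lsadmin limstartup / limshutdown
    lsadmin resstartup / resshutdown
    badmin hstartup / hshutdown
or
    lsf_daemons start / stop

    [root@S2 conf]# lsid
    IBM Platform LSF Express 8.3 for IBM Platform HPC, May 10
    Copyright Platform Computing Inc., an IBM Company, 1992.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

    My cluster name is platform
    My master name is s2
    You have new mail in /var/spool/mail/root

    [root@S2 conf]# lsload
    HOST_NAME  status  r15s  r1m  r15m  ut  pg   ls  it  tmp    swp  mem
    s2         ok      0.0   0.0  0.0   1%  0.0  1   0   151G   20G  61G
    s4         ok      0.0   0.0  0.0   2%  0.0  1   2   183G   20G  62G
    s6         ok      0.0   0.0  0.0   3%  0.0  1   2   3734M  2G   30G
    s5         ok      0.0   0.0  0.0   5%  0.0  1   2   3468M  2G   30G

Test job submission:
    bsub sleep 100000

Allowing root to submit jobs: set LSF_ROOT_REX=local in lsf.conf and restart the LSF daemons.

Reconfiguring after modifying configuration files
    After modifying lsf.* files: lsadmin reconfig
    After modifying lsb.* files: badmin reconfig
Some parameters require a restart of the LSF master scheduler or other daemons:
    badmin mbdrestart; lsadmin limrestart; lsadmin resrestart; badmin hrestart

Logs and debugging
The logs are under the log directory. LSF runs three main daemons on every node, and the master node runs two more:
    Master:  lim, res, sbatchd, mbatchd, mbschd
    Compute: lim, res, sbatchd
To turn on debugging from the command line, run lim -2 directly on the node to see why lim does not start.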
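The lsload listing above is easy to consume from a script. This is a minimal Python sketch, not an LSF tool. It assumes whitespace-separated columns and a header row as shown. It turns the output into per-host dictionaries and picks hosts with enough free memory. The helper names are hypothetical.

```python
# Minimal sketch: turn `lsload` output into per-host dictionaries.
# Assumes whitespace-separated columns with a header row, as in the listing above.


def parse_lsload(text: str) -> dict[str, dict[str, str]]:
    """Return {host_name: {column: value}} for each data row."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    hosts: dict[str, dict[str, str]] = {}
    for fields in rows[1:]:
        if fields == header:
            continue  # a repeated header line, as in the raw listing
        row = dict(zip(header, fields))
        hosts[row["HOST_NAME"]] = row
    return hosts


def to_megabytes(size: str) -> float:
    """Convert sizes such as '151G' or '3734M' to MB."""
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}
    suffix = size[-1].upper()
    if suffix in factors:
        return float(size[:-1]) * factors[suffix]
    return float(size)  # assumption: a bare number is already in MB


def hosts_with_free_mem(hosts: dict[str, dict[str, str]], min_mb: float) -> list[str]:
    """Names of hosts in 'ok' state whose mem column is at least min_mb."""
    return [name for name, row in hosts.items()
            if row.get("status") == "ok" and to_megabytes(row["mem"]) >= min_mb]
```

Hosts that are down may print fewer columns. The `status` check is evaluated first, so their missing `mem` value is never parsed.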
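When several clusters are installed, the install.config parameters from 2.2.3 can also be generated by a script. This is a hypothetical helper and not part of lsfinstall. It only writes KEY="value" lines. The set of keys it refuses to omit is an assumption of this sketch, not the official list of mandatory parameters.

```python
# Sketch: produce lsfinstall's install.config text from a dictionary.
# The values are the examples from this document; REQUIRED is an assumption.

INSTALL_PARAMS = {
    "LSF_TOP": "/opt/lsf",
    "LSF_ADMINS": "lsfadmin",
    "LSF_CLUSTER_NAME": "platform",
    "LSF_MASTER_LIST": "s2 s3",
    "LSF_ENTITLEMENT_FILE": "/root/lsf/platform_hpc_std_entitlement.dat",
    "LSF_TARDIR": "/root/lsf/",
}

REQUIRED = ("LSF_TOP", "LSF_ADMINS", "LSF_CLUSTER_NAME", "LSF_MASTER_LIST")


def render_install_config(params: dict[str, str]) -> str:
    """Return install.config text with one KEY="value" line per parameter."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    lines = []
    for key, value in params.items():
        if '"' in value:
            raise ValueError(f"{key}: value must not contain a double quote")
        lines.append(f'{key}="{value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_install_config(INSTALL_PARAMS), end="")
```

Redirect the output to install.config and pass that file to ./lsfinstall -f.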
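Configuration files
Directory /etc/init.d:
Directory /apps/platform/8.3/lsf/conf:
    lsf.conf                 LSF configuration file
    lsf.cluster.cluster83    cluster configuration file
    lsf.shared               shared resource definition file
    ./lsbatch/cluster83/configdir/lsb.*    batch scheduling system configuration files:
        lsb.users     LSF users and user groups
        lsb.queues    LSF queues, for example:
                          HOSTS = hostGroupC
                          PRIORITY = 40
        lsb.modules   LSF module configuration

Common commands
    bsub: submit a job
    bjobs: show job information
    bhist: show job history
    lshosts: show static host resources
    bhosts, lsload: show host status and resource information
    bqueues: show queue configuration
    blimits: show limit information
    lsid: show the cluster version and master host
    bmod: modify the bsub options of a submitted job
and so on.

Resource-based scheduling policies
    bsub -R "((type==LINUX2.4 && r1m<2.0) || (type==AIX && r1m<1.0))"
or define the requirement in lsb.queues or lsb.applications:
    RES_REQ = select[((type==LINUX2.4 && r1m<2.0) || (type==AIX && r1m<1.0))]
More examples:
    bsub -R "select[type==any && swap>=300 && mem>500] order[swap:mem] rusage[swap=300,mem=500]" job1
    bsub -R "rusage[mem=500:app_lic_v2=1 || mem=400:app_lic_v1.5=1]" job1
    bsub -R "select[type==any && swp>=300 && mem>500] order[mem]" job1

Configuring fairshare scheduling policies
Adding a round-robin queue
Modify lsb.queues and add the following:
    Begin Queue
    QUEUE_NAME = roundRobin
    FAIRSHARE = USER_SHARES[[default,1]]
    #USERS = userGroupA      # define your own user group
    End Queue
Run badmin reconfig to enable the change. Then run bqueues -l to check the queue configuration.

Adding a hierarchical fairshare policy
Add the following queue to set up a hierarchical share policy:
    Begin Queue
    QUEUE_NAME = hierarchicalShare
    PRIORITY = 40
    USERS = userGroupB userGroupC
    FAIRSHARE = USER_SHARES[[userGroupB,7] [userGroupC,3]]
    End Queue

Cross-queue fairshare policy
Add the following queues to lsb.queues. Pay attention to the host group and user group definitions.
    Begin Queue
    QUEUE_NAME = verilog
    DESCRIPTION = master queue definition cross-queue
    PRIORITY = 50
    FAIRSHARE = USER_SHARES[[user1,100] [default,1]]
    FAIRSHARE_QUEUES = normal short
    HOSTS = hostGroupC      # resource contention
    #RES_REQ = rusage[verilog=1]
    End Queue

    Begin Queue
    QUEUE_NAME = short
    DESCRIPTION = short jobs
    PRIORITY = 70           # highest
    RUNLIMIT = 5 10
    End Queue

    Begin Queue
    QUEUE_NAME = normal
    DESCRIPTION = default queue
    PRIORITY = 40           # lowest
    HOSTS = hostGroupC
    End Queue
Enable the configuration with badmin reconfig. Then submit jobs and watch how the users' dynamic priorities in the queue change:
    bqueues -rl normal

Configuring preemptive scheduling
The most basic slot preemption configuration:
    Begin Queue
    QUEUE_NAME = short
    PRIORITY = 70
    HOSTS = hostGroupC      # potential conflict
    PREEMPTION = PREEMPTIVE[normal]
    End Queue

    Begin Queue
    QUEUE_NAME = normal
    PRIORITY = 40
    HOSTS = hostGroupC      # potential conflict
    PREEMPTION = PREEMPTABLE[short]
    End Queue
Submit jobs to both queues. Then check the reasons reported for the preempted jobs.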
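The sketch below is a toy evaluator that shows how a select[] string like the ones above is checked against a host's values. It is an illustration under stated assumptions, not LSF's own parser:

- It handles only parentheses, &&, || and comparison operators.
- 'type==any' matches every host.
- Host values are supplied as a Python dict.

```python
# Sketch: evaluate a simple LSF-style select[] string against one host's values.
# Not LSF's parser; only the subset of syntax used in the examples above.
import operator
import re

TOKEN = re.compile(r"\s*(\|\||&&|==|!=|>=|<=|[()<>]|[A-Za-z_][\w.]*|\d+(?:\.\d+)?)")
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}


def tokenize(expr: str) -> list[str]:
    tokens: list[str] = []
    pos, expr = 0, expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"cannot parse {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens: list[str], host: dict[str, object]):
        self.tokens, self.pos, self.host = tokens, 0, host

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return token

    def parse_or(self) -> bool:
        value = self.parse_and()
        while self.peek() == "||":
            self.take()
            rhs = self.parse_and()  # always parse, so the tokens are consumed
            value = value or rhs
        return value

    def parse_and(self) -> bool:
        value = self.parse_atom()
        while self.peek() == "&&":
            self.take()
            rhs = self.parse_atom()
            value = value and rhs
        return value

    def parse_atom(self) -> bool:
        if self.peek() == "(":
            self.take()
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        name, op, literal = self.take(), self.take(), self.take()
        if op not in OPS:
            raise ValueError(f"unknown operator {op!r}")
        if name == "type" and literal == "any":
            return op == "=="  # 'type==any' matches every host
        actual = self.host.get(name)
        if actual is None:
            return False  # unknown resource: treat the host as not matching
        if literal[0].isdigit():
            return OPS[op](float(actual), float(literal))
        return OPS[op](str(actual), literal)


def matches(select: str, host: dict[str, object]) -> bool:
    """True if `host` satisfies the select[] expression."""
    expr = select.strip()
    if expr.startswith("select[") and expr.endswith("]"):
        expr = expr[len("select["):-1]
    parser = _Parser(tokenize(expr), host)
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing tokens: {parser.tokens[parser.pos:]}")
    return result
```

For example, a LINUX2.4 host with r1m=1.5 passes the first expression above. An AIX host with the same load fails it, because its r1m is not below 1.0.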
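Fairshare ordering is driven by a dynamic priority that grows with an account's shares and shrinks with its recent usage. The sketch below uses the commonly documented LSF form without the adjustment terms:

shares / (cpu_time * CPU_TIME_FACTOR + run_time * RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR)

The factor values are assumed defaults. Check lsb.params and bqueues -rl on your cluster for the real numbers.

```python
# Sketch of LSF-style fairshare ordering (simplified; adjustment terms omitted).
# The factor values are assumed defaults; check lsb.params on your cluster.
from dataclasses import dataclass

CPU_TIME_FACTOR = 0.7
RUN_TIME_FACTOR = 0.7
RUN_JOB_FACTOR = 3.0


@dataclass
class ShareAccount:
    name: str
    shares: int            # from USER_SHARES[[name, shares] ...]
    cpu_time: float = 0.0  # accumulated CPU time (units simplified here)
    run_time: float = 0.0  # run time of the account's running jobs
    job_slots: int = 0     # job slots currently in use


def dynamic_priority(acct: ShareAccount) -> float:
    """More shares raise the priority; more usage lowers it."""
    usage = (acct.cpu_time * CPU_TIME_FACTOR
             + acct.run_time * RUN_TIME_FACTOR
             + (1 + acct.job_slots) * RUN_JOB_FACTOR)
    return acct.shares / usage


def next_to_dispatch(accounts: list[ShareAccount]) -> str:
    """Name of the account whose pending job would be considered first."""
    return max(accounts, key=dynamic_priority).name
```

The hierarchicalShare queue splits shares 7:3. While both groups are idle, userGroupB is served first. userGroupC overtakes it as soon as userGroupB holds two job slots (7/9 < 3/3).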
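Configuring global limit policies
Limiting the number of jobs a user can run
Add the following to lsb.users:
    Begin User
    USER_NAME    MAX_JOBS    JL/P
    user1        4           -
    user2        2           1
    user3        2           -
    groupA       8           -
    groupB@      1           1
    default      2           -
    End User

Limiting the number of jobs a host can run
In lsb.hosts:
    Begin Host
    HOST_NAME    MXJ    JL/U
    host1        4      2
    host2        2      1
    host3        !      -
    End Host

Limiting the jobs a queue can run
Add the following to lsb.queues:
    Begin Queue
    QUEUE_NAME = myQueue
    HJOB_LIMIT = 2
    PJOB_LIMIT = 1
    UJOB_LIMIT = 4
    USERS = userGroupA
    End Queue

2.6.4 Setting general limits
Examples of global general limits defined in lsb.resources:
    Begin Limit
    USERS    QUEUES    HOSTS    SLOTS    MEM    SWP
    user1    -         hostB    -        -      20%
    user2    normal    hostA    -        20     -
    End Limit

    Begin Limit
    NAME = limit1
    USERS = user1
    PER_HOST = hostA hostC
    TMP = 30%
    SWP = 50%
    MEM = 10%
    End Limit

    Begin Limit
    PER_USER    QUEUES    HOSTS      SLOTS    MEM    SWP    TMP    JOBS
    groupA      -         hgroup1    -        -      -      -      2
    user2       normal    -          -        200    -      -      -
    -           short     -          -        -      -      -      200
    End Limit
Enable the configuration: badmin reconfig

Configuring the submission control script esub
A global esub script is called when a job is submitted. It can be invoked automatically or explicitly, and it controls how users submit jobs. Create the file esub.project under $LSF_SERVERDIR and make it executable (chmod):
    #!/bin/sh
    if [ "_$LSB_SUB_PARM_FILE" != "_" ]; then
        . $LSB_SUB_PARM_FILE
        if [ "_$LSB_SUB_PROJECT_NAME" = "_" ]; then
            echo "You must specify a project!" >&2
            exit $LSB_SUB_ABORT_VALUE
        fi
    fi
    exit 0
Define the method in lsf.conf:
    LSB_ESUB_METHOD="project"

Configuring resource management with elim
Example: reporting free space in the home directory
Create the elim file elim.home under $LSF_SERVERDIR and make it executable (chmod):
    #!/bin/sh
    while true; do
        home=`df -k /home | tail -1 | awk '{printf "%4.1f", $4/(1024*1024)}'`
        echo 1 home $home
        sleep 30
    done

Example: reporting the number of root processes
Create elim.root under $LSF_SERVERDIR and make it executable (chmod):
    #!/bin/sh
    while true; do
        root=`ps -ef | grep -v grep | grep -c root`
        echo 1 rootprocs $root
        sleep 30
    done

Example: reporting application license counts
    #!/bin/sh
    lic_X=0; num=0
    while true; do
        # only want the master to gather lic_X
        if [ "$LSF_MASTER" = "Y" ]; then
            lic_X=`lmstat -a -c lic_X.dat | grep …` >&2
        fi
        # only want training8, training1 to gather simpton licenses
        if [ "`hostname`" = "training8" -o "`hostname`" = "training1" ]; then
            num=`lmstat -a -c simpton_lic.dat | grep ...` >&2
        fi
        # all hosts, including the master, will gather the following
        root=`ps -efw | grep -v grep | grep -c root` >&2
        tmp=`df -k /var/tmp | grep var | awk '{print $4/1024}'` >&2
        if [ "$LSF_MASTER" = "Y" ]; then
            echo 4 lic_X $lic_X simpton $num rtprc $root tmp $tmp
        else
            echo 3 simpton $num rtprc $root tmp $tmp
        fi
        sleep 60
    done

Testing the elim scripts
Run ./elim.root directly and check that the elim output is correct.

Adding resource definitions and the resource map
Add the rootprocs definition to lsf.shared. Then add the mapping between the resource and the hosts to the ResourceMap section of lsf.cluster.<cluster_name>.
Enable the configuration: lsadmin reconfig; badmin reconfig
Check the resource values: lsload -l

3 LSF Command-Line Application Integration Examples
This section shows several different ways to integrate applications. You can use either a spooling file or a bsub command line, and one converts freely into the other.

CFD++ integration (spooling file)
CFD++ installation and licensing
Installation path: ln36204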
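License server: ln36204
Start the license server:
    [hpcadmin@mn3650 jessi]$ ssh ln36204
Confirm that the license server is running normally.

Integrating license management with an elim
How to add the elim: only one copy of this elim needs to run in the whole cluster, so place the elim script on the head node only.
On the head node:
    [root@mn3650 jessi]# cd $LSF_SERVERDIR
    [root@mn3650 etc]# pwd
Add the following file, elim.lic:
    #!/bin/sh
    while true; do
        totallicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f7`
        usedlicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f13`
        cfd_lic=$((${totallicences} - ${usedlicences}))
        echo "1 cfd_lic ${cfd_lic}"
        sleep 60
    done
Then modify the following configuration files. Add this line to the Resource section of lsf.shared:
    cfd_lic    Numeric    30    Y    (CFD++ License)
Add this line to the ResourceMap section of lsf.cluster.<cluster_name>:
    RESOURCENAME    LOCATION
    cfd_lic         …
Then reconfigure:
    [root@mn3650 etc]# lsadmin reconfig; badmin reconfig

Adding the CFD++ job starter
This integration method uses a job starter. Other integration methods do not need this step.
The executable is …/cfdpp/hpmpi/bin/mpirun. The job starter logic:
    case "$PRESSION" in
        SINGLE_PRESSION) ;;
        DOUBLE_PRESSION) ;;
    esac
    CMD="$* -hostfile $LSB_DJOB_HOSTFILE $CFD_CMD"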
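The elim.lic script above picks the license counts out of lmstat with `cut -d' '` field positions. That approach breaks if the spacing of the output changes. Below is a Python sketch of the same logic that matches the whole usage line instead. It assumes FlexNet's usual "Users of FEATURE:  (Total of N licenses issued;  Total of M licenses in use)" format. It also assumes the lmutil and license file paths used above, and the cfd_lic resource name from lsf.shared.

```python
# Sketch of the elim.lic logic in Python: free CFD++ licenses = issued - in use,
# reported in the elim format "<number of resources> <name> <value> ...".
import re
import subprocess
import sys
import time

LMUTIL = "/gpfs/software/cfdpp/mbin/lmutil"
LICENSE_FILE = "/gpfs/software/cfdpp/mbin/Metacomp.lic"
FEATURE = "CFD++_SOLV_Ser"

USAGE_LINE = re.compile(
    r"Users of (?P<feature>\S+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<used>\d+) licenses? in use\)"
)


def free_licenses(lmstat_output: str, feature: str) -> int:
    """Issued minus in-use licenses for `feature`; 0 if it is not listed."""
    for match in USAGE_LINE.finditer(lmstat_output):
        if match.group("feature") == feature:
            return int(match.group("issued")) - int(match.group("used"))
    return 0


def elim_report(values: dict[str, int]) -> str:
    """One elim report line: the resource count, then name/value pairs."""
    pairs = " ".join(f"{name} {value}" for name, value in values.items())
    return f"{len(values)} {pairs}"


def query_lmstat() -> str:
    result = subprocess.run([LMUTIL, "lmstat", "-a", "-c", LICENSE_FILE],
                            capture_output=True, text=True, check=False)
    return result.stdout


def run_forever(interval: int = 60) -> None:
    """Main loop for use as an elim; never returns."""
    while True:
        print(elim_report({"cfd_lic": free_licenses(query_lmstat(), FEATURE)}))
        sys.stdout.flush()  # LIM reads the reports from the elim's stdout
        time.sleep(interval)
```

An elim built from this would end with an `if __name__ == "__main__": run_forever()` guard, have the executable bit set, and live under $LSF_SERVERDIR like elim.lic.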
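Adding the CFD application profile
Add the following configuration (in lsb.applications):
    Begin Application
    NAME = cfd
    JOB_STARTER = /opt/lsf/jobstarter/cfd_starter
    RES_REQ = "rusage[cfd_lic=1]"
    End Application
Run badmin reconfig to make the file take effect. Then use bapp -l cfd to check that it succeeded:
    [root@mn3650 bin]# bapp -l cfd
    APPLICATION NAME: cfd
     -- No description provided.
    STATISTICS:
       NJOBS   PEND   RUN   SSUSP   USUSP   RSV
          12     12     0       0       0     0
    PARAMETERS:
    JOB_STARTER: /opt/lsf/jobstarter/cfd_starter
    RES_REQ: "rusage[cfd_lic=1]"

CFD++ command-line submission script example
Submit the job with bsub < cfd.sh. The script contains:
    #BSUB -n 12
    #BSUB -app cfd

GAUSSIAN integration method (spooling file)
Because of the Gaussian license, a single job can run on one host only.
Job submission script:
    #!/bin/sh
    #BSUB -q qchem
    #BSUB -n 4
    #BSUB -R "span[hosts=1]"
    #BSUB -cwd .
The … script it runs:
    #!/bin/sh
    JOBNAME=`basename "$JOB" .com`
    export g03root=/gpfs/software/Gaussian
    export GAUSS_SCRDIR=/tmp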
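A spooling file and a bsub command line are interchangeable. That means the #BSUB header of a script such as the Gaussian one above can be turned back into bsub arguments. This is a hypothetical helper. It assumes that directives are lines starting with "#BSUB", that other "#" lines are comments, and that the directive block ends at the first ordinary command line.

```python
# Sketch: recover bsub arguments from the #BSUB header of a spooling file.
import shlex


def bsub_args_from_script(script: str) -> list[str]:
    args: list[str] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#!"):
            continue
        if line.startswith("#BSUB"):
            args.extend(shlex.split(line[len("#BSUB"):]))  # keeps quoted -R strings whole
        elif line.startswith("#"):
            continue  # plain comment; "##BSUB" is a disabled directive
        else:
            break  # first command line: no more directives
    return args


def as_command_line(script: str, script_name: str) -> str:
    """Equivalent one-line submission, quoted for a POSIX shell."""
    return shlex.join(["bsub", *bsub_args_from_script(script), script_name])
```

Take the Gaussian header above and a script saved under the hypothetical name g03.sh. as_command_line returns `bsub -q qchem -n 4 -R 'span[hosts=1]' -cwd . g03.sh`.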
Abaqus script integration (bsub command)
1) The job script:
    export ABAQUS_CMD="/gpfs/software/Abaqus/Commands/abaqus"
    export LM_LICENSE_FILE="/gpfs/software/Abaqus/License/abq612.lic"
    # number of CPUs; must match the number given with -n on the bsub command line
    export NCPU=16
    # input file / job name
    export JOB_NAME=abaqus_job3
    ${ABAQUS_CMD} job=$JOB_NAME cpus=$NCPU input="$INP_INPUT_FILE"
2) Submit through LSF: run the bsub command in the directory that holds the input data.

Amber jobs (blaunch integration, with accounting)
For Intel MPI, write an mpdboot.lsf script, make it executable, and place it under $LSF_SERVERDIR.
Write the job submission script:
    [ymei@mnistest]$ cat new.sh
    #!/bin/sh
    #BSUB -q small
    #BSUB -n 128
    #BSUB -J IMPI
    #BSUB -x
    #export PATH=/gpfs01/software/intel/impi/24/intel64/bin:$PATH
    export I_MPI_DEVICE=ssm
    #export I_MPI_FABRICS=shm:ofa
    #export I_MPI_FAST_STARTUP=1
    #export I_MPI_DEVICE=rdssm
    mpdallexit
Submit the job with bsub < new.sh.

Platform MPI jobs
Installing Platform MPI
Confirm that users can reach the nodes through ssh without a password.
Install Platform MPI into a shared directory:
    sh platform_mpi00320r.x64.sh -installdir=/opt/pmpi -norpm
If the C compiler is missing, run:
    yum install gcc

Verifying the installation outside LSF
Set the environment variables:
    export MPI_REMSH="ssh -x"
    export MPI_ROOT=/opt/pmpi/opt/ibm/platform_mpi/
Compile the hello world example program and run it:
    [root@server3 help]# /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -f ../help/hosts
    warning: MPI_ROOT /opt/pmpi/opt/ibm/platform_mpi/ != mpirun path /opt/pmpi/opt/ibm/platform_mpi
    Hello world! I'm 1 of 4 on server3
    Hello world! I'm 0 of 4 on server3
    Hello world! I'm 3 of 4 on computer007
    Hello world! I'm 2 of 4 on computer007
    [root@server3 help]# cat ../help/hosts
    -h server3 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    -h computer007 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

Submitting through LSF
    export MPI_REMSH=blaunch
    $ mpirun -np 4 -IBV ~/helloworld
    $ mpirun -np 32 -IBV ~/helloworld
    $ mpirun -np 4 -TCP ~/helloworld
or
    [root@server3 conf]# bsub -o %J.out -e %J.err -n 4 /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    Job <210> is submitted to default queue <normal>.
    [root@server3 conf]# bjobs
    JOBID  USER  STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    210    root  PEND  normal  server3               *elloworld  May  9 10:55
    [root@server3 conf]# cat 210.out
    Sender: LSF System <jessi@computer007>
    Subject: Job 210: </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> in cluster <jessi_cluster> Done

    Job </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> was submitted from host <server3> by user <root> in cluster <jessi_cluster>.
    Job was executed on host(s) <4*computer007>, in queue <normal>, as user <root> in cluster <jessi_cluster>.
    </root> was used as the home directory.
    </opt/lsf/conf> was used as the working directory.
    Started at Thu May  9 18:49:06
    Results reported at Thu May  9 18:49:07

    Your job looked like:

    # LSBATCH: User input
    /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

    Successfully completed.

    Resource usage summary:

        CPU time :               0.23 sec.
        Max Memory :             2 MB
        Average Memory :         2.00 MB
        Total Requested Memory :
        Delta Memory :
        (Delta: the difference between total requested memory and actual max usage.)
        Max Swap :               36 MB
        Max Processes :          1
        Max Threads :            1

    The output (if any) follows:

    Hello world! I'm 2 of 4 on computer007
    Hello world! I'm 0 of 4 on computer007
    Hello world! I'm 1 of 4 on computer007
    Hello world! I'm 3 of 4 on computer007

    PS:

    Read file <210.err> for stderr output of this job.

Or with more parameters:
    $ /opt/platform_mpi/bin/mpirun -np 120 -ibv -hostlist "cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2" /data/hello_world
To run MPI jobs without submitting them through LSF, set the MPI_USELSF environment variable to n.

Open MPI jobs
Download the Open MPI package and build it with LSF support:
    ./configure LIBS=-ldl --with-lsf=yes --prefix=/usr/local/ompi/
Open MPI 1.3.2 and later are tightly integrated with LSF blaunch, so Open MPI jobs can be submitted directly through LSF.
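Platform MPI's -lsb_mcpu_hosts option reads the LSF allocation for you. If you prefer an explicit appfile like ../help/hosts, you can generate it from LSB_MCPU_HOSTS. LSF sets that variable to "host1 n1 host2 n2 ..." inside a job. The sketch below assumes that format and reuses the program path from the hello world test above. The function names are hypothetical.

```python
# Sketch: build Platform MPI appfile lines ("-h host -np N program") from
# LSB_MCPU_HOSTS, which LSF sets to "host1 n1 host2 n2 ..." inside a job.
import os

# Example program path taken from the hello world test above.
PROGRAM = "/opt/pmpi/opt/ibm/platform_mpi/help/helloworld"


def parse_mcpu_hosts(value: str) -> list[tuple[str, int]]:
    """Split 'host1 n1 host2 n2 ...' into (host, slots) pairs."""
    fields = value.split()
    if len(fields) % 2:
        raise ValueError(f"odd number of fields in LSB_MCPU_HOSTS: {value!r}")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def appfile_lines(allocation: list[tuple[str, int]], program: str) -> list[str]:
    """One appfile line per host, mirroring the ../help/hosts example."""
    return [f"-h {host} -np {slots} {program}" for host, slots in allocation]


if __name__ == "__main__":
    # Outside an LSF job the variable is unset and nothing is printed.
    allocation = parse_mcpu_hosts(os.environ.get("LSB_MCPU_HOSTS", ""))
    for line in appfile_lines(allocation, PROGRAM):
        print(line)
```

For job 210 above, which ran on <4*computer007>, the variable would read "computer007 4" and the sketch would emit a single appfile line. Pass the written file to mpirun -f as in the test outside LSF, keeping MPI_REMSH=blaunch so that remote processes start under LSF control.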
Intel MPI jobs
Express edition: method without accounting
To account for jobs, use the blaunch integration method instead.
    export PATH=/gpfs/software/intel/composerxe/bin/:/gpfs/software/intel/mpi_41_0_024/include:/gpfs/software/intel/mpi_41_0_024/bin64:/gpfs/software/intel/composerxe/mkl:$PATH
    source /gpfs/software/intel/composerxe/bin/compilervars.sh intel64
    source /gpfs/software/intel/composerxe/mkl/bin/mklvars.sh intel64

MPI test program:
    #include "mpi.h"
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char **argv)
    {
        int myid, numprocs;
        int namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Get_processor_name(processor_name, &namelen);
        fprintf(stderr, "Hello World! Process %d of %d on %s\n",
                myid, numprocs, processor_name);
        MPI_Finalize();
        return 0;
    }

The program can be run from the command line over TCP, over the IB network, or in debug mode.

LSF submission script bsub_intelmpi_ib.sh:
    #!/bin/sh
    #BSUB -cwd .
    #BSUB -R "span[ptile=4]"
Submit the job:
    bsub < bsub_intelmpi_ib.sh

Express edition: blaunch method with accounting
The mpdboot.lsf script (Python 2):
    #!/usr/bin/env python
    """
    mpdboot for LSF
        [-f | --hostfile hostfile]
        [-i | --ifhn=alternate_interface_hostname_of_ip_address -f | --hostfile hostfile]
        [-h]
    """
    import re
    import string
    import time
    import sys
    import getopt
    from time import ctime
    from os import environ, path
    from sys import argv, exit, stdout
    from popen2 import Popen4
    from socket import gethostname, gethostbyname

    def mpdboot():
        # change me
        MPI_ROOTDIR = "/opt/intel/impi/25"
        mpdCmd = "%s/bin/mpd" % MPI_ROOTDIR
        mpdtraceCmd = "%s/bin/mpdtrace" % MPI_ROOTDIR
        mpdtraceCmd2 = "%s/bin/mpdtrace -l" % MPI_ROOTDIR
        nHosts = 1
        host = ""
        ip = ""
        localHost = ""
        localIp = ""
        found = False
        MAX_WAIT = 5
        t1 = 0
        hostList = ""
        hostTab = {}
        cols = []
        hostArr = []

        hostfile = environ.get('LSB_DJOB_HOSTFILE')
        binDir = environ.get('LSF_BINDIR')
        if environ.get('LSB_MCPU_HOSTS') == None \
           or hostfile == None \
           or binDir == None:
            print "not running in LSF"
            exit(1)

        rshCmd = binDir + "/blaunch"
        # matches "hostname_port (a.b.c.d" in mpdtrace -l output
        p = re.compile(r"\S+_\d+\s+\(\d+\.\d+\.\d+\.\d+")

        try:
            opts, args = getopt.getopt(sys.argv[1:], "hf:i:",
                                       ["help", "hostfile=", "ifhn="])
        except getopt.GetoptError, err:
            print str(err)
            usage()
            sys.exit(1)

        fileName = None
        ifhn = None
        for o, a in opts:
            if o in ("-h", "--help"):
                usage()
                sys.exit()
            elif o in ("-f", "--hostfile"):
                fileName = a
            elif o in ("-i", "--ifhn"):
                ifhn = a
            else:
                print "option %s unrecognized" % o
                usage()
                sys.exit(1)

        if fileName == None:
            if ifhn != None:
                print "--ifhn requires a hostfile containing 'hostname --ifhn=alternate_interface_hostname_of_ip_address'\n"
                sys.exit(1)
            # use LSB_DJOB_HOSTFILE
            fileName = hostfile

        localHost = gethostname()
        localIp = gethostbyname(localHost)

        pifhn = re.compile(r"\w+\s+--ifhn=\d+\.\d+\.\d+\.\d+")
        try:
            # check the hostfile
            machinefile = open(fileName, "r")
            for line in machinefile:
                if not line or line[0] == '#':
                    continue
                line = re.split('#', line)[0]
                line = line.strip()
                if not line:
                    continue
                if not pifhn.match(line):
                    # should not have the --ifhn option
                    if ifhn != None:
                        print "hostfile %s not valid for --ifhn" % (fileName)
                        print "hostfile should contain 'hostname --ifhn=ip_address'"
                        sys.exit(1)
                    host = re.split(r'\s+', line)[0]
                    if cmp(localHost, host) == 0 \
                       or cmp(localIp, gethostbyname(host)) == 0:
                        continue
                    hostTab[host] = None
                else:
                    # multiple blaunches
                    cols = re.split(r'\s+--ifhn=', line)
                    host = cols[0]
                    ip = cols[1]
                    if cmp(localHost, host) == 0 \
                       or cmp(localIp, gethostbyname(host)) == 0:
                        continue
                    hostTab[host] = ip
            machinefile.close()
        except IOError, err:
            print str(err)
            exit(1)

        # launch an mpd on localhost
        if ifhn != None:
            cmd = "%s -n %s %s --ifhn=%s" % (rshCmd, localHost, mpdCmd, ifhn)
        else:
            cmd = "%s -n %s %s" % (rshCmd, localHost, mpdCmd)
        print "Starting an mpd on localhost:", cmd
        Popen4(cmd, 0)

        # wait 5 seconds at most
        while t1 < MAX_WAIT:
            time.sleep(1)
            trace = Popen4(mpdtraceCmd2, 0)
            # hostname_portnumber (IP address)
            line = trace.fromchild.readline()
            if not p.match(line):
                t1 += 1
                continue
            strings = re.split(r'\s+', line)
            (basehost, baseport) = re.split('_', strings[0])
            found = True
            host = ""
            break

        if not found:
            print "Cannot start mpd on localhost"
            sys.exit(1)
        else:
            print "Done starting an mpd on localhost"

        # launch mpd on the rest of the hosts
        for host, ip in hostTab.items():
            nHosts += 1
        if nHosts < 2:
            sys.exit(0)

        print "Constructing an mpd ring..."
        if ifhn != None:
            for host, ip in hostTab.items():
                cmd = "%s %s %s -h %s -p %s --ifhn=%s" % (rshCmd, host, mpdCmd, basehost, baseport, ip)
                Popen4(cmd, 0)
        else:
            for host, ip in hostTab.items():
                hostArr.append(host)
            hostList = string.join(hostArr)
            print "hostList: %s" % (hostList)
            cmd = "%s -z '%s' %s -h %s -p %s" % (rshCmd, hostList, mpdCmd, basehost, baseport)
            print "cmd:", cmd
            Popen4(cmd, 0)

        # wait until all mpds are started
        MAX_TIMEOUT = 300 + 0.1 * (nHosts)
        t1 = 0
        started = False
        while t1 < MAX_TIMEOUT:
            time.sleep(1)
            trace = Popen4(mpdtraceCmd, 0)
            if len(trace.fromchild.readlines()) < nHosts:
                t1 += 1
                continue
            started = True
            break

        if not started:
            print "Failed to construct an mpd ring"
            exit(1)

        print "Done constructing an mpd ring at", ctime()

    def usage():
        print __doc__

    if __name__ == '__main__':
        mpdboot()

Job submission script (spooling file) cpi.sh:
    #LSBATCH: User input
    #BSUB -n 2
    #BSUB -P I210105G
    ##BSUB -W 00:33
    #BSUB -J IMPI
    #BSUB -R 'span[ptile=1]'
    #BSUB -x
    #BSUB -m "iquadcore01! rhel55"
    #BSUB -app djob
    #export LSB_DEBUG_CMD="LC_TRACE LC_EXEC LC_HPC"
    #export LSB_CMD_LOG_MASK=LOG_DEBUG3
    export PATH=/opt/intel/impi/25/bin:$PATH
    #. /usr/share/modules/init/bash
    #module purge
    set -x
    mpiexec -np $LSB_DJOB_NUMPROC /tmp/cpi 10000
    mpdallexit
Submit the job with bsub < cpi.sh.

3.7.3 Standard edition: PAM integration method
    [iquadcore01]186% env | grep MPI

3.7.3.1 Configure the intelmpi resource as described in the HPC documentation
Add the intelmpi resource to lsf.shared. Then add it to lsf.cluster for each host.
External resources in lsf.shared:
    Begin Resource
    RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
    ...
    intelmpi       Boolean   ()         ()           (Intel MPI)
    ...
    End Resource
Add the intelmpi resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name.
Verify with the following command:
    [iquadcore01]189% lshosts
    HOST_NAME    type     model       cpuf    ncpus  maxmem  maxswp  server  RESOURCES
    saspm01      X86_64   PC6000      116.1   2      3008M   3074M   Yes     (intelmpi mpich2 mg openmpi)
    iquadcore0   X86_64   Intel_EM6   60.0    8      7974M   4094M   Yes     (intelmpi mg)

(2) Modify the installation path in intelmpi_wrapper
    [saspm01]189% sudo vi `which intelmpi_wrapper`
    # Define top directory for Intel MPI
    MPI_TOPDIR="/scratch/intel/impi/06"
    # Define MPI commands used in the script
    MPIEXEC_CMD="$MPI_TOPDIR/bin64/mpiexec"
    MPDEXIT_CMD="$MPI_TOPDIR/bin64/mpdallexit"
    MPDBOOT_CMD="$MPI_TOPDIR/bin64/mpdboot"
    # Check Intel MPI version. Must be 1.0.2 or higher.
    # check MPI version

Verify that MPI works outside LSF
    [iquadcore01]195%
    iquadcore01
    iquadcore01
    iquadcore01
    saspm01
    saspm01
    saspm01
    [iquadcore01]196% mpiexec -machinefile p.hosts -n 4 ./test
    Hello world: rank 0 of 4 running on iquadcore01
    Hello world: rank 1 of 4 running on iquadcore01
    Hello world: rank 2 of 4 running on iquadcore01
    Hello world: rank 3 of 4 running on saspm01
    [iquadcore01]197% mpdtrace -l
    iquadcore01_42093 (...)
    saspm01_36768 (...)

3.7.3.3 Submitting LSF jobs with the PAM method
    [iquadcore01]200% bsub -I -a intelmpi -n 4 -m "iquadcore01 saspm01!" mpirun.lsf ./test
    Job <3814> is submitted to queue <hpc_linux>.
    <<Waiting for dispatch ...>>
    <<Starting on saspm01>>
    Hello world: rank 0 of 4 running on saspm01
    Hello world: rank 1 of 4 running on saspm01
    Hello world: rank 2 of 4 running on iquadcore01
    Hello world: rank 3 of 4 running on iquadcore01

    TID    HOST_NAME  COMMAND_LINE  STATUS  TERMINATION_TIME
    00000  iquadcore  ./test        Done    03/16/20:00:49
    00001  iquadcore  ./test        Done    03/16/20:00:49
    00002  saspm01    ./test        Done    03/16/20:00:39
    00003  saspm01    ./test        Done    03/16/20:00:39
    [iquadcore01]201%
Note that no "-np 4" is needed after "bsub -n 4 mpirun.lsf".

3.7.3.4 Debugging
Append the debug options pass -Dpass3 -Tsdebug to the submission command:
    bsub -I -a intelmpi -n 4 mpirun.lsf ./test pass -Dpass3 -Tsdebug

4 Installing PAC
1) Check the installation files, for example pac8.3_standard_linux-x64.tar.Z. The license ships with the installation package and is located in the NFS shared directory /apps/platform/8.3/pac.
2) Extract pac8.3_standard_linux-x64.tar.Z and edit pacinstall.sh:
    export PAC_TOP="/apps/platform/8.3/pac"
    export MYSQL_JDBC_DRIVER_JAR="/usr/share/java/mysql-connector-java-5.1.12.jar"
3) Install MySQL and make sure the MySQL service starts normally (yum install mysql* -y).
4) Install both the client and the server packages. The service is controlled with service mysqld status/start/stop (you do not need to run this).
5) Edit /opt/lsf/conf/lsbatch/cluster1/configdir/lsb.params, add ENABLE_EVENT_STREAM=y, and run badmin reconfig.
6) Run pacinstall.sh to install. Before running it, make sure the LSF environment variables have been sourced.
7) Source the environment variables. Add that command to the end of /etc/profile so the environment is sourced automatically at login.
8) Start the portal with the following commands:
    pmcadmin start
    perfadmin start all
9) Check whether they started normally:
    # pmcadmin list
    # perfadmin list
10) Access the portal at http://hostipaddress:8080
11) Log in as an administrator or as a user (NIS user).
12) To configure VNC, refer to the PAC administrator documentation.

5 Using PAC for Application Integration
How PAC integration works: you configure and design an XML submission page. The corresponding script file then handles the environment variables passed from the XML page. At the end, the script builds the job submission logic (the last part of /opt/pac/gui/conf/application/published/app.cmd):
    JOB_RESULT=`/bin/sh -c "bsub -q $SUB_QUEUES $JOB_NAME_OPT $CWD_OPT ${PROJECT_NAME_OPT} ${CWD_DIR} ${QUEUE_OPT} $NCPU_OPT $LSF_RESREQ $RUNHOST_OPT $APP_PARAMS $EXTRA_PARAMS $OUTPUT_OPT $NASTRAN_CMD $INPUT_OPT $MEMORYARCH_OPT $NASTRAN_PARAMS ${NASTRAN_OPTIONS} ${MPI_OPTIONS} 2>&1"`

5.1 Gaussian interface integration process
Log in as lsfadmin at http://hostipaddress:8080/platform/
[Figure: IBM Platform HPC 3.2 console (Dashboard, Devices, Licenses, Host Provisioning, Application Templates, Resource Reports, Alerts). The Application Templates page lists templates such as ASP (custom), NWCHEM, MATLAB, LS-DYNA, HMMER, FLUENT, ECLIPSE, ClustalW, CMG STARS, CMG IMEX and CMG GEM (built-in).]
Select an existing template and click Save As to save it as a GAUSSIAN template. Then edit the GAUSSIAN template on the Modify page. Select the application parameters section and click Add:
[Figure: GAUSSIAN template editor (Template Name GAUSSIAN, Type Custom Application) with Add, Delete and Edit buttons; field types Input Text, Date and Time, Drop-down List; job status and elapsed-time notification option; input file field INPUT_FILE with Add Local File and Add Server File; Submit Test Job, Save As and Close buttons.]
Select Drop-down List and click Next.
[Figure: field type selection: Simple, Browse, Drop-down list (lets users choose from a predefined list).]
Set the environment variable and the drop-down list values. They represent two Gaussian versions. Click OK on the drop-down page. After saving, the form looks like this:
[Figure: GAUSSIAN submission form with the new version drop-down field.]
The editor can delete or hide unused options and set default values for drop-down variables and other information, as shown below:
[Figure: field settings: Default Value G03; Hide Field; Enable Dependencies; Help Text "Please input the GS-Version". Template GAUSSIAN (Custom Application) with Submission Form and Submission Script tabs; version, job name (JOB_NAME) and job notification fields.]
For the back-end script, click Submission Script in the interface, or edit the file directly, as follows:
    [ms@mn3650 ~]$ cat /usr/share/pmc/gui/conf/application/published/GAUSSIAN/GAUSSIAN.cmd
    #!/bin/sh
    # number of tasks per host
    SPAN="span[hosts…
    #LSF_RESREQ="select[type==any]"
    LANG=C
    # Source COMMON functions
    . ${GUI_CONFDIR}/…
    # check BSUB parameters and create final bsub options
    if [ "x$JOB_NAME…
When you are done, submit a test job to check the template. Click Add Server File or Add Local File to add a .com file.
Click Submit Test Job to run the job, then check the status of the test job. Because of the execute permissions set on Gaussian, users outside the gaussian group cannot run it. Use a user in the gaussian group to run the job.
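The JOB_RESULT line relies on empty option variables disappearing when the shell splits the string. Below is a Python sketch of the same assembly built as an argument list. It assumes the option values arrive as a dict of shell-quoted strings; in PAC they are shell variables set earlier in the .cmd script. It keeps only a subset of the JOB_RESULT variables, and build_bsub is a hypothetical name.

```python
# Sketch of the JOB_RESULT assembly in app.cmd, done with an argument list.
import shlex

# A subset of the JOB_RESULT variables, in the same order.
OPTION_ORDER = [
    "JOB_NAME_OPT", "CWD_OPT", "PROJECT_NAME_OPT", "QUEUE_OPT", "NCPU_OPT",
    "LSF_RESREQ", "RUNHOST_OPT", "APP_PARAMS", "EXTRA_PARAMS", "OUTPUT_OPT",
]


def build_bsub(options: dict[str, str], command: list[str]) -> list[str]:
    """Return the bsub argv; unset or empty options are skipped."""
    argv = ["bsub"]
    for name in OPTION_ORDER:
        value = options.get(name, "").strip()
        if value:
            argv.extend(shlex.split(value))  # "-n 4" -> ["-n", "4"]
    return argv + command
```

Passing the resulting list to subprocess.run (with capture_output=True to collect what JOB_RESULT holds) avoids a second round of /bin/sh -c quoting. That second round is where values containing spaces or brackets, such as -R "span[hosts=1]", usually break.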