Parallel Programming
Instructor: Zhang Weizhe (张伟哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Programming with OpenMP
Outline

What Is OpenMP*?
- OpenMP is a set of extensions to Fortran/C/C++.
- OpenMP contains compiler directives, library routines and environment variables.
- Available on most single address space machines:
  - shared memory systems, including cc-NUMA
  - Chip MultiThreading: Chip MultiProcessing (Sun UltraSPARC IV), Simultaneous Multithreading (Intel Xeon)
  - not on distributed memory systems, classic MPPs, or PC clusters (yet!)

What Is OpenMP*?
- Compiler directives for multithreaded programming
- Easy to create threaded Fortran and C/C++ codes
- Supports the data parallelism model
- Incremental parallelism
- Combines serial and parallel code in a single source

What Is OpenMP*?
A sample of the directives, library calls and environment settings:
    omp_set_lock(lck)
    #pragma omp parallel for private(A, B)
    #pragma omp critical
    C$OMP parallel do shared(a, b, c)
    C$OMP PARALLEL REDUCTION (+: A, B)
    call OMP_INIT_LOCK (ilok)
    call omp_test_lock(jlok)
    setenv OMP_SCHEDULE "dynamic"
    CALL OMP_SET_NUM_THREADS(10)
    C$OMP DO lastprivate(XX)
    C$OMP ORDERED
    C$OMP SINGLE PRIVATE(X)
    C$OMP SECTIONS
    C$OMP MASTER
    C$OMP ATOMIC
    C$OMP FLUSH
    C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C)
    C$OMP THREADPRIVATE(/ABC/)
    C$OMP PARALLEL COPYIN(/blk/)
    Nthrds = OMP_GET_NUM_PROCS()
    !$OMP BARRIER
Current spec is OpenMP 2.5: 250 pages (combined C/C++ and Fortran).

OpenMP Syntax
Most of the constructs in OpenMP are compiler directives or pragmas.
- For C and C++, the pragmas take the form:
    #pragma omp construct [clause [clause]…]
- For Fortran, the directives take one of the forms:
    C$OMP construct [clause [clause]…]
    !$OMP construct [clause [clause]…]
    *$OMP construct [clause [clause]…]
- Since the constructs are directives, an OpenMP program can be compiled by compilers that don't support OpenMP (calls to runtime library routines such as omp_get_thread_num() still need a guard, e.g. #ifdef _OPENMP).

OpenMP Programming Model
Fork-Join Parallelism:
- The master thread spawns a team of threads as needed.
- Parallelism is added incrementally: i.e. the sequential program evolves into a parallel program.

OpenMP: How Is OpenMP Typically Used?
OpenMP is usually used to parallelize loops:
- Find your most time consuming loops.
- Split them up between threads.

Sequential program:
    int main(void)
    {
        double Res[1000];
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
    }

Parallel program (split up this loop between multiple threads):
    int main(void)
    {
        double Res[1000];
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
    }

OpenMP vs. POSIX Threads
- POSIX threads is the other widely used shared-memory programming API.
- Fairly widely available, usually quite simple to implement on top of OS kernel threads.
- Lower level of abstraction than OpenMP:
  - library routines only, no directives
  - more flexible, but harder to implement and maintain
  - OpenMP can be implemented on top of POSIX threads
- Not much difference in availability:
  - not that many OpenMP C++ implementations
  - no standard Fortran interface for POSIX threads

OpenMP Constructs
OpenMP's constructs fall into 5 categories:
- Parallel Regions
- Worksharing
- Data Environment
- Synchronization
- Runtime functions/environment variables
OpenMP is basically the same between Fortran and C/C++.

OpenMP: Parallel Regions
You create threads in OpenMP with the "omp parallel" pragma. For example, to create a 4-thread parallel region:
    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
- Each thread redundantly executes the code within the structured block.
- Each thread calls pooh(ID, A) for ID = 0 to 3.

How Many Threads?
- Set the environment variable for the number of threads:
    set OMP_NUM_THREADS=4
- There is no standard default for this variable.
  - Many systems: # of threads = # of processors
  - Intel® compilers use this default

OpenMP: Work-Sharing Constructs
- Splits loop iterations among threads
- Must be in the parallel region
- Must precede the loop
    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }
By default, there is a barrier at the end of the "omp for". Use the "nowait" clause to turn off the barrier.

Work-sharing Construct
- Threads are assigned an independent set of iterations.
- Threads must wait at the end of the work-sharing construct.
    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];
Figure: iterations i = 1 through i = 12 divided among the threads of the "#pragma omp parallel" / "#pragma omp for" region, followed by an implicit barrier.

Work Sharing Constructs
A motivating example

Sequential code:
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region:
    #pragma omp parallel
    {
        int id, i, Nthrds, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        istart = id * N / Nthrds;
        iend = (id + 1) * N / Nthrds;
        for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
    }

OpenMP parallel region and a work-sharing for construct:
    #pragma omp parallel
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

Assigning Iterations to Threads
- The schedule clause of the for directive deals with the assignment of iterations to threads.
- The general form of the schedule clause is schedule(scheduling_class[, parameter]).
- OpenMP supports four scheduling classes: static, dynamic, guided, and runtime.

Assigning Iterations to Threads: Example
    /* static scheduling of matrix multiplication loops */
    #pragma omp parallel private(i, j, k) shared(a, b, c, dim) \
                         num_threads(4)
    #pragma omp for schedule(static)
    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            c(i,j) = 0;
            for (k = 0; k < dim; k++) {
                c(i,j) += a(i,k) * b(k,j);
            }
        }
    }

Which Schedule to Use
    Schedule clause   When to use
    STATIC            Predictable and similar work per iteration
    DYNAMIC           Unpredictable, highly variable work per iteration
    GUIDED            Special case of dynamic to reduce scheduling overhead

Parallel Sections
- Independent sections of code can execute concurrently.
    #pragma omp parallel sections
    {
        #pragma omp section
        phase1();
        #pragma omp section
        phase2();
        #pragma omp section
        phase3();
    }
Figure: the three phases executed serially versus in parallel.

Data Environment
- OpenMP uses a shared-memory programming model.
- Most variables are shared by default.
- Global variables are shared among threads.
  - C/C++: file scope variables, static
- But, not everything is shared...
  - Stack variables in functions called from parallel regions are PRIVATE.
  - Automatic variables within a statement block are PRIVATE.
  - Loop index variables are private (with exceptions).
    - C/C++: the first loop index variable in nested loops following a #pragma omp for

Data Scope Attributes
- The default status can be modified with
    default(shared | none)
- Scoping attribute clauses
    shared(varname, …)
    private(varname, …)

The Private Clause
- Reproduces the variable for each thread.
  - Variables are un-initialized; a C++ object is default constructed.
  - Any value external to the parallel region is undefined.
    void work(float *c, int N)
    {
        float x, y;
        int i;
        #pragma omp parallel for private(x, y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }

OpenMP: Reduction
Another clause that affects the way variables are shared:
    reduction(op : list)
- The variables in "list" must be shared in the enclosing parallel region.
- Inside a parallel or a worksharing construct:
  - A local copy of each list variable is made and initialized depending on the "op" (e.g. 0 for "+").
  - Each thread applies "op" pairwise to its local copy.
  - Local copies are reduced into a single global copy at the end of the construct.

OpenMP: A Reduction Example
    #include <omp.h>
    #define NUM_THREADS 2
    int main(void)
    {
        int i;
        double ZZ, func(int), sum = 0.0;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel for reduction(+:sum) private(ZZ)
        for (i = 0; i < 1000; i++) {
            ZZ = func(i);
            sum = sum + ZZ;
        }
    }

Implicit Barriers
- Several OpenMP* constructs have implicit barriers:
  - parallel
  - for
  - single
- Unnecessary barriers hurt performance.
  - Waiting threads accomplish no work!
- Suppress implicit barriers, when safe, with the nowait clause.

OpenMP: Synchronization
OpenMP has the following constructs to support synchronization:
- barrier
- critical section
- atomic
- flush
- ordered
- single
- master

Barrier Construct
- Explicit barrier synchronization
- Each thread waits until all threads arrive
    #pragma omp parallel shared(A, B, C)
    {
        DoSomeWork(A, B);
        printf("Processed A into B\n");
        #pragma omp barrier
        DoSomeWork(B, C);
        printf("Processed B into C\n");
    }

Atomic Construct
- Special case of a critical section
- Applies only to a simple update of a memory location
    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
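To make the atomic construct concrete, here is a minimal sketch (not code from the slides) of a histogram in which several iterations may increment the same counter. The function name histogram and its parameters are assumptions made for illustration only.

```c
/* Hypothetical helper: count how often each bucket index occurs.
 * index[i] is assumed to lie in the range 0 .. nbuckets-1. */
void histogram(const int *index, int n, long *count, int nbuckets)
{
    int i;

    for (i = 0; i < nbuckets; i++)
        count[i] = 0;

    #pragma omp parallel for shared(index, count)
    for (i = 0; i < n; i++) {
        /* Different iterations may hit the same bucket, so the
         * read-modify-write of the counter must be atomic. */
        #pragma omp atomic
        count[index[i]] += 1;
    }
}
```

Without the atomic directive two threads updating the same counter could lose an increment. A critical section would also be correct, but it would serialize every update, including updates to different buckets.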
Critical and Atomic
- Only one thread at a time can enter a critical section.
    C$OMP PARALLEL DO PRIVATE(B)
    C$OMP& SHARED(RES)
          DO 100 I = 1, NITERS
             B = DOIT(I)
    C$OMP CRITICAL
             CALL CONSUME(B, RES)
    C$OMP END CRITICAL
    100   CONTINUE
- Atomic is a special case of a critical section that can be used for certain simple statements:
    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP ATOMIC
          X = X + B
    C$OMP END PARALLEL

Master directive
- The master construct denotes a structured block that is only executed by the master thread. The other threads just skip it (no implied barriers or flushes).
    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp master
        { exchange_boundaries(); }
        #pragma omp barrier
        do_many_other_things();
    }

Single directive
- The single construct denotes a block of code that is executed by only one thread.
- A barrier and a flush are implied at the end of the single block.
    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp single
        { exchange_boundaries(); }
        do_many_other_things();
    }

OpenMP: Library routines
- Lock routines
    omp_init_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock()
- Runtime environment routines:
  - Modify/check the number of threads
      omp_set_num_threads(), omp_get_num_threads(), omp_get_thread_num(), omp_get_max_threads()
  - Turn on/off nesting and dynamic mode
      omp_set_nested(), omp_set_dynamic(), omp_get_nested(), omp_get_dynamic()
  - Are we in a parallel region?
      omp_in_parallel()
  - How many processors in the system?
      omp_get_num_procs()

1. Hello World!
    #include <omp.h>
    #include <stdio.h>
    int main(void)
    {
        int nthreads, tid;
        /* Fork a team of threads giving them their own copies of variables */
        #pragma omp parallel private(nthreads, tid)
        {
            /* Obtain thread number */
            tid = omp_get_thread_num();
            printf("Hello World from thread = %d\n", tid);
            /* Only master thread does this */
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads = %d\n", nthreads);
            }
        } /* All threads join master thread and disband */
        return 0;
    }

Example Code - Pthread Creation and Termination
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 5
    void *PrintHello(void *threadid)
    {
        /* The thread number was passed by value inside the pointer */
        printf("\n%ld: Hello World!\n", (long)threadid);
        pthread_exit(NULL);
    }
    int main(int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);
    }
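The lock routines listed under "Library routines" are not exercised by the examples above. The following is a minimal sketch of their usual call sequence, under stated assumptions: the function locked_sum is hypothetical, omp_destroy_lock() is a standard routine that the slide's list does not mention, and the serial fallbacks in the #else branch are illustrative stand-ins so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: only so the sketch builds without OpenMP). */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical example: sum 1..n with a lock protecting the shared total. */
long locked_sum(int n)
{
    long total = 0;
    int i;
    omp_lock_t lck;

    omp_init_lock(&lck);              /* a lock must be initialized before use */

    #pragma omp parallel for shared(total, lck)
    for (i = 1; i <= n; i++) {
        omp_set_lock(&lck);           /* block until the lock is acquired */
        total += i;                   /* only one thread at a time runs this */
        omp_unset_lock(&lck);         /* release the lock */
    }

    omp_destroy_lock(&lck);           /* release the lock's resources */
    return total;
}
```

For a plain sum like this, a reduction or an atomic update would be the more natural choice; the lock is used here only to show the init, set, unset, destroy sequence.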
2. Parallel Loop Reduction
          PROGRAM REDUCTION
          INTEGER I, N
          REAL A(100), B(100), SUM
    !     Some initializations
          N = 100
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          SUM = 0.0
    !$OMP PARALLEL DO REDUCTION(+:SUM)
          DO I = 1, N
            SUM = SUM + (A(I) * B(I))
          ENDDO
          PRINT *, 'Sum = ', SUM
          END

3. Matrix-vector multiply using a parallel loop and critical directive
    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a, b, c, nthreads, chunk) private(tid, i, j, k)
    {
        tid = omp_get_thread_num();   /* tid is private, so each thread sets its own */
        #pragma omp for schedule(static, chunk)
        for (i = 0; i < NRA; i++) {
            printf("thread=%d did row=%d\n", tid, i);
            for (j = 0; j < NCB; j++)
                for (k = 0; k < NCA; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    }

Parallelize: Win32 API, PI
    #include <windows.h>
    #include <stdio.h>
    #define NUM_THREADS 2
    HANDLE thread_handles[NUM_THREADS];
    CRITICAL_SECTION hUpdateMutex;
    static long num_steps = 100000;
    double step;
    double global_sum = 0.0;

    void Pi(void *arg)
    {
        int i, start;
        double x, sum = 0.0;
        start = *(int *)arg;
        step = 1.0 / (double)num_steps;
        for (i = start; i <= num_steps; i = i + NUM_THREADS) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        EnterCriticalSection(&hUpdateMutex);
        global_sum += sum;
        LeaveCriticalSection(&hUpdateMutex);
    }

    int main(void)
    {
        double pi;
        int i;
        DWORD threadID;
        int threadArg[NUM_THREADS];
        for (i = 0; i < NUM_THREADS; i++)
            threadArg[i] = i + 1;
        InitializeCriticalSection(&hUpdateMutex);
        for (i = 0; i < NUM_THREADS; i++) {
            thread_handles[i] = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)Pi,
                                             &threadArg[i], 0, &threadID);
        }
        WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);
        pi = global_sum * step;
        printf("pi is %f\n", pi);
    }
Doubles code size!

Solution: Keep it simple
- Threads libraries:
  - Pro: Programmer has control over everything
  - Con: Programmer must control everything
- Full control, increased complexity, programmers scared away
- Sometimes a simple evolutionary approach is better

PI Program: an example
    static long num_steps = 100000;
    double step;
    int main(void)
    {
        int i;
        double x, pi, sum = 0.0;
        step = 1.0 / (double)num_steps;
        for (i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
    }

OpenMP PI Program:
Parallel Region example (SPMD Program)
    #include <omp.h>
    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2
    int main(void)
    {
        int i;
        double pi, sum[NUM_THREADS];
        step = 1.0 / (double)num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel
        {
            double x;
            int i, id;   /* the loop index must be private to each thread */
            id = omp_get_thread_num();
            for (i = id, sum[id] = 0.0; i < num_steps; i = i + NUM_THREADS) {
                x = (i + 0.5) * step;
                sum[id] += 4.0 / (1.0 + x * x);
            }
        }
        for (i = 0, pi = 0.0; i < NUM_THREADS; i++) pi+
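The same integral can also be computed with the reduction clause introduced earlier, which removes the per-thread sum array and the manual combining loop. This is a sketch under stated assumptions rather than the slides' own version: the function name compute_pi and its parameter are hypothetical.

```c
/* Midpoint-rule estimate of pi as the integral of 4/(1+x^2) over [0,1].
 * Hypothetical helper; num_steps is the number of rectangles. */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    /* Each thread accumulates into its own copy of sum (initialized to 0);
     * the copies are added together at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;   /* declared in the loop body, so private */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Because the only OpenMP feature used is a directive, a compiler without OpenMP support simply ignores the pragma and runs the loop serially, giving the same estimate up to floating-point rounding.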