Parallel Programming, Chinese Course Slides 06: OpenMP Multithreaded Programming

Uploaded: 2024-12-31. Format: PPT. Pages: 46. Size: 802 KB.


Parallel Programming
Instructor: Zhang Weizhe (张伟哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Slide 2. Programming with OpenMP: Outline

Slide 3. What Is OpenMP*?
- OpenMP is a set of extensions to Fortran/C/C++.
- OpenMP contains compiler directives, library routines and environment variables.
- Available on most single address space machines:
  - shared memory systems, including cc-NUMA
  - Chip MultiThreading: Chip MultiProcessing (Sun UltraSPARC IV), Simultaneous Multithreading (Intel Xeon)
  - not on distributed memory systems, classic MPPs, or PC clusters (yet!)

Slide 4. What Is OpenMP*?
- Compiler directives for multithreaded programming
- Easy to create threaded Fortran and C/C++ codes
- Supports data parallelism model
- Incremental parallelism
- Combines serial and parallel code in single source

Slide 5. What Is OpenMP*?
    omp_set_lock(lck)
    #pragma omp parallel for private(A, B)
    #pragma omp critical
    C$OMP parallel do shared(a, b, c)
    C$OMP PARALLEL REDUCTION (+: A, B)
    call OMP_INIT_LOCK(ilok)
    call omp_test_lock(jlok)
    setenv OMP_SCHEDULE "dynamic"
    CALL OMP_SET_NUM_THREADS(10)
    C$OMP DO lastprivate(XX)
    C$OMP ORDERED
    C$OMP SINGLE PRIVATE(X)
    C$OMP SECTIONS
    C$OMP MASTER
    C$OMP ATOMIC
    C$OMP FLUSH
    C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C)
    C$OMP THREADPRIVATE(/ABC/)
    C$OMP PARALLEL COPYIN(/blk/)
    Nthrds = OMP_GET_NUM_PROCS()
    !$OMP BARRIER
Current spec is OpenMP 2.5: 250 pages (combined C/C++ and Fortran).

Slide 6. OpenMP Syntax
Most of the constructs in OpenMP are compiler directives or pragmas.

For C and C++, the pragmas take the form:
    #pragma omp construct [clause [clause] …]
For Fortran, the directives take one of the forms:
    C$OMP construct [clause [clause] …]
    !$OMP construct [clause [clause] …]
    *$OMP construct [clause [clause] …]
Since the constructs are directives, an OpenMP program can be compiled by compilers that don't support OpenMP, provided any calls to OpenMP library routines are guarded (for example with the _OPENMP macro).

Slide 7. OpenMP Programming Model

- Fork-Join Parallelism: the master thread spawns a team of threads as needed.
- Parallelism is added incrementally: i.e. the sequential program evolves into a parallel program.

Slide 8. OpenMP: How Is OpenMP Typically Used?
OpenMP is usually used to parallelize loops:

- Find your most time consuming loops.
- Split them up between threads.

Sequential program:
    int main(void) {
        double Res[1000];
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
        return 0;
    }

Parallel program (split up this loop between multiple threads):
    int main(void) {
        double Res[1000];
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
        return 0;
    }

Slide 9. OpenMP vs. POSIX Threads
- POSIX threads is the other widely used shared-memory programming API.

- Fairly widely available, usually quite simple to implement on top of OS kernel threads.
- Lower level of abstraction than OpenMP:
  - library routines only, no directives
  - more flexible, but harder to implement and maintain
- OpenMP can be implemented on top of POSIX threads.
- Not much difference in availability:
  - not that many OpenMP C++ implementations
  - no standard Fortran interface for POSIX threads

Slide 10. OpenMP Constructs
OpenMP's constructs fall into 5 categories:
- Parallel Regions
- Worksharing
- Data Environment
- Synchronization
- Runtime functions/environment variables
OpenMP is basically the same between Fortran and C/C++.

Slide 11. OpenMP: Parallel Regions
You create threads in OpenMP with the "omp parallel" pragma. For example, to create a 4-thread parallel region:
    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
- Each thread redundantly executes the code within the structured block.
- Each thread calls pooh(ID, A) for ID = 0 to 3.

Slide 12. How Many Threads?
- Set the environment variable for the number of threads:
      set OMP_NUM_THREADS=4
- There is no standard default for this variable.
- Many systems: # of threads = # of processors (Intel® compilers use this default).

Slide 14. OpenMP: Work-Sharing Constructs
    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }
- Splits loop iterations among threads.
- Must be in the parallel region.
- Must precede the loop.
- By default, there is a barrier at the end of the "omp for". Use the "nowait" clause to turn off the barrier.

Slide 15. Work-sharing Construct
- Threads are assigned an independent set of iterations.
- Threads must wait at the end of the work-sharing construct.
    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];
(Figure: iterations i = 1 to i = 12 distributed among the threads, followed by the implicit barrier.)

Slide 16. Work Sharing Constructs

A motivating example

Sequential code:
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region:
    #pragma omp parallel
    {
        int id, i, Nthrds, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        istart = id * N / Nthrds;
        iend = (id + 1) * N / Nthrds;
        for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
    }

OpenMP parallel region and a work-sharing for construct:
    #pragma omp parallel
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

Slide 17. Assigning Iterations to Threads
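The hand-written parallel region above assigns iterations to threads with the integer arithmetic istart = id*N/Nthrds and iend = (id+1)*N/Nthrds. As a minimal sketch of why this covers every iteration exactly once, the plain C helpers below reproduce that arithmetic without any OpenMP calls. The names block_range and count_covered are made up for this illustration and are not part of OpenMP or of the slides; the products are widened to long long, which the slide does not do, to avoid overflow for large N.

```c
#include <assert.h>

/* Hypothetical helper: the half-open range [*istart, *iend) that thread `id`
   out of `nthreads` would process, using the same formula as the slide. */
void block_range(int id, int nthreads, int n, int *istart, int *iend)
{
    assert(nthreads > 0 && id >= 0 && id < nthreads && n >= 0);
    *istart = (int)((long long)id * n / nthreads);
    *iend   = (int)((long long)(id + 1) * n / nthreads);
}

/* Hypothetical check: walk all thread ids in order and count how many
   iterations they cover in total. Returns -1 if two neighbouring blocks
   do not meet exactly (a gap or an overlap) or the last block misses n. */
int count_covered(int nthreads, int n)
{
    int expected_start = 0, total = 0;
    for (int id = 0; id < nthreads; id++) {
        int s, e;
        block_range(id, nthreads, n, &s, &e);
        if (s != expected_start)
            return -1;
        total += e - s;
        expected_start = e;
    }
    return (expected_start == n) ? total : -1;
}
```

For N = 10 and 4 threads this yields the blocks [0,2), [2,5), [5,7) and [7,10). A static schedule without a chunk size produces blocks of roughly this shape, although the exact split a given compiler chooses may differ.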

- The schedule clause of the for directive deals with the assignment of iterations to threads.
- The general form of the schedule clause is schedule(scheduling_class[, parameter]).
- OpenMP supports four scheduling classes: static, dynamic, guided, and runtime.

Slide 18. Assigning Iterations to Threads: Example
    /* static scheduling of matrix multiplication loops */
    /* The slide wrote default(private), which OpenMP 2.5 does not accept
       for C/C++; the private variables are listed explicitly instead. */
    #pragma omp parallel private(i, j, k) shared(a, b, c, dim) \
                         num_threads(4)
    #pragma omp for schedule(static)
    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            c(i,j) = 0;
            for (k = 0; k < dim; k++) {
                c(i,j) += a(i, k) * b(k, j);
            }
        }
    }

Slide 19. Which Schedule to Use
    Schedule clause   When to use
    STATIC            Predictable and similar work per iteration
    DYNAMIC           Unpredictable, highly variable work per iteration
    GUIDED            Special case of dynamic to reduce scheduling overhead

Slide 20. Parallel Sections
Independent sections of code can execute concurrently.
    #pragma omp parallel sections
    {
        #pragma omp section
        phase1();
        #pragma omp section
        phase2();
        #pragma omp section
        phase3();
    }
(Figure: serial versus parallel execution of the three phases.)

Slide 21. Data Environment
- OpenMP uses a shared-memory programming model.
- Most variables are shared by default.
- Global variables are shared among threads.
  - C/C++: file scope variables, static
- But, not everything is shared...
  - Stack variables in functions called from parallel regions are PRIVATE.
  - Automatic variables within a statement block are PRIVATE.
  - Loop index variables are private (with exceptions).
    - C/C++: the first loop index variable in nested loops following a #pragma omp for

Slide 22. Data Scope Attributes
- The default status can be modified with
      default(shared | none)
- Scoping attribute clauses
      shared(varname, …)
      private(varname, …)

Slide 23. The Private Clause
- Reproduces the variable for each thread.
- Variables are un-initialized; a C++ object is default constructed.
- Any value external to the parallel region is undefined.
    /* a and b are assumed to be arrays visible at file scope */
    void work(float *c, int N) {
        float x, y;
        int i;
        #pragma omp parallel for private(x, y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }

Slide 24. OpenMP: Reduction
Another clause that affects the way variables are shared:

    reduction(op : list)
- The variables in "list" must be shared in the enclosing parallel region.
- Inside a parallel or a worksharing construct:
  - A local copy of each list variable is made and initialized depending on the "op" (e.g. 0 for "+").
  - The "op" is applied pairwise to update the local copy.
  - Local copies are reduced into a single global copy at the end of the construct.

Slide 25. OpenMP: A Reduction Example
    #include <omp.h>
    #define NUM_THREADS 2
    double func(int);
    int main(void) {
        int i;
        double ZZ, sum = 0.0;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel for reduction(+:sum) private(ZZ)
        for (i = 0; i < 1000; i++) {
            ZZ = func(i);
            sum = sum + ZZ;
        }
        return 0;
    }

Slide 26. Implicit Barriers
- Several OpenMP* constructs have implicit barriers:
  - parallel
  - for
  - single
- Unnecessary barriers hurt performance: waiting threads accomplish no work!
- Suppress implicit barriers, when safe, with the nowait clause.

Slide 27. OpenMP: Synchronization
OpenMP has the following constructs to support synchronization:

- barrier
- critical section
- atomic
- flush
- ordered
- single
- master

Slide 28. Barrier Construct
- Explicit barrier synchronization
- Each thread waits until all threads arrive
    #pragma omp parallel shared(A, B, C)
    {
        DoSomeWork(A, B);
        printf("Processed A into B\n");
        #pragma omp barrier
        DoSomeWork(B, C);
        printf("Processed B into C\n");
    }

Slide 29. Atomic Construct
- Special case of a critical section
- Applies only to simple update of memory location
    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
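As a small illustration of the atomic construct, the sketch below builds a histogram: several iterations may hit the same bin, so the increment is protected with an atomic directive, much like the x[index[i]] update on the slide. The function name histogram and its parameters are assumptions made for this example. It uses only pragmas and no OpenMP library calls, so a compiler without OpenMP support ignores them and runs the loop serially.

```c
#include <assert.h>

/* Hypothetical example: count how often each bin index occurs.
   bins[] must have nbins entries; index[] must have n entries.
   Entries of index[] outside [0, nbins) are skipped. */
void histogram(const int *index, int n, int *bins, int nbins)
{
    assert(n >= 0 && nbins > 0);
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    int i;
    /* index, bins, n and nbins are shared by default; i is private
       because it is the loop variable of the parallel for. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        int b = index[i];
        if (b >= 0 && b < nbins) {
            /* Different threads may update the same bin, so the
               read-modify-write must be atomic. */
            #pragma omp atomic
            bins[b] += 1;
        }
    }
}
```

Because the update is a single increment of one memory location, atomic is enough here; a critical section would also be correct but would serialize all updates, even those to different bins.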

Slide 30. Critical and Atomic
- Only one thread at a time can enter a critical section:
    C$OMP PARALLEL DO PRIVATE(B)
    C$OMP& SHARED(RES)
          DO 100 I = 1, NITERS
             B = DOIT(I)
    C$OMP CRITICAL
             CALL CONSUME(B, RES)
    C$OMP END CRITICAL
    100   CONTINUE
- Atomic is a special case of a critical section that can be used for certain simple statements:
    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP ATOMIC
          X = X + B
    C$OMP END PARALLEL

Slide 32. Master directive
- The master construct denotes a structured block that is only executed by the master thread. The other threads just skip it (no implied barriers or flushes).
    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp master
        { exchange_boundaries(); }
        #pragma omp barrier
        do_many_other_things();
    }

Slide 33. Single directive
- The single construct denotes a block of code that is executed by only one thread.
- A barrier and a flush are implied at the end of the single block.
    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp single
        { exchange_boundaries(); }
        do_many_other_things();
    }

Slide 34. OpenMP: Library routines
- Lock routines: omp_init_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock()
- Runtime environment routines:
  - Modify/check the number of threads: omp_set_num_threads(), omp_get_num_threads(), omp_get_thread_num(), omp_get_max_threads()
  - Turn on/off nesting and dynamic mode: omp_set_nested(), omp_set_dynamic(), omp_get_nested(), omp_get_dynamic()
  - Are we in a parallel region? omp_in_parallel()
  - How many processors in the system? omp_get_num_procs()

Slide 35. 1. Hello World!
    #include <omp.h>
    #include <stdio.h>
    int main(void) {
        int nthreads, tid;
        /* Fork a team of threads giving them their own copies of variables */
        #pragma omp parallel private(nthreads, tid)
        {
            /* Obtain thread number */
            tid = omp_get_thread_num();
            printf("Hello World from thread = %d\n", tid);
            /* Only master thread does this */
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads = %d\n", nthreads);
            }
        }   /* All threads join master thread and disband */
        return 0;
    }

Slide 36. Example Code - Pthread Creation and Termination
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 5

    void *PrintHello(void *threadid) {
        /* The thread index is passed by value inside the pointer argument. */
        printf("\n%ld: Hello World!\n", (long)threadid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[]) {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);
    }

Slide 37. 2. Parallel Loop Reduction
          PROGRAM REDUCTION
          INTEGER I, N
          REAL A(100), B(100), SUM
    !     Some initializations
          N = 100
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          SUM = 0.0
    !$OMP PARALLEL DO REDUCTION(+:SUM)
          DO I = 1, N
            SUM = SUM + (A(I) * B(I))
          ENDDO
          PRINT *, 'Sum = ', SUM
          END

Slide 38. 3. Matrix-vector multiply using a parallel loop and critical directive
    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a, b, c, nthreads, chunk) private(tid, i, j, k)
    {
        tid = omp_get_thread_num();   /* tid is private, so set it before use */
        #pragma omp for schedule(static, chunk)
        for (i = 0; i < NRA; i++) {
            printf("thread=%d did row=%d\n", tid, i);
            for (j = 0; j < NCB; j++)
                for (k = 0; k < NCA; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    }

Slide 39. Parallelize: Win32 API, PI
    #include <windows.h>
    #include <stdio.h>
    #define NUM_THREADS 2
    HANDLE thread_handles[NUM_THREADS];
    CRITICAL_SECTION hUpdateMutex;
    static long num_steps = 100000;
    double step;
    double global_sum = 0.0;

    void Pi(void *arg) {
        int i, start;
        double x, sum = 0.0;
        start = *(int *)arg;
        step = 1.0 / (double)num_steps;
        for (i = start; i <= num_steps; i = i + NUM_THREADS) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        EnterCriticalSection(&hUpdateMutex);
        global_sum += sum;
        LeaveCriticalSection(&hUpdateMutex);
    }

    int main(void) {
        double pi;
        int i;
        DWORD threadID;
        int threadArg[NUM_THREADS];
        for (i = 0; i < NUM_THREADS; i++)
            threadArg[i] = i + 1;
        InitializeCriticalSection(&hUpdateMutex);
        for (i = 0; i < NUM_THREADS; i++) {
            thread_handles[i] = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)Pi,
                                             &threadArg[i], 0, &threadID);
        }
        WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);
        pi = global_sum * step;
        printf("pi is %f\n", pi);
        return 0;
    }
Doubles code size!

Slide 40. Solution: Keep it simple
- Threads libraries:
  - Pro: Programmer has control over everything
  - Con: Programmer must control everything
- Full control -> increased complexity -> programmers scared away
- Sometimes a simple evolutionary approach is better.

Slide 41. PI Program: an example
    static long num_steps = 100000;
    double step;
    int main(void) {
        int i;
        double x, pi, sum = 0.0;
        step = 1.0 / (double)num_steps;
        for (i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        return 0;
    }

Slide 42. OpenMP PI Program: Parallel Region example (SPMD Program)
    #include <omp.h>
    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2
    int main(void) {
        int i;
        double x, pi, sum[NUM_THREADS];
        step = 1.0 / (double)num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel
        {
            double x;
            int id, i;   /* loop index declared here so each thread has its own copy */
            id = omp_get_thread_num();
            for (i = id, sum[id] = 0.0; i < num_steps; i = i + NUM_THREADS) {
                x = (i + 0.5) * step;
                sum[id] += 4.0 / (1.0 + x * x);
            }
        }
        for (i = 0, pi = 0.0; i < NUM_THREADS; i++) pi+
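The SPMD listing above breaks off mid-statement in the source. Rather than guess at the missing lines, the sketch below computes the same midpoint-rule approximation of pi using the reduction clause introduced earlier in these slides, which is a different technique from the per-thread sum[] array. The function name compute_pi is an assumption made for this example.

```c
#include <assert.h>

/* Hypothetical example: approximate pi as the integral of 4/(1+x^2)
   over [0,1], using the midpoint rule with num_steps rectangles. */
double compute_pi(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    /* Each thread accumulates into its own private copy of sum;
       the copies are added together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;   /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

With GCC or Clang this would typically be built with the -fopenmp flag. Without it the pragma is ignored and the loop runs on one thread, giving the same result up to floating-point rounding.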
