版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Rajkumar BuyyaSchool of Computer Science and Software EngineeringMonash TechnologyMelbourne, AustraliaEmail: URL: :/ .au/rajkumarConcurrent Programming with ThreadsObjectivesExplain the parallel computing right from architecture, OS, programming paradigm, and applicationsExplain the multithreading p
2、aradigm, and all aspects of how to use it in an applicationCover all basic MT conceptsExplore issues related to MTContrast Solaris, POSIX, Java threadsLook at the APIs in detailExamine some Solaris, POSIX, and Java code examplesDebate on: MPP and Cluster Computing Agenda Overview of Computing Operat
3、ing Systems Issues Threads Basics Multithreading with Solaris and POSIX threads Multithreading in Java Distributed Computing Grand Challenges Solaris, POSIX, and Java example code PPPPPPMicrokernelMulti-Processor Computing SystemThreads InterfaceHardwareOperating SystemProcessProcessorThreadPApplica
4、tionsComputing ElementsProgramming paradigms Architectures Compilers Applications Architectures Compilers Applications P.S.Es SequentialEraParallelEra1940 50 60 70 80 90 2000 2030Two Eras of Computing Commercialization R & D CommodityHistory of Parallel ProcessingPP can be traced to a tablet dated a
5、round 100 BC.Tablet has 3 calculating positions.Infer that multiple positions:Reliability/ SpeedMotivating FactorsdddJust as we learned to fly, not by constructing a machine that flaps its wings like birds, but by applying aerodynamics principles demonstrated by nature. We modeled PP after those of
6、biological species.Aggregated speed with which complex calculations carried out by individual neurons response is slow (ms) - demonstrate feasibility of PPMotivating FactorsWhy Parallel Processing?Computation requirements are ever increasing - visualization, distributed databases, simulations, scien
7、tific prediction (earthquake), etc.Sequential architectures reaching physical limitation (speed of light, thermodynamics)Technical ComputingSolving technology problems usingcomputer modeling, simulation and analysisLife SciencesMechanical Design & Analysis (CAD/CAM)AerospaceGeographicInformationSyst
8、emsNo. of ProcessorsC.P.I.1 2 . . . .Computational Power ImprovementMultiprocessorUniprocessorAgeGrowth5 10 15 20 25 30 35 40 45 . . . . Computational Power ImprovementVerticalHorizontalThe Tech. of PP is mature and can be exploited commercially; significant R & D work on development of tools & envi
9、ronment.Significant development in Networking technology is paving a way for heterogeneous computing.Why Parallel Processing?Hardware improvements like Pipelining, Superscalar, etc., are non-scalable and requires sophisticated Compiler Technology.Vector Processing works well for certain kind of prob
10、lems.Why Parallel Processing?Parallel Program has & needs .Multiple “processes active simultaneously solving a given problem, general multiple processors.Communication and synchronization of its processes (forms the core of parallel programming efforts).Processing Elements ArchitectureSimple classif
11、ication by Flynn: (No. of instruction and data streams)SISD - conventionalSIMD - data parallel, vector computingMISD - systolic arraysMIMD - very general, multiple approaches.Current focus is on MIMD model, using general purpose processors. (No shared memory)Processing ElementsSISD : A Conventional
12、ComputerSpeed is limited by the rate at which computer can transfer information internally.ProcessorData InputData OutputInstructionsEx:PC, Macintosh, WorkstationsThe MISDArchitectureMore of an intellectual exercise than a practical configuration. Few built, but commercially not availableData InputS
13、treamData OutputStreamProcessorAProcessorBProcessorCInstructionStream AInstructionStream BInstruction Stream CSIMD ArchitectureEx: CRAY machine vector processing, Thinking machine cm*Ci no of CPUsP1P2P3timeCPUMultithreading - MultiprocessorsConcurrency Vs Parallelism P1P2P3timeNo of execution proces
14、s = no of CPUsCPUCPUCPUComputational ModelParallel Execution due to :Concurrency of threads on Virtual ProcessorsConcurrency of threads on Physical ProcessorTrue Parallelism :threads : processor map = 1:1User Level ThreadsVirtual ProcessorsPhysical ProcessorsUser-Level Schedule (User)Kernel-Level Sc
15、hedule (Kernel)General Architecture ofThread ModelHides the details of machine architectureMaps User Threads to kernel threadsProcess VM is shared, state change in VM by one thread visible to other.Process Parallelismint add (int a, int b, int & result)/ function stuffint sub(int a, int b, int & res
16、ult)/ function stuffpthread t1, t2;pthread-create(&t1, add, a,b, & r1);pthread-create(&t2, sub, c,d, & r2);pthread-par (2, t1, t2);MISD and MIMD Processingabr1cdr2addsubProcessorDataIS1IS2Processordo“dn/2dn2/+1“dnSortDataISData Parallelismsort( int *array, int count)/./.pthread-t, thread1, thread2;“
17、pthread-create(& thread1, sort, array, N/2);pthread-create(& thread2, sort, array, N/2);pthread-par(2, thread1, thread2);SIMD ProcessingSortProcessorProcessorPurposeThreads ModelProcess ModelStart execution of a new threadCreation of a new threadWait for completion of threadExit and destroy the thre
18、adthr_join()wait( )exec( )exit( )fork ( ) thr_create() builds the new thread and starts the executionthr_create( )thr_exit()Process and Threaded modelsCode ComparisonSegment (Process)main ( )fork ( );fork ( );fork ( );Segment(Thread)main()thread_create(0,0,func(),0,0);thread_create(0,0,func(),0,0);t
19、hread_create(0,0,func(),0,0);Printing ThreadEditing ThreadIndependent Threadsprinting()- - - - - - - - - - - -editing()- - - - - - - - - - - -main()- - - - - - - - - - - -id1 = thread_create(printing);id2 = thread_create(editing);thread_run(id1, id2);- - - - - - - - - - - -Cooperative threads - File
20、 Copyreader()- - - - - - - - - -lock(buffi);read(src,buffi);unlock(buffi);- - - - - - - - - -writer()- - - - - - - - - -lock(buffi);write(src,buffi);unlock(buffi);- - - - - - - - - -buff0buff1Cooperative Parallel Synchronized ThreadsRPC Callfunc()/* Body */RPC(func).ClientServerNetworkServerThreadsM
21、essage PassingFacilityServer ProcessClient ProcessClient ProcessUser ModeKernel ModeMultithreaded ServerCompiler ThreadPreprocessor ThreadMultithreaded CompilerSourceCodeObjectCodeThread Programming models1. The boss/worker model2. The peer model3. A thread pipelinetaskXtaskYtaskZmain ( )WorkersProg
22、ramFilesResourcesDatabasesDisksSpecialDevicesBossInput (Stream)The boss/worker modelExamplemain() /* the boss */ forever get a request;switch( request )case X: pthread_create(.,taskX);case X: pthread_create(.,taskX);.taskX() /* worker */ perform the task, sync if accessing shared resourcestaskY() /*
23、 worker */ perform the task, sync if accessing shared resources.-Above runtime overhead of creating thread can be solved by thread pool* the boss thread creates all worker thread at program initialization and each worker thread suspends itself immediately for a wakeup call from boss The peer modelta
24、skXtaskYWorkersProgramFilesResourcesDatabasesDisksSpecialDevicestaskZInput(static)Examplemain() pthread_create(.,thread1.task1); pthread_create(.,thread2.task2);. signal all workers to start wait for all workers to finish do any cleanuptask1() /* worker */wait for start perform the task, sync if acc
25、essing shared resourcestask2() /* worker */wait for start perform the task, sync if accessing shared resourcesA thread pipelineResourcesFilesDatabasesDisksSpecial DevicesFilesDatabasesDisksSpecial DevicesFilesDatabasesDisksSpecial DevicesStage 1Stage 2Stage 3ProgramFilter ThreadsInput (Stream)Exampl
26、emain() pthread_create(.,stage1);pthread_create(.,stage2);.wait for all pipeline threads to finishdo any cleanupstage1() get next input for the programdo stage 1 processing of the inputpass result to next thread in pipelinestage2()get input from previous thread in pipelinedo stage 2 processing of th
27、e inputpass result to next thread in pipelinestageN()get input from previous thread in pipelinedo stage N processing of the inputpass result to program output.Multithreaded Matrix Multiply.XA= BCC1,1 = A1,1*B1,1+A1,2*B2,1.Cm,n=sum of product of corresponding elements in row of A and column of B.Each
28、 resultant element can be computed independently.Multithreaded Matrix Multiplytypedef struct int id; int size;int row, column;matrix *MA, *MB, *MC; matrix_work_order_t;main() int size = ARRAY_SIZE, row, column;matrix_t MA, MB,MC;matrix_work_order *work_orderp;pthread_t peersize*zize;./*process matri
29、x, by row, column */for( row = 0; row size; row+ ) for( column = 0; column size; column+) id = column + row * ARRAY_SIZE; work_orderp = malloc( sizeof(matrix_work_order_t);/* initialize all members if wirk_orderp */pthread_create(peerid, NULL, peer_mult, work_orderp); /* wait for all peers to exist*
30、/ for( i =0; i size*size;i+)pthread_join( peeri, NULL );Multithreaded Server.void main( int argc, char *argv ) int server_socket, client_socket, clilen; struct sockaddr_in serv_addr, cli_addr; int one, port_id;#ifdef _POSIX_THREADSpthread_t service_thr;#endif port_id = 4000;/* default port_id */if(
31、(server_socket = socket( AF_INET, SOCK_STREAM, 0 ) 0 ) printf(Error: Unable to open socket in parmon server.n);exit( 1 ); memset( (char*) &serv_addr, 0, sizeof(serv_addr); serv_addr.sin_family = AF_INET; serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); serv_addr.sin_port = htons( port_id ); setsockopt
32、(server_socket, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one); Multithreaded Server. if( bind( server_socket, (struct sockaddr *)&serv_addr, sizeof(serv_addr) %dn,errno );exit( 1 ); listen( server_socket, 5); while( 1 ) clilen = sizeof(cli_addr);client_socket = accept( server_socket, (struct s
33、ockaddr *)&serv_addr, &clilen );if( client_socket 0 ) IDENTI|FY USER REQUEST .Do NECESSARY Processing .Send Results to Server CLOSE Connect and Terminate THREAD close( client_socket );#ifdef POSIX_THREADS pthread_exit( (void *)0);#endif The Value of MTProgram structureParallelismThroughputResponsive
34、nessSystem resource usageDistributed objectsSingle source across platforms (POSIX)Single binary for any number of CPUsTo thread or not to threadImprove efficiency on uniprocessor systemsUse multiprocessor HardwareImprove ThroughputSimple to implement Asynchronous I/OLeverage special features of the
35、OSTo thread or not to threadIf all operations are CPU intensive do not go far on multithreadingThread creation is very cheap, it is not freethread that has only five lines of code would not be usefulDOS - The Minimal OSUserSpaceKernelSpaceDOSDataStack & Stack PointerProgram CounterUserCodeGlobalData
36、DOSCodeHardwareDOSMultitasking OSsProcessUserSpaceKernelSpaceHardwareUNIXProcess Structure(UNIX, VMS, MVS, NT, OS/2 etc.)Multitasking SystemsHardwareThe KernelP1P2P3P4Processes(Each process is completely independent)Multithreaded ProcessUserCodeGlobalDataThe KernelProcess Structure(Kernel state and
37、address space are shared)T1s SP T3sPC T1sPC T2sPCT1s SPT2s SPKernel StructuresProcess IDUID GID EUID EGID CWD.PrioritySignal MaskRegistersKernel StackCPU StateFile DescriptorsSignal Dispatch TableMemory MapProcess IDUID GID EUID EGID CWD.File DescriptorsSignal Dispatch TableMemory MapTraditional UNI
38、X Process StructureSolaris 2 Process StructureLWP 2LWP 1Scheduling Design Options M:1HP-UNIX 1:1DEC, NT, OS/1, AIX. IRIXM:M2-levelSunOS Two-Level Thread ModelProc 1Proc 2Proc 3Proc 4Proc 5TraditionalprocessUserLWPsKernelthreadsKernelHardwareProcessorsThread Life Cyclemain()main() . pthread_create( f
39、unc, arg); thr_create( .func.,arg.); . . void * func() .pthread_exit()T2T1pthread_create(.func.)POSIXSolarisWaiting for a Thread to Exitmain()main() . pthread_join(T2); thr_join( T2,&val_ptr); . . void * func() .pthread_exit()T2T1pthread_join()POSIXSolarisScheduling States: Simplified View of Thread
40、 State TransitionsRUNNABLESLEEPINGSTOPPEDACTIVEStopContinuePreemptStopStopSleepWakeupPreemptionThe process of rudely interrupting a thread and forcing it to relinquish its LWP (or CPU) to another.CPU2 cannot change CPU3s registers directly. It can only issue a hardware interrupt to CPU3. It is up to
41、 CPU3s interrupt handler to look at CPU2s request and decide what to do.Higher priority threads always preempt lower priority threads.Preemption ! = Time slicingAll of the libraries are preemptiveEXIT Vs. THREAD_EXITThe normal C function exit() always causes the process to exit. That means all of th
42、e process - All the threads.The thread exit functions:UI: thr_exit()POSIX: pthread_exit()OS/2: DosExitThread() and _endthread()NT: ExitThread() and endthread()all cause only the calling thread to exit, leaving the process intact and all of the other threads running. (If no other threads are running,
43、 then exit() will be called.)CancellationCancellation is the means by which a thread can tell another thread that it should exit.main()main()main().pthread_cancel (T1);DosKillThread(T1);TerminateThread(T1)There is no special relation between the killer of a thread and the victim. (UI threads must “r
44、oll their own using signals)(pthread exit)(pthread cancel()T1T2POSIXOS/2Windows NTCancellation State and TypeStatePTHREAD_CANCEL_DISABLE (Cannot be cancelled)PTHREAD_CANCEL_ENABLE (Can be cancelled, must consider type)TypePTHREAD_CANCEL_ASYNCHRONOUS (any time what-so-ever) (not generally used)PTHREA
45、D_CANCEL_DEFERRED(Only at cancellation points)(Only POSIX has state and type)(OS/2 is effectively always “enabled asynchronous)(NT is effectively always “enabled asynchronous)Cancellation is Always Complex!It is very easy to forget a lock thats being held or a resource that should be freed.Use this
46、only when you absolutely require it.Be extremely meticulous in analyzing the possible thread states.Document, document, document!Returning StatusPOSIX and UIA detached thread cannot be “joined. It cannot return status.An undetached thread must be “joined, and can return a status.OS/2Any thread can b
47、e waited forNo thread can return statusNo thread needs to be waited for.NTNo threads can be waited forAny thread can return statusSuspending a Threadmain() . thr_suspend(T1); . thr_continue(T1); .continue()T2T1suspend()Solaris:* POSIX does not support thread suspensionProposed Uses of Suspend/Contin
48、ueGarbage CollectorsDebuggersPerformance AnalysersOther Tools?These all must go below the API, so they dont count.Isolation of VM system “spooling (?!)NT Services specify that a service should b suspendable (Questionable requirement?)Be CarefulDo NOT Think about Scheduling!Think about Resource Avail
49、abilityThink about SynchronizationThink about PrioritiesIdeally, if youre using suspend/ continue, youre making a mistake!SynchronizationWebsters: “To represent or arrange events to indicate coincidence or coexistence.Lewis : “To arrange events so that they occur in a specified order.* Serialized ac
50、cess to controlled resources.Synchronization is not just an MP issue. It is not even strictly an MT issue! Threads Synchronization :On shared memory : shared variables - semaphoresOn distributed memory :within a task : semaphoresAcross the tasks : By passing messagesUnsynchronized Shared Data is a F
51、ormula for DisasterThread1Thread2temp = Your - BankBalance;dividend = temp * InterestRate;newbalance = dividend + temp;Your-Dividend += dividend; Your-BankBalance+= deposit;Your-BankBalance = newbalance;Atomic ActionsAn action which must be started and completed with no possibility of interruption.A
52、 machine instruction could need to be atomic. (not all are!)A line of C code could need to be atomic. (not all are)An entire database transaction could need to be atomic.All MP machines provide at least one complex atomic instruction, from which you can build anything.A section of code which you hav
53、e forced to be atomic is a Critical Section.Critical Section(Good Programmer!)reader()- - - - - - - - - -lock(DISK);.unlock(DISK);- - - - - - - - - -writer()- - - - - - - - - -lock(DISK);.unlock(DISK);- - - - - - - - - -Shared DataT1T2Critical Section(Bad Programmer!)reader()- - - - - - - - - -lock(
54、DISK);.unlock(DISK);- - - - - - - - - -writer()- - - - - - - - - -.- - - - - - - - - -Shared DataT1T2Lock Shared Data!GlobalsShared data structuresStatic variables(really just lexically scoped global variables)Mutexesitem = create_and_fill_item();mutex_lock( &m );item-next = list;list = item;mutex_u
55、nlock(&m);mutex_lock( &m );this_item = list;list = list_next;mutex_unlock(&m); .func(this-item);POSIX and UI : Owner not recorded, block in priority order.OS/2 and NT. Owner recorded, block in FIFO order.Thread 1Thread2Synchronization Variables in Shared Memory (Cross Process)Process 1Process 2SSSha
56、red MemorySS SynchronizationVariableThreadSynchronizationProblemsDeadlockslock( M1 );lock( M2 );lock( M2 );lock( M1 );Thread 1Thread 2Thread1 is waiting for the resource(M2) locked by Thread2 andThread2 is waiting for the resource (M1) locked by Thread1Avoiding DeadlocksEstablish a hierarchy : Alway
57、s lock Mutex_1 before Mutex_2, etc.,.Use the trylock primitives if you must violate the hierarchy. while (1) pthread_mutex_lock (&m2); if( EBUSY |= pthread mutex_trylock (&m1) break; else pthread _mutex_unlock (&m1); wait_around_or_do_something_else(); do_real work();/* Got em both! */ Use lockllint
58、 or some similar static analysis program to scan your code for hierarchy violations.Race ConditionsA race condition is where the results of a program are different depending upon the timing of the events within the program.Some race conditions result in different answers and are clearly bugs.Thread
59、1Thread 2mutex_lock (&m)mutex_lock (&m)v = v - 1;v = v * 2;mutex_unlock (&m)mutex_unlock (&m)- if v = 1, the result can be 0 or 1based on which thread gets chance to enter CR firstOperating System IssuesLibrary GoalsMake it fast!Make it MT safe!Retain UNIX semantics!Are Libraries Safe ?getc() OLD im
60、plementation: extern int get( FILE * p ) /* code to read data */ getc() NEW implementation: extern int get( FILE * p ) pthread_mutex_lock(&m); /* code to read data */pthread_mutex_unlock(&m); ERRNOIn UNIX, the distinguished variable errno is used to hold the error code for any system calls that fail
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 文化传媒行业美工工作总结
- 婚纱店前台接待员总结
- 网络营销实训心得体会和收获
- 2024年物流配送中心智能化升级合作协议3篇
- 班级竞技活动的组织与参与计划
- 幼儿园大班数学课教案《牙签摆图形》及教学反思
- 家具行业采购供应商管理
- 描写描写方法6篇
- 教育行业员工激励策略分享
- 媒体编辑前台接待总结
- 陕西省咸阳市2023-2024学年高一上学期期末考试 物理 含解析
- 程序员个人年终总结
- (正式版)HG∕T 21633-2024 玻璃钢管和管件选用规定
- 蔚来用户运营分析报告-数字化
- 南京市2023-2024高一上学期期末英语试卷及答案
- 《供应链管理》期末考试复习题库(含答案)
- 农村小学生上下学交通安全教育的研究
- 雍琦版法律逻辑学课后习题答案全
- 学校暑期维修方案
- 国家自然科学基金进展报告
- 小车多方式运行的PLC控制——PLC控制系统课程设计
评论
0/150
提交评论