Exploiting Thread-Level Parallelism in General Purpose Applications

Pen-Chung Yew 游本中
Department of Computer Science and Engineering, University of Minnesota
http:///Agassiz
2023/1/11, PCYew-Taiwan

Three Cobblers Beat One Zhuge Liang
Three Zhuge Liangs Cannot Beat One Cobbler

Impact of Hardware Technology on Computer Architectures

- Performance improvement of microprocessors so far has been driven primarily by higher clock rates: smaller feature sizes (Moore's conjecture), higher power density, higher cooling cost
  - Results? Intel cancelled two Pentium projects recently
- VLSI technology allows more than 1 billion transistors on a single chip => plenty of gates, what to do with them?
- Superscalar is hard to scale beyond ~10 instructions per clock cycle: inherent ILP (Instruction-Level Parallelism) limitation in application programs, long wire delays, high power density
- Memory wall is getting higher between CPU and storage devices
- Improving single program performance is still very important

Parallel Processing Comes to the Rescue - Finally?

- Parallel processing has been proposed to salvage clock rate limitation => for the past thirty years!!!
- Finally? => multiple cores in Intel's roadmap as well as in most embedded processors today
- What is new here? Use thread-level parallelism (TLP) to improve instruction-level parallelism (ILP) for general-purpose applications

ILP vs. TLP

[Figure: operations op1 to op24 plotted against time (t1, t2, t3, ..., t6) in two timelines. Labels: Superscalar, ILP, TLP, and threads Th1 to Th4.]

Parallel Processing Comes to the Rescue - Finally?

- Is there enough TLP in general-purpose application programs (to improve ILP)? => much harder than scientific applications (floating-point-intensive)

TLP Challenges in General-Purpose Applications

- Mostly do-while loops
  - Need thread-level control speculation
- Parallelism exists mostly in outer loops
  - Not good for VLIW (i.e. software pipelining), or vector processing => need thread-level support
- Pointers complicate alias and data dependence analysis
  - Need runtime disambiguation and data speculation
- Many small loops and doacross loops
  - Need fast and low overhead communication
- Small basic blocks: need to exploit both ILP and TLP

Need new approaches to apply parallel processing to such applications!!

Outline

- Multi-threaded architectures
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Multi-Threaded Architectures

- To improve single-program speedup
  - Multiscalar
  - Superthreaded processors
  - Trace processor
  - Multiprocessor on a chip
- To improve resource utilization, throughput
  - Simultaneous Multithreading (SMT)
- To hide memory latency
  - Tera computer, Hyperthreading
- To support system/application functionality

Reference: Speculative Execution in High Performance Computer Architectures, edited by Kaeli and Yew, CRC Press, 2005

Superthreaded Architectures

- Exploit thread-level parallelism to enhance ILP
- Multiple vs. single instruction windows (not for scalability as in traditional parallel processing)
- Control speculation (not stopped by branch instructions)
- Data speculation (not stopped by data dependences between threads)
- Fast communication => small task granularity
- High cache hit rates, automatic data prefetching
- Need new hardware and compiler/software support

Reference: The Superthreaded Processor Architecture, Tsai, et al., IEEE Trans. on Computers, Sept 1999
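
The idea of data speculation between threads can be illustrated in software. The following is a minimal sketch of generic thread-level speculation over loop iterations, not the Superthreaded hardware mechanism: each batch of iterations runs against the same memory image with stores held in a private buffer, results are committed in iteration order, and an iteration that read a location written by an earlier iteration of the same batch is squashed and re-executed. All names (`execute`, `run_speculative`, `SRC`, `DST`, `body`) and the batch size are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Load = Callable[[int], int]
Store = Callable[[int, int], None]
Body = Callable[[int, Load, Store], None]


def execute(body: Body, i: int, mem: List[int]) -> Tuple[Set[int], Dict[int, int]]:
    """Run iteration i against mem without modifying it.

    Returns the set of addresses read from mem and the buffered stores.
    """
    reads: Set[int] = set()
    writes: Dict[int, int] = {}

    def load(addr: int) -> int:
        if addr in writes:      # forward from this iteration's own buffer
            return writes[addr]
        reads.add(addr)
        return mem[addr]

    def store(addr: int, value: int) -> None:
        writes[addr] = value    # buffered, not yet visible to other iterations

    body(i, load, store)
    return reads, writes


def run_sequential(body: Body, n_iters: int, mem: List[int]) -> None:
    """Reference execution: one iteration after another, stores applied at once."""
    def store(addr: int, value: int) -> None:
        mem[addr] = value

    for i in range(n_iters):
        body(i, lambda addr: mem[addr], store)


def run_speculative(body: Body, n_iters: int, mem: List[int], n_threads: int = 4) -> int:
    """Run iterations in batches of n_threads; return the number of squashes."""
    squashes = 0
    for start in range(0, n_iters, n_threads):
        batch = range(start, min(start + n_threads, n_iters))
        # Phase 1: every iteration of the batch sees the same pre-batch memory.
        results = [execute(body, i, mem) for i in batch]
        # Phase 2: commit in original iteration order.
        written: Set[int] = set()
        for i, (reads, writes) in zip(batch, results):
            if reads & written:
                # Mis-speculation: a value read here was produced by an earlier
                # iteration of this batch. Recover by re-executing on current memory.
                squashes += 1
                reads, writes = execute(body, i, mem)
            for addr, value in writes.items():
                mem[addr] = value
            written.update(writes)
    return squashes


# Hypothetical pointer-style loop: mem[DST[i]] = mem[SRC[i]] + 1.
# Iteration 6 reads the location iteration 5 writes, a rare cross-iteration dependence.
SRC = [0, 1, 2, 3, 4, 5, 5, 7]
DST = [0, 1, 2, 3, 4, 5, 6, 7]


def body(i: int, load: Load, store: Store) -> None:
    store(DST[i], load(SRC[i]) + 1)
```

Calling `run_speculative(body, 8, mem)` on a zero-filled `mem` of length 8 leaves the same contents as `run_sequential` and reports one squash, for the single iteration whose dependence actually occurred.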

[Figure: block diagram with an instruction cache, a data cache, and four thread processing units, each containing an execution unit, a communication unit, a memory buffer, and a write-back unit.]

Speculation: Breaking Program Dependency

- Control and data dependences limit program performance
- However,
  - Most branches have good predictability
  - Most data dependences happen infrequently at runtime
- Speculation is an effective approach to break dependences
  - Optimize program execution by ignoring infrequent data dependences, or taking predicted paths
  - Check possible violation (mis-speculation) at runtime
  - Recover if violation occurs

Types of Speculation

- Control speculation
  - Speculate on the program control flow path
- Data speculation
  - Speculate on how likely memory references are to the same memory location (address)
- Value speculation
  - Speculate on the result value of an operation

Outline

- Multi-threaded architectures
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Speculation on Intel IA64

- Both control and data speculation are supported on Intel IA64
  - Special instructions and hardware are provided
- Memory load operation is targeted for speculation
  - Memory delay is usually the bottleneck of performance
  - Memory load is usually the start of speculative operations

Speculating on Data Dependence

More speculative optimizations

    I1: … = *q
    I2: *p = b
    I3: … = *q
    I4: *r = …
    I5: … = *p
    I6: *r = …

- Speculate on this dependence
- Redundancy elimination opportunity

Speculate on Data Dependences

More speculative optimizations, same code sequence (I1 to I6):

- Speculate on this dependence
- Copy propagation opportunity

Speculate on Data Dependences

More speculative optimizations, same code sequence (I1 to I6):

- Speculate on this dependence
- Dead store elimination opportunity

Observations

- Speculative optimization opportunities exist in many applications (originally, it was only for memory latency hiding during code scheduling)
- A general compiler framework is needed to support both control and data speculation in optimizations
- Need to generate recovery code for mis-speculation
- Need extensive support for data dependence, alias, and value profiling (no longer conservative analysis)

Reference: A Compiler Framework for Speculative Analysis and Optimizations, ACM/SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2003, also in ACM Trans. on Architecture and Code Optimization (TACO), Vol. 1, No. 3, Sept. 2004, pp. 247-271

A Compiler Framework: Intel Open Research Compiler (ORC)

Performance Improvement of Speculative Register Promotion

- Based on alias profile and compared with -O3 with type-based alias analysis on Intel ORC compiler

Value Speculation

- Value locality: likelihood of a previously-seen value recurring within a storage location
- Observed in any storage locations
  - Registers
  - Cache memory
  - Main memory
- Most work focuses on values stored in registers to break potential data dependences: register value locality

Performance of Value Predictors

Predictability of Data Values, Sazeides and Smith, Micro-30, 1997

- Last value prediction varies from 23% to 61%, average about 40%
- Stride prediction varies from 38% to 80%, average about 56%
- FCM (finite context method) with an order of 3 varies from 56% to over 90%, with an average of about 78%
  - Improvement diminishes as order increases
- Less sensitive to different types of instructions

Outline

- Introduction
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Compiler Optimizations for Speculative Threads

- Without compiler optimization, there is limited TLP even under perfect hardware support. [Oplinger PACT99]
- The compiler has to decide
  - Which loops/regions to transform into threads
  - Whether to use synchronization or speculation
  - How to schedule the code to improve overlaps
  - What transformations to use
  - When/how to generate recovery code
- Profile-based analysis could be very efficient

Loop Selection

[Figure: program speedup.]

- Carefully selected loops can improve performance significantly!

Speculative Code Motion

[Figure: sequences of stores (*p = ) and loads ( = *p) before code motion and after code motion. Labels: stall, critical path, other computation.]

Outline

- Introduction
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analysis
- Conclusion

Crucial Considerations in Dependence Profiling

- Program coverage => need compiler's support or use heuristic rules
- Input sensitivity
- Profiling overhead (space and time)
- Using alias and data dependence profiles is inherently speculative => need hardware support for correct execution

Alias Profiling vs. Static Analysis

- Most possible data dependences reported by the compiler do not occur at runtime
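
The gap between what a conservative compiler must assume and what happens at runtime can be measured from an address trace. The sketch below is an illustrative profiler only, not the ORC instrumentation: it remembers the last store site for each address and, for every load site, reports how often a store-to-load (flow) dependence actually occurred. The trace format, the site names `S1` and `L1`, and the function name `profile_flow_dependences` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One trace record per executed memory reference: (site, kind, address),
# where kind is "load" or "store". The format is hypothetical.
Trace = List[Tuple[str, str, int]]


def profile_flow_dependences(trace: Trace) -> Dict[Tuple[str, str], float]:
    """Return, for each observed (store_site, load_site) edge, the fraction
    of executions of the load site that read a value written by that store site."""
    last_store: Dict[int, str] = {}   # address -> site of the most recent store
    load_count: Counter = Counter()   # load site -> number of executions
    edge_count: Counter = Counter()   # (store site, load site) -> occurrences
    for site, kind, addr in trace:
        if kind == "store":
            last_store[addr] = site
        else:
            load_count[site] += 1
            if addr in last_store:
                edge_count[(last_store[addr], site)] += 1
    return {edge: n / load_count[edge[1]] for edge, n in edge_count.items()}


# Four iterations of a hypothetical loop body "*p = ...; ... = *q".
# Without alias information the compiler must assume S1 -> L1 in every
# iteration; in this trace the two pointers coincide only once.
trace: Trace = []
for p, q in [(100, 200), (101, 201), (102, 102), (103, 203)]:
    trace.append(("S1", "store", p))
    trace.append(("L1", "load", q))

profile = profile_flow_dependences(trace)
```

Here `profile` maps the edge `("S1", "L1")` to 0.25, so an optimizer that speculates on the absence of this dependence would need recovery in one iteration out of four for this input.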

Data Dependence Profiling

- Data dependence edges among memory references and function calls
- Detailed information
  - type: flow, anti, output, or input
  - probability: frequency of occurrence
- When loops are targeted
  - dependence distance: limited

Overhead of Profiling

[Figure: bar chart of profiling slowdown (X times slower, axis 0 to 80) for bzip2, crafty, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, vpr, and average. Series: alias, DD without distance, DD for innermost loops, DD 4-level loops. Value labels shown: 96, 110, 102, 121, 120.]

- Compiler: ORC version 2.0
- Machine: Itanium2, 900MHz and 2G memory
- Benchmarks: SPEC CPU2000 Int
- Instrumentation optimization has been done

Techniques to Reduce Profiling Overhead

- Reduce the space requirement by hash table
  - Larger granularity of address
  - Smaller iteration counter
- Sampling
  - Sample the snapshots of procedures or loops instead of individual references
  - Use instrumentation-based sampling framework
  - Switch at procedures or loops

Conclusions

- Microprocessors have caught up with supercomputers in the '90s and have gone multi-core
- It is non-trivial to apply current supercomputing technologies to general-purpose applications
- New architectural support such as thread-level speculative execution, and new compiler techniques such as speculative optimizations using alias and data dependence profiling, even dynamic optimization at runtime, are crucial, as always
- A very exciting and new era for parallel processing might have arrived (especially in embedded systems), finally!

References

- (1) J. Lin et al, A Compiler Framework for Speculative Analysis and Optimizations, Proc. of ACM/SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2003, also in ACM Trans. on Architecture and Code Optimization (TACO), Vol. 1, No. 3, Sept. 2004, pp. 247-271
- (2) J. Lin et al, Recovery Code Generation for General Speculative Optimizations, to appear in ACM Trans. on Architecture and Code Optimization (TACO) 2005.
- (3) J. Lin et al, Speculative Register Promotion Using Advanced Load Address Table (ALAT), Proc. of IEEE/ACM Int'l Symp. on Code Generation and Optimization (CGO), March 2003
- (4) T. Chen et al, Data Dependence Profiling for Speculative Optimizations, Proc. of Int'l Conf on Compiler Construction (CC), March 2004
- (5) T. Chen et al, An Empirical Study on the Granularity of Pointer Analysis in C programs, Proc. 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), August 2002
- (6) J. Y. Tsai et al, The Superthreaded Processor Architecture, IEEE Trans on Computers, special issue on Multithreaded Architecture, Vol. 48, No. 9, Sept 1999

Control Speculation

- ld.s: move the load operation across the barri