Compilers: Keil optimization option problems

Category: Compilers
2013-01-11 14:12

I recently ran into some puzzling problems when compiling with Keil: it looked as if parts of my code were being optimized away. After reading up on the subject I think I understand it a little better. I had been using the "default" optimization setting, which, according to what I found online, corresponds to level 2. After switching to level 0 the problems disappeared. Below is the material I found online describing the optimization features.

Getting the Best Optimized Code for your Embedded Application

ARM Compilation Tools

The ARM Compilation Tools are the only compilation tools co-developed with the ARM processors, and specifically designed to optimally support the ARM architecture. They are the result of 20 years of development, and are recognized as the industry-leading C and C++ compilation tools for the ARM, Thumb, and Thumb-2 instruction sets.

The ARM Compilation Tools consist of:
  • The ARM Compiler, which enables you to compile C and C++ code. It is an optimizing compiler, and features command-line options that let you control the level of optimization
  • Linker and Utilities, which assign addresses and lay out sections of code to form a final image
  • A selection of libraries, including the ISO standard C libraries, and the MicroLIB C library, which is optimized for embedded applications
  • Assembler, which generates machine code instructions from ARM, Thumb or Thumb-2 assembly-level source code

Compiler Options for Embedded Applications

The ARM Compilation Tools include a number of compiler optimizations to help you best target your code for your chosen microcontroller device and application area. They can be accessed from within µVision by clicking Project - Options for Target. The options described in this document can be found on the Target and C/C++ tabs of the Options for Target dialog.

MDK Compiler Optimizations

  • Cross-Module Optimization takes information from a prior build and uses it to place unused functions into their own ELF section in the corresponding object file. This option is also known as Linker Feedback, and requires you to build your application twice to take advantage of it for reduced code size.
    Cross-Module Optimization has been shown to reduce code size by removing unused functions from your application. It can also improve the performance of your application by allowing modules to share inline code.
  • The MicroLIB C library has been optimized to reduce the size of embedded applications. It is a subset of the ISO standard C runtime library, and offers a tradeoff between functionality and code size. Some of the standard C library functions such as memcpy() are slower, while some features of the default library are not supported. Unsupported features include:
      o Operating system functions, e.g. abort(), exit(), time(), system(), getenv()
      o Wide character and multi-byte support, e.g. mbtowc(), wctomb()
      o The stdio file I/O functions, with the exception of stdin, stdout and stderr
      o Position-independent and thread-safe code
    Use the MicroLIB C library for applications where overall performance can be traded off against the need to reduce code size and memory cost.
  • Link-Time Code Generation instructs the compiler to create objects in an intermediate format so that the linker can perform further code optimizations. This gives the code generator visibility into cross-file dependencies of all objects simultaneously, allowing it to apply a higher level of optimization. Link-time code generation can reduce code size, and allow your application to run faster.
  • Optimization Levels can also be adjusted. The different levels of optimization allow you to trade off between the level of debug information available in the compiled code, and the performance of the code. The following optimization levels are available:
      o -O0 applies minimum optimizations. Most optimizations are switched off, and the code generated has the best debug view.
      o -O1 applies restricted optimization. For example, unused inline functions and unused static functions are removed.
        At this level of optimization, the compiler also applies automatic optimizations such as removing redundant code and re-ordering instructions so as to avoid an interlock situation. The code generated is reasonably optimized, with a good debug view.
      o -O2 applies high optimization (this is the default setting). Optimizations applied at this level take advantage of ARM's in-depth knowledge of the processor architecture to exploit processor-specific behavior of the given target. It generates well-optimized code, but with a limited debug view.
      o -O3 applies the most aggressive optimization. The optimization is in accordance with the user's -Ospace/-Otime choice. By default, multi-file compilation is enabled, which leads to a longer compile time, but gives the highest levels of optimization.
  • The Optimize for Time checkbox causes the compiler to optimize with a greater focus on achieving the best performance when checked (-Otime) or the smallest code size when unchecked (-Ospace).
    Unchecking Optimize for Time selects the -Ospace option, which instructs the compiler to perform optimizations to reduce the image size at the expense of a possible increase in execution time, for example by using out-of-line function calls instead of inline code for large structure copies. This is the default option. When running the compiler from the command line, this option is invoked using -Ospace.
    Checking Optimize for Time selects the -Otime option, which instructs the compiler to optimize the code for the fastest execution time, at the risk of an increase in the image size. It is recommended that you compile the time-critical parts of your code with -Otime, and the rest using -Ospace.
  • Split Load and Store Multiples instructs the compiler to split LDM and STM instructions involving a large number of registers into a series of loads/stores of fewer registers each. This means that an LDM of 16 registers can be split into 4 separate LDMs of 4 registers each.
    This option helps to reduce the interrupt latency on ARM systems which do not have a cache or write buffer, and systems which use zero-wait-state 32-bit memory.
    For example, the ARM7 and ARM9 processors can only take an exception on an instruction boundary. If an exception occurs at the start of an LDM of 16 registers in a cacheless ARM7/ARM9 system, the system will finish making 16 accesses to memory before taking the exception. Depending on the memory arbitration system, this can result in a very high interrupt latency. Breaking the LDM into 4 individual LDMs of 4 registers each means that the processor will take the exception after loading a maximum of 4 registers, thereby greatly reducing the interrupt latency. Selecting this option improves the overall performance of the system.
  • The One ELF Section per Function option tells the compiler to put all functions into their own individual ELF sections. This allows the linker to remove unused functions.
    An ELF code section typically contains the code for a number of functions. The linker is normally only able to remove unused ELF sections, not unused functions. An ELF section can only be removed if all its contents are unused. Therefore, splitting each function into its own ELF section allows the linker to easily identify which ones are unused, and remove them.
    Selecting this option increases the time required to compile your code, but results in improved performance.

The combination of options applied will depend on your optimization goal: whether you are optimizing for smallest code size, or for best performance. The next section illustrates the best optimization options for each of these goals.

Optimizing for Smallest Code Size

To optimize your code for the smallest size, the best options to apply are:
  • The MicroLIB C library
  • Cross-module optimization
  • Optimization level 2 (-O2)

Compile the Measure example without any optimizations

The Measure example uses analog and digital inputs to simulate a data logger.
  • File - Open Project: C:\Keil\ARM\Boards\Keil\MCBSTM32\Measure\Measure.uv2
  • Click the Options for Target button
  • In the Target tab:
      o Uncheck Cross-Module Optimization
      o Uncheck Use MicroLIB
      o Uncheck Use Link-Time Code Generation
  • In the C/C++ tab:
      o Set Optimization Level to Zero
  • Then click OK to save your changes
  • Project - Build target

Without any compiler optimizations applied, the initial code size is 13,656 Bytes.

Optimize the Measure example for Size

Apply the compiler optimizations in turn, and re-compile each time to see their effect in reducing the code size for the example.
  • Options for Target - Target tab: Use the MicroLIB C library
  • Options for Target - Target tab: Use cross-module optimization (remember to compile twice)
  • Options for Target - C/C++ tab: Enable Optimization level 2 (-O2)

Optimization Applied          Compile Size    Size Reduction    Improvement
MicroLIB C library            8,960 Bytes     4,696 Bytes       34% smaller
Cross-Module Optimization     13,500 Bytes    156 Bytes         1.1% smaller
Optimization level -O2        12,936 Bytes    720 Bytes         5.3% smaller
All 3 optimization options    8,116 Bytes     5,540 Bytes       40.6% smaller

Applying all the optimizations reduces the code size to 8,116 Bytes. The fully optimized code is 5,540 Bytes smaller, a total code size reduction of 40.6%.

Optimizing for Best Performance

To optimize your code for performance, the best options to apply are:
  • Cross-module optimization
  • Optimization level 3 (-O3)
  • Optimize for time

Run the Dhrystone benchmark without any optimizations

The Dhrystone benchmark is used to measure and compare the performance of different computers, or the efficiency of the code generated for the same computer by different compilers.
  • File - Open Project: C:\Keil\ARM\Examples\DHRY\DHRY.uv2
  • Click the Options for Target button
  • Turn off optimization settings in the Target and C/C++ tabs, then click OK
  • Project - Build target
  • Enter Debug mode
  • View - Serial Windows - UART #1: open the UART #1 window
  • View - Analysis Windows - Performance Analyzer: open the Performance Analyzer
  • Debug - Run: start running the application
  • When prompted, enter 50000 in the UART #1 window and press Enter

In the Performance Analyzer window, note that:
  • The dhry_1 loop took 2.829s
  • The dhry_2 loop took 2.014s

In the UART #1 window, note that:
  • It took 138.0 microseconds for 1 run through Dhrystone
  • The application is executing 7246.4 Dhrystones per second

Optimize the Dhrystone example for Performance

Re-compile the example with all three of the following optimizations applied:
  • Options for Target - Target tab: Cross-module optimization (remember to compile twice)
  • Options for Target - C/C++ tab: Optimization level 3 (-O3)
  • Options for Target - C/C++ tab: Optimize for Time

Re-run the application, and examine the performance.

Measurement                                 Without optimizations    With optimizations    Improvement
dhry_1                                      2.829s                   1.695s                40.1% faster
dhry_2                                      2.014s                   1.011s                49.8% faster
Microseconds for 1 run through Dhrystone    138.0                    70                    49.3% faster
Dhrystones per second                       7246.4                   14,285.7              97.1% more

The fully optimized code achieves approximately 2x the performance of the un-optimized code.

Summary

The ARM Compilation Tools offer a range of options to apply when compiling your code. These options can be combined to optimize your code for best performance, for smallest code size, or for any performance point between these two extremes, to best suit your targeted microcontroller device and market.

When optimizing your code, MDK-ARM makes it easy and convenient to measure the effect of the different optimization settings on your application. The code size is clearly displayed after compilation, and a range of analysis tools such as the Performance Analyzer enable you to measure performance.
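
The improvement figures quoted for the Measure and Dhrystone examples follow from simple ratios. As a minimal sketch (the function names percent_reduction and percent_increase are illustrative only and are not part of any ARM or Keil tool), the arithmetic can be written in C as:

```c
#include <assert.h>

/* Percentage by which `optimized` is smaller than `baseline`.
   Applies to code size in bytes and to run time in seconds. */
double percent_reduction(double baseline, double optimized)
{
    assert(baseline > 0.0);
    return (baseline - optimized) / baseline * 100.0;
}

/* Percentage by which `optimized` is larger than `baseline`.
   Applies to throughput figures such as Dhrystones per second. */
double percent_increase(double baseline, double optimized)
{
    assert(baseline > 0.0);
    return (optimized - baseline) / baseline * 100.0;
}
```

For example, percent_reduction(13656.0, 8116.0) gives roughly 40.6, matching the code size result, and percent_increase(7246.4, 14285.7) gives roughly 97.1, matching the Dhrystones-per-second result.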

The optimization options in the ARM Compilation Tools, together with the easy-to-use analysis tools in MDK-ARM, help you to easily optimize your application to meet your specific requirements.

Level   Description
0       Constant folding: the compiler pre-computes results and replaces expressions with constants wherever possible. This includes run-time address calculations.
        Simple access optimizing: the compiler optimizes access to internal data and bit addresses in the 8051 system.
        Jump optimizing: the compiler always extends jumps to the final target; multi-level jumps are removed.
1       Dead code elimination: unused code fragments are removed.
        Jump negation: conditional jumps are examined closely to determine whether they can be improved or removed by inverting the test logic.
2       Data overlaying: data and bit segments suitable for static overlaying are identified and marked internally. The BL51 linker/locator can select the segments to be overlaid by means of global data flow analysis.
3       Peephole optimizing: redundant MOV instructions are removed. This includes unnecessary loads from memory and unnecessary loads of constants. Complex operations are replaced by simple ones when memory space or execution time can be saved.

KEIL C optimization: detailed analysis 2011-01-
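
The optimizations listed above include removing redundant loads from memory and removing code that the compiler considers unused. One common way this makes working code seem to "disappear" at higher levels, as in the problem described at the start of this post, is a variable that is changed by an interrupt handler or by hardware but is not declared volatile. The post does not show the failing code, so this is only an assumed cause, and every name in the sketch below (data_ready, sample, adc_isr_sketch, wait_for_sample) is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variables shared between an interrupt handler and the main
   loop. `volatile` forces the compiler to perform every read and write,
   so the polling loop below cannot be reduced to a single cached read. */
static volatile bool data_ready = false;
static volatile uint32_t sample = 0;

/* Stand-in for an interrupt service routine (illustrative name only). */
void adc_isr_sketch(uint32_t value)
{
    sample = value;
    data_ready = true;
}

/* Poll the flag at most max_polls times. Returns true and stores the
   sample in *out if data arrived, false if the poll budget ran out. */
bool wait_for_sample(uint32_t max_polls, uint32_t *out)
{
    while (max_polls-- > 0) {
        if (data_ready) {
            data_ready = false;
            *out = sample;
            return true;
        }
    }
    return false;
}
```

Without the volatile qualifier an optimizing compiler is allowed to read data_ready once and reuse the value, which typically works at level 0 and fails at level 2. Lowering the optimization level hides such a bug rather than fixing it.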




