Keil Compiler Optimization Options for ARM

Problems with the Keil compiler's optimization options

Category: Compilers | 2013-01-11 14:12

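Recently I ran into inexplicable problems when building with Keil: it looked as though some of my code was being optimized away. After looking up the relevant material, I think I understand it a little better.

I had been using the default optimization setting. From what I read online, the default is level 2 (-O2). After I switched to level 0 (-O0), the problem went away.

Below is material I found online that introduces the optimization features.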

Getting the Best Optimized Code for your Embedded Application

ARM Compilation Tools

The ARM Compilation Tools are the only compilation tools co-developed with the ARM processors, and specifically designed to optimally support the ARM architecture. They are a result of 20 years of development, and are recognized as the industry-leading C and C++ compilation tools for the ARM, Thumb, and Thumb-2 instruction sets.

The ARM Compilation tools consist of:

• The ARM Compiler, which enables you to compile C and C++ code. It is an optimizing compiler, and features command-line options to enable you to control the level of optimization
• Linker and Utilities, which assign addresses and lay out sections of code to form a final image
• A selection of libraries, including the ISO standard C libraries, and the MicroLIB C library which is optimized for embedded applications
• Assembler, which generates machine code instructions from ARM, Thumb or Thumb-2 assembly-level source code

Compiler Options for Embedded Applications

The ARM Compilation Tools include a number of compiler optimizations to help you best target your code for your chosen microcontroller device and application area.

They can be accessed from within μVision by clicking on Project – Options for Target. The options described in this document can be found on the Target and C/C++ tabs of the Options for Target dialog.

MDK Compiler Optimizations

• Cross-Module Optimization takes information from a prior build and uses it to place UNUSED functions into their own ELF section in the corresponding object file. This option is also known as Linker Feedback, and requires you to build your application twice to take advantage of it for reduced code size.

Cross-Module Optimization has been shown to reduce code size, by removing unused functions from your application. It can also improve the performance of your application, by allowing modules to share inline code.

• The MicroLIB C library has been optimized to reduce the size of embedded applications. It is a subset of the ISO standard C runtime library, and offers a tradeoff between functionality and code size. Some of the standard C library functions such as memcpy() are slower, while some features of the default library are not supported. Unsupported features include:
o Operating system functions e.g. abort(), exit(), time(), system(), getenv()
o Wide character and multi-byte support e.g. mbtowc(), wctomb()
o The stdio file I/O functions, with the exception of stdin, stdout and stderr
o Position-independent and thread-safe code

Use the MicroLIB C library for applications where overall performance can be traded off against the need to reduce code size and memory cost.

• Link-Time Code Generation instructs the compiler to create objects in an intermediate format so that the linker can perform further code optimizations. This gives the code generator visibility into cross-file dependencies of all objects simultaneously, allowing it to apply a higher level of optimizations. Link-time code generation can reduce code size, and allow your application to run faster.

• Optimization Levels can also be adjusted. The different levels of optimization allow you to trade off between the level of debug information available in the compiled code, and the performance of the code. The following optimization levels are available:
o -O0 applies minimum optimizations. Most optimizations are switched off, and the code generated has the best debug view.
o -O1 applies restricted optimization. For example, unused inline functions and unused static functions are removed. At this level of optimization, the compiler also applies automatic optimizations such as removing redundant code and re-ordering instructions so as to avoid an interlock situation. The code generated is reasonably optimized, with a good debug view.
o -O2 applies high optimization (this is the default setting). Optimizations applied at this level take advantage of ARM's in-depth knowledge of the processor architecture, to exploit processor-specific behavior of the given target. It generates well optimized code, but with limited debug view.
o -O3 applies the most aggressive optimization. The optimization is in accordance with the user's -Ospace/-Otime choice. By default, multi-file compilation is enabled, which leads to a longer compile time, but gives the highest levels of optimization.

• The Optimize for Time checkbox causes the compiler to optimize with a greater focus on achieving the best performance when checked (-Otime) or the smallest code size when unchecked (-Ospace).

Unchecking Optimize for Time selects the -Ospace option, which instructs the compiler to perform optimizations to reduce the image size at the expense of a possible increase in execution time, for example by using out-of-line function calls instead of inline code for large structure copies. This is the default option. When running the compiler from the command line, this option is invoked using '-Ospace'.

Checking Optimize for Time selects the -Otime option, which instructs the compiler to optimize the code for the fastest execution time, at the risk of an increase in the image size. It is recommended that you compile the time-critical parts of your code with -Otime, and the rest using the -Ospace option.

• Split Load and Store Multiples instructs the compiler to split LDM and STM instructions involving a large number of registers into a series of loads/stores of fewer registers each. This means that an LDM of 16 registers can be split into 4 separate LDMs of 4 registers each. This option helps to reduce the interrupt latency on ARM systems which do not have a cache or write buffer, and systems which use zero-wait-state 32-bit memory.

For example, the ARM7 and ARM9 processors can only take an exception on an instruction boundary. If an exception occurs at the start of an LDM of 16 registers in a cacheless ARM7/ARM9 system, the system will finish making 16 accesses to memory before taking the exception. Depending on the memory arbitration system, this can result in a very high interrupt latency. Breaking the LDM into 4 individual LDMs of 4 registers means that the processor will take the exception after loading a maximum of 4 registers, thereby greatly reducing the interrupt latency.

Selecting this option improves the overall performance of the system.

• The One ELF Section per Function option tells the compiler to put all functions into their own individual ELF sections. This allows the linker to remove unused functions.

An ELF code section typically contains the code for a number of functions. The linker is normally only able to remove unused ELF sections, not unused functions. An ELF section can only be removed if all its contents are unused. Therefore, splitting each function into its own ELF section allows the linker to easily identify which ones are unused, and remove them.

Selecting this option increases the time required to compile your code, but results in improved performance.
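As a rough illustration of the One ELF Section per Function idea, the sketch below puts one used and one unused function in the same source file. The function names are hypothetical. With armcc the option is understood to correspond to --split_sections; GCC-based toolchains offer a comparable pairing of -ffunction-sections and the linker's --gc-sections. Treat the flag mapping as an assumption to check against your toolchain's documentation.

```c
#include <stdint.h>

/* Referenced by the application, so the linker must keep its section. */
uint32_t scale_reading(uint32_t raw)
{
    return raw * 3u / 4u;
}

/* Hypothetical leftover that nothing calls. When both functions share one
   ELF section, this code stays in the image because the section as a whole
   is still in use. When each function has its own section, the linker can
   discard this one. */
uint32_t legacy_scale_reading(uint32_t raw)
{
    return raw / 2u;
}
```

Only scale_reading() would need to survive in the final image. Whether legacy_scale_reading() is actually dropped depends on the compiler and linker settings, which is why the map file is worth checking after the build.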

The combination of options applied will depend on your optimization goal – whether you are optimizing for smallest code size, or best performance.

The next section illustrates the best optimization options for each of these goals.
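Where one application mixes both goals, the recommendation to build time-critical code with -Otime and the rest with -Ospace can also be applied per function instead of per file. The sketch below is one possible way to do that, assuming the armcc pragmas #pragma Otime, #pragma push and #pragma pop; they are guarded by the armcc-specific __CC_ARM macro so other compilers skip them. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__CC_ARM)
#pragma push      /* save the current optimization settings (armcc only) */
#pragma Otime     /* favor speed for the functions that follow */
#endif

/* Time-critical: called for every received buffer. */
uint32_t checksum_fast(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

#if defined(__CC_ARM)
#pragma pop       /* restore the settings given on the command line */
#endif

/* Not time-critical: built with the project-wide setting (-Ospace by default). */
void fill_pattern(uint8_t *buf, size_t len, uint8_t seed)
{
    size_t i;
    for (i = 0; i < len; i++) {
        buf[i] = (uint8_t)(seed + i);
    }
}
```

Placing time-critical functions in their own source file and enabling Optimize for Time for that file alone achieves the same split without pragmas.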

Optimizing for Smallest Code Size

To optimize your code for the smallest size, the best options to apply are:

• The MicroLIB C library
• Cross-module optimization
• Optimization level 2 (-O2)

Compile the Measure example without any optimizations

The Measure example uses analog and digital inputs to simulate a data logger.

File – Open Project: C:\Keil\ARM\Boards\Keil\MCBSTM32\Measure\Measure.uv2

Click the Options for Target button

In the Target tab:
• Uncheck Cross-Module Optimization
• Uncheck Use MicroLIB
• Uncheck Use Link-Time Code Generation

In the C/C++ tab:
• Set Optimization Level to Zero

Then click OK to save your changes.

Project – Build target

Without any compiler optimizations applied, the initial code size is 13,656 Bytes.


Optimize the Measure example for Size

Apply the compiler optimizations in turn, and re-compile each time to see their effect in reducing the code size for the example.

• Options for Target – Target tab: Use the MicroLIB C library
• Options for Target – Target tab: Use cross-module optimization (remember to compile twice)
• Options for Target – C/C++ tab: Enable Optimization level 2 (-O2)

Optimization Applied          Compile Size    Size Reduction    Improvement
MicroLIB C library            8,960 Bytes     4,696 Bytes       34% smaller
Cross-Module Optimization     13,500 Bytes    156 Bytes         1.1% smaller
Optimization level -O2        12,936 Bytes    720 Bytes         5.3% smaller
All 3 optimization options    8,116 Bytes     5,540 Bytes       40.6% smaller

Applying all the optimizations will reduce the code size down to 8,116 Bytes.

The fully optimized code is 5,540 Bytes smaller, a total code size reduction of 40.6%
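The size figures can be re-derived from the 13,656-byte baseline. The helpers below are a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of any Keil tool.

```c
/* Bytes saved relative to the unoptimized build. */
unsigned bytes_saved(unsigned baseline, unsigned optimized)
{
    return baseline - optimized;
}

/* Size reduction as a percentage of the unoptimized build. */
double percent_smaller(unsigned baseline, unsigned optimized)
{
    return 100.0 * (double)(baseline - optimized) / (double)baseline;
}
```

For example, bytes_saved(13656, 8116) gives 5540, and percent_smaller(13656, 8116) comes to roughly 40.6, matching the last row of the table. The same two calls with 8960, 13500 and 12936 reproduce the other rows.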


Optimizing for Best Performance

To optimize your code for performance, the best options to apply are:

• Cross-module optimization
• Optimization level 3 (-O3)
• Optimize for time

Run the Dhrystone benchmark without any optimizations

The Dhrystone benchmark is used to measure and compare the performance of different computers, or the efficiency of the code generated for the same computer by different compilers.

File – Open Project: C:\Keil\ARM\Examples\DHRY\DHRY.uv2

Click the Options for Target button

Turn off optimization settings in the Target and C/C++ tabs, then click OK

Project – Build target

Enter Debug mode

View – Serial Windows – UART #1: open the UART #1 window

View – Analysis Windows – Performance Analyzer: open the Performance Analyzer

Debug – Run: start running the application

When prompted: enter 50000 in the UART #1 window and press Enter

In the Performance Analyzer window, note that
• The dhry_1 loop took 2.829s
• The dhry_2 loop took 2.014s

In the UART #1 window, note that
• It took 138.0 microseconds for 1 run through Dhrystone
• The application is executing 7246.4 Dhrystones per second

Optimize the Dhrystone example for Performance

Re-compile the example with all three of the following optimizations applied:

• Options for Target – Target tab: Cross-module optimization (remember to compile twice)
• Options for Target – C/C++ tab: Optimization level 3 (-O3)
• Options for Target – C/C++ tab: Optimize for Time

Re-run the application, and examine the performance.

Measurement                                 Without optimizations    With optimizations    Improvement
dhry_1                                      2.829s                   1.695s                40.1% faster
dhry_2                                      2.014s                   1.011s                49.8% faster
Microseconds for 1 run through Dhrystone    138.0                    70                    49.3% faster
Dhrystones per second                       7,246.4                  14,285.7              97.1% more

The fully optimized code achieves approximately 2x the performance of the un-optimized code.
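The Dhrystone figures are related by simple arithmetic: Dhrystones per second is one million divided by the microseconds taken per run, and each improvement is a relative change against the unoptimized value. The sketch below shows that calculation; the helper names are illustrative only.

```c
/* Runs per second from the time taken by a single run, in microseconds. */
double dhrystones_per_second(double us_per_run)
{
    return 1.0e6 / us_per_run;
}

/* Relative reduction in a time measurement, in percent. */
double percent_faster(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Relative increase in a throughput measurement, in percent. */
double percent_more(double before, double after)
{
    return 100.0 * (after - before) / before;
}
```

With these, dhrystones_per_second(138.0) is about 7246.4 and dhrystones_per_second(70.0) is about 14285.7, while percent_faster(2.829, 1.695) is about 40.1 and percent_more(7246.4, 14285.7) is about 97.1, in line with the table.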

Summary

The ARM Compilation Tools offer a range of options to apply when compiling your code. These options can be combined to optimize your code for best performance, for smallest code size, or for any performance point between these two extremes, to best suit your targeted microcontroller device and market.

When optimizing your code, MDK-ARM makes it easy and convenient to measure the effect of the different optimization settings on your application. The code size is clearly displayed after compilation, and a range of analysis tools such as the Performance Analyzer enable you to measure performance.

The optimization options in the ARM Compilation Tools, together with the easy-to-use analysis tools in MDK-ARM, help you to easily optimize your application to meet your specific requirements.


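Keil C51 optimization levels

Level    Description
0        Constant folding: the compiler precomputes results and replaces expressions with constants wherever possible. This includes address calculations.
         Simple access optimization: the compiler optimizes access to the internal data and bit addresses of the 8051 system.
         Jump optimization: the compiler always extends jumps to the final target; jumps to jumps are removed.
1        Dead code elimination: unused code segments are removed.
         Jump negation: conditional jumps are examined closely to determine whether they can be streamlined or removed by inverting the test logic.
2        Data overlaying: data and bit segments suitable for static overlaying are identified and marked internally. The BL51 linker/locator can select the segments that may be overlaid by means of global data-flow analysis.
3        Peephole optimization: redundant MOV instructions are removed. This includes unnecessary loads from memory as well as constant loads. Complex operations are replaced by simple ones when memory space or execution time can be saved.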
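To make the first two levels concrete, the following plain-C sketch shows the kind of source that constant folding and dead code elimination act on. It is an illustration only: the names are hypothetical, and what a given compiler actually emits has to be confirmed in its listing file.

```c
#define TICKS_PER_MS 12u

/* Constant folding (level 0): 50u * TICKS_PER_MS involves only constants,
   so the compiler can replace the multiplication with the value 600. */
unsigned delay_ticks(void)
{
    return 50u * TICKS_PER_MS;
}

/* Dead code elimination (level 1): the second branch can never execute,
   so an optimizer is free to drop it from the generated code. */
unsigned clamp_percent(unsigned value)
{
    if (value > 100u) {
        value = 100u;
    }
    if (0) {
        value = 0u;   /* unreachable */
    }
    return value;
}
```

Neither transformation changes what the functions return; only the generated code shrinks.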

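Detailed Analysis of Keil C Optimization

2011-01-26

An In-Depth Analysis of Bus Peripheral Access in Keil C51

After reading the article "Reading the Same Port Consecutively in Keil C51" (the original article) in the Experience Exchange column of Microcontrollers & Embedded Systems (《单片机与嵌入式系统应用》), 2005, issue 10, I felt that it did not analyze the problem deeply or accurately, and that the two solutions it proposed were neither direct nor simple. In my view this is not a case of Keil C51 being unable to handle consecutive reads and writes of a single port. Rather, it reflects insufficient familiarity with Keil C51 and insufficiently careful design. I therefore wrote this article, which offers three solutions different from those in the original article. Each is more direct and simpler than the original methods, and the resulting design is cleaner. (No criticism is intended; I ask the original author's understanding.)

1 Problem Review and Analysis

The original article reports that, in real work, when the same port was read repeatedly in succession, the code compiled by Keil C51 did not produce the expected result. By analyzing the assembly generated from the C source, the original author found that the second read of the same port had not been compiled at all. Unfortunately, the author did not analyze why it was left out, and instead hastily arrived at two workarounds by trial, using somewhat unorthodox methods.

A look through the Keil C51 manual quickly shows what is going on: the Keil C51 compiler has an optimization setting, and different settings produce different compiled output. The default optimization level is normally 8, and the highest available level is 9: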

1. Dead code elimination.
2. Data overlaying.
3. Peephole optimization.
4. Register variables.
5. Common subexpression elimination.
6. Loop rotation.
7. Extended Index Access Optimizing.
8. Reuse Common Entry Code.
9. Common Block Subroutines.

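The problem described above is caused precisely by Keil C51's optimization, because the original program defines the peripheral address directly as follows:

unsigned char xdata MAX197 _at_ 0x8000;

Here _at_ places the variable MAX197 at the fixed address 0x8000 in external (xdata) RAM. The Keil C51 optimizer therefore naturally assumes that reading it a second time is pointless and that the result of the first read can simply be reused, so the compiler skips the second read statement. With that, the cause of the problem is clear.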

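2 Solutions

The analysis above leads readily to good solutions.

2.1 The simplest and most direct approach

The program does not need to be changed at all: just set Keil C51's optimization level to 0 (no optimization). Select the Target in the project window, open the "Options for Target" dialog, choose the "C51" tab, and under "Code Optimization" set "Level" to "0: Constant folding". After rebuilding, you will find that the compiled result is: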

CLR MAXHBEN
MOV DPTR,#MAX197
MOVX A,@DPTR
MOV R7,A
MOV down8,R7
SETB MAXHBEN
MOV DPTR,#MAX197
MOVX A,@DPTR
MOV R7,A
MOV up4,R7

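Both read operations are now compiled.

2.2 The best approach

Tell Keil C51 that this address is not ordinary external RAM but a connected device with "volatile" behavior, so that every read is meaningful. Modify the variable definition by adding the volatile keyword to describe this characteristic:

unsigned char volatile xdata MAX197 _at_ 0x8000;

Alternatively, include the system header file with "#include <absacc.h>" and then redefine the variable in the program as a direct absolute address:

#define MAX197 XBYTE[0x8000]

This way, Keil C51 can be left at a high optimization level, and in the compiled output the two reads are likewise not optimized away.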
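The effect of volatile can be sketched in portable C, without the C51-specific xdata and _at_ keywords. In the sketch below the two pointer parameters stand in for the MAX197 data port and its HBEN select line; the function name, the pointer-based interface and the way the two bytes are combined into a 12-bit value are assumptions made for illustration and do not come from the original program.

```c
#include <stdint.h>

/* Read one sample in two bus accesses, mirroring the assembly listing in 2.1:
   clear HBEN and read the low byte, then set HBEN and read the high nibble.
   Because both pointers are volatile-qualified, the compiler must perform
   every access even though the same address is read twice. */
uint16_t read_sample(volatile uint8_t *port, volatile uint8_t *hben)
{
    uint8_t low8;
    uint8_t high4;

    *hben = 0;          /* select the low byte */
    low8 = *port;       /* first read of the port */
    *hben = 1;          /* select the high nibble */
    high4 = *port;      /* second read: must not be merged with the first */

    return (uint16_t)(((uint16_t)(high4 & 0x0Fu) << 8) | low8);
}
```

Without the volatile qualifiers, an optimizer may reuse low8 for the second read, which is the behavior described in section 1. On a host machine the function can be exercised with two ordinary volatile variables in place of the hardware.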

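2.3 Hardware solution

In the original article, the MAX197's data lines are connected directly to the data bus, while the address bus is not used at all; a single port line is used to select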