版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Arm Custom Instructions for Armv8-M: Enabling Innovation and Greater Flexibility on ArmToday, the Arm architecture enables developers to write high-performance portable code, capable of running unmodified on billions of devices. In contrast, the decline of Moores law, combined with ever increasing d
2、emands for computational performance at the edge, lead to a need for product customization and specialization. To address these challenges, Arm Custom Instructions for Armv8-M offer implementers the ability to make tradeoffs specific to their own goals.This white paper introduces Arm Custom Instruct
3、ions for Armv8-M, which enable further innovation with Arm-based systemdesigns, highlighting their features and benefits.What are Arm Custom Instructions for Armv8-M?Arm Custom Instructions for Armv8-M enable chip designers to push performance and efficiency further by adding application domain-spec
4、ific features in small embedded processors, while maintaining the ecosystem advantages of Arm processors. In this context, Arm Custom Instructions for Armv8-M aim to:Enable differentiation by giving you the power to innovate within the proven Arm architecture, a worldwide standard.Reduce time to mar
5、ket when exploring new classes of user-defined instructions for emerging algorithms and applications.Develop a domain-specific architecture by allowing you to implement a customized accelerator with an Arm architecturally compliant CPU as a container.One way to categorize accelerators based on the c
6、onnection to the CPU is:Memory-mapped accelerators, such as a GPU, directly connected to the memory bus.Coprocessor interface, recently introduced on the Arm Cortex-M33 processor, enabling you to build closely coupled accelerators under the direct control of the CPU.Arm Custom Instructions for Armv8
7、-M further expand this view of hardware accelerators by enabling tightly coupled accelerators with even closer coupling with the datapath of the processor.1. Memory-mapped2. Coprocessor3. Tightly coupledDecoupled from CPUCan have wide register size and direct access to memoryNot limited by memory ba
8、ndwidth of CPURuns in parallel to CPUIntegrated with CPUAdditional customer registersHigh-throughput interface with CPU (64 bits per cycle)Ideal for medium latency operations (3 cycles or more)Runs in parallel to CPUTightly Integrated with CPUAccess to standard registersIdeal for low-latency operati
9、ons(1 to 3 cycles)Instructions interleaved with Arm standard instructionsFigure 1. Arm Custom Instructions for Armv8-M introduce the abilityto add tightly coupled accelerators to Arm Cortex-M CPUsHow do Arm Custom Instructions for Armv8-M Work?Arm redefines the coprocessor Instruction Set Architecur
10、e (ISA) encoding space to enable custom instructions using the Arm architectural registers and flags. You can use this encoding space to add your own, differentiating data processing instructions without compromising performance.Arm Custom Instructions for Armv8-M add a customizable module inside th
11、e processor. This module is driven by the pre-decoded instructions and shares the same interface as the standard Arithmetic Logic Unit (ALU) of the CPU. This configuration space enables you to design your own operations where:The CPU manages control and dependencies.Instructions are either single-cy
12、cle or multi-cycle and can be pipelined.Implementations might contain one configuration space for the CPU and its general purpose registers, and one configuration space for the Floating-Point Unit (FPU) and its floating-point registers.There are multiple regions of the encoding space available for c
13、ustomization. You can choose how many regions to use, up to eight, based on the type of instructions you want to implement. For the regions that are not used, Arm decodes the instruction either for the coprocessor interface, if present and enabled, or as a NOCP exception, if not present or not acces
14、sible.Figure 2. Arm Custom Instructions for Armv8-M configuration spaceAdding custom instructions to a customizable CPU requires two steps:Providing a configuration file that lists the regions you want to use for adding your own custom instructions.Building the datapath for your own custom instructi
15、ons and integrating it into the configuration space.Decoding logic is automatically configured to decode your custom instructions and control your custom datapath. On top of the decoder, the CPU resolves all required control signals, instruction interlocking, and data dependencies.The custom ALUs fo
16、llow the execution resources available on the customizable CPU: Operating out of the extended register file, that is registers in the Floating-Point Unit (FPU) or M-profile Vector Extension (MVE), is only possible on a CPU thatimplements the extended register file.The support for multi-cycle instruc
17、tions matches the supported latencies in the customizable CPU.Arm provides all required control signals and operands, and writes results into the register file for the custom datapath. Arm control logic handles all hazarding logic. As a result, any declared required operand or flag, and any declared
18、 result write, requires the appropriate hazarding to be handled, even if not used by the custom instructions.For CPUs with a coprocessor interface, the encoding space for each coprocessorcan be dedicated to either the external coprocessor or the customizable ISA extension, with mutual exclusion.Base
19、d on the configuration file you provided, Arm configures the instruction decoder and provides all control logic to drive your custom datapath. Arm also verifies all the control logic interlocks and forwarding. You design and verify the custom datapath.Arm provides a set of assertions and properties
20、to check compliance with the custom datapath interface protocol.Arm provides a testbench to verify the integration of the custom instructions into the customizable CPU. You develop the integration test suites to execute the custom instructions and check correctness.Which Custom Instructions are Now
21、Enabled?Arm introduces 2 3 classes of instruction extension in the coprocessor instruction space:Three classes operate on the general-purpose register file, including the condition code flags APSR_nzcv.Three classes operate on the floating-point/Single Instruction Multiple Data (SIMD) register file
22、only. The three classes are defined by the following instruction patterns: , , , The destination register or the destination register pair of an instruction might be read, as well as written (non-accumulator and accumulator variants).The operation code can be split between a true operation code in t
23、he custom datapath and an immediate value used in the custom datapath.Immediate consequences of the above are:No operations on the floating-point registers can set condition codes. There are no operations using registers from both register files.Operations on the general-purpose register file operat
24、e on 32-bit registers, or a dual-register consisting of a 64-bit value constructed from an even-numbered, general-purpose register and its immediately following odd pair.Table 1. General-purposeregisters and NZCV flagsTable 2. FPU/M- Profile Vector Extension (MVE) registersFor each class, the imm fi
25、eld can be partitioned between what operation is being requested and a constant immediate value that the operation can consume.Example in PracticeConsider a population count popcount(uint32_t x) int n = 0;for (int i = 0; i i) & 1;return n;A standard, hand-optimized Arm implementation wo
26、uld look like this:r0, LSR #4#0 xF0F0F0F0r0#24r0,r0,r1,r0,#0 x55555555r1, r0, LSR #1r0, r1#0 x33333333 r1, r0, LSR #2r0, #0 xCCCCCCCCr1#0 x01010101r1,r1,r0,r1,r1,r0,r0,r1,r0,r0,r0,r0,MOV.WAND.W SUBS MOV.W AND.W BIC ADD MOV.W ADD.W BIC MULS LSRSThis could be replaced with a single user-defined instru
27、ction that can be implemented in one cycle:CX1A p0, r0, #0 / population in r0, return r0Such an instruction can be used either directly in a specialized library or be added as an intrinsic instruction within C code. All standard compilers conform to a future release of Arm C Language Extension (ACLE
28、) will support intrinsic functions for user- defined instructions.What are the Benefits of Arm Custom Instructions for Armv8-M?The possibility to add built-in instructions into the CPU provides significant benefits for some use cases.Arm introduces an extended datapath and instructions pre-decoded b
29、y the decode function. The CPU manages Read/Write registers, as well as control and dependencies.The set of custom instructions is defined in the Arm architecture, therefore, all major compilers support it. Also, because the encoding of the custom instructions is integrated with tools, such as compi
30、lers and debuggers, you do not need to develop or distribute any special version of the tools.Arm Custom Instructions for Armv8-M bring three main benefits: Performance boost for low-latency instructions.Easy integration with the existing software ecosystem. Scalability across Arm Cortex-M CPUs.Although a strictly compatible interface across multiple CPUs might not be achievable, Arm aims to maintain consistency between different implementations to simplify porting accelerators across the Arm Cortex-M portfolio.Innovate w
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024届新疆维吾尔自治区吐鲁番市高昌区第二中学高三3月网络考试数学试题
- 2024届上海市杨思高中高三下学期六次月考数学试题试卷
- 2024年安徽客运驾驶员试题
- 2024年广元客运从业资格证2024年考试题
- 32层高层剪力墙结构住宅施工组织设计
- 2025届云南省楚雄市生物高一第一学期期末复习检测试题含解析
- 2025届安徽省黄山市屯溪第二中学英语高三第一学期期末质量跟踪监视模拟试题含解析
- 陕西师范大学附中2025届高一生物第一学期期末统考模拟试题含解析
- 2025届江西省赣州市大余县新城中学高二数学第一学期期末调研模拟试题含解析
- 2025届山东省枣庄第八中学东校区数学高一上期末联考试题含解析
- 全国职业院校技能大赛高职组(供应链管理赛项)备赛试题库(含答案)
- 2024湖南长沙市人力社保局所属事业单位招聘历年(高频重点复习提升训练)共500题附带答案详解
- 防洪监理实施细则
- HG∕T 2469-2011 立式砂磨机 标准
- 化工企业重大事故隐患判定标准培训考试卷(后附答案)
- 河南省南阳市2023-2024学年高一上学期期中考试英语试题
- 上海市信息科技学科初中学业考试试卷及评分标准
- 2023辽宁公务员考试《行测》真题(含答案及解析)
- 冀教版数学七年级上下册知识点总结
- 高中英语校本教材《高中英语写作指导》校本课程纲要
- 2024年九年级化学上册 实验3《燃烧的条件》教学设计 (新版)湘教版
评论
0/150
提交评论