版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Arm Custom Instructions for Armv8-M: Enabling Innovation and Greater Flexibility on ArmToday, the Arm architecture enables developers to write high-performance portable code, capable of running unmodified on billions of devices. In contrast, the decline of Moores law, combined with ever increasing d
2、emands for computational performance at the edge, lead to a need for product customization and specialization. To address these challenges, Arm Custom Instructions for Armv8-M offer implementers the ability to make tradeoffs specific to their own goals.This white paper introduces Arm Custom Instruct
3、ions for Armv8-M, which enable further innovation with Arm-based systemdesigns, highlighting their features and benefits.What are Arm Custom Instructions for Armv8-M?Arm Custom Instructions for Armv8-M enable chip designers to push performance and efficiency further by adding application domain-spec
4、ific features in small embedded processors, while maintaining the ecosystem advantages of Arm processors. In this context, Arm Custom Instructions for Armv8-M aim to:Enable differentiation by giving you the power to innovate within the proven Arm architecture, a worldwide standard.Reduce time to mar
5、ket when exploring new classes of user-defined instructions for emerging algorithms and applications.Develop a domain-specific architecture by allowing you to implement a customized accelerator with an Arm architecturally compliant CPU as a container.One way to categorize accelerators based on the c
6、onnection to the CPU is:Memory-mapped accelerators, such as a GPU, directly connected to the memory bus.Coprocessor interface, recently introduced on the Arm Cortex-M33 processor, enabling you to build closely coupled accelerators under the direct control of the CPU.Arm Custom Instructions for Armv8
7、-M further expand this view of hardware accelerators by enabling tightly coupled accelerators with even closer coupling with the datapath of the processor.1. Memory-mapped2. Coprocessor3. Tightly coupledDecoupled from CPUCan have wide register size and direct access to memoryNot limited by memory ba
8、ndwidth of CPURuns in parallel to CPUIntegrated with CPUAdditional customer registersHigh-throughput interface with CPU (64 bits per cycle)Ideal for medium latency operations (3 cycles or more)Runs in parallel to CPUTightly Integrated with CPUAccess to standard registersIdeal for low-latency operati
9、ons(1 to 3 cycles)Instructions interleaved with Arm standard instructionsFigure 1. Arm Custom Instructions for Armv8-M introduce the abilityto add tightly coupled accelerators to Arm Cortex-M CPUsHow do Arm Custom Instructions for Armv8-M Work?Arm redefines the coprocessor Instruction Set Architecur
10、e (ISA) encoding space to enable custom instructions using the Arm architectural registers and flags. You can use this encoding space to add your own, differentiating data processing instructions without compromising performance.Arm Custom Instructions for Armv8-M add a customizable module inside th
11、e processor. This module is driven by the pre-decoded instructions and shares the same interface as the standard Arithmetic Logic Unit (ALU) of the CPU. This configuration space enables you to design your own operations where:The CPU manages control and dependencies.Instructions are either single-cy
12、cle or multi-cycle and can be pipelined.Implementations might contain one configuration space for the CPU and its general purpose registers, and one configuration space for the Floating-Point Unit (FPU) and its floating-point registers.There are multiple regions of the encoding space available for c
13、ustomization. You can choose how many regions to use, up to eight, based on the type of instructions you want to implement. For the regions that are not used, Arm decodes the instruction either for the coprocessor interface, if present and enabled, or as a NOCP exception, if not present or not acces
14、sible.Figure 2. Arm Custom Instructions for Armv8-M configuration spaceAdding custom instructions to a customizable CPU requires two steps:Providing a configuration file that lists the regions you want to use for adding your own custom instructions.Building the datapath for your own custom instructi
15、ons and integrating it into the configuration space.Decoding logic is automatically configured to decode your custom instructions and control your custom datapath. On top of the decoder, the CPU resolves all required control signals, instruction interlocking, and data dependencies.The custom ALUs fo
16、llow the execution resources available on the customizable CPU: Operating out of the extended register file, that is registers in the Floating-Point Unit (FPU) or M-profile Vector Extension (MVE), is only possible on a CPU thatimplements the extended register file.The support for multi-cycle instruc
17、tions matches the supported latencies in the customizable CPU.Arm provides all required control signals and operands, and writes results into the register file for the custom datapath. Arm control logic handles all hazarding logic. As a result, any declared required operand or flag, and any declared
18、 result write, requires the appropriate hazarding to be handled, even if not used by the custom instructions.For CPUs with a coprocessor interface, the encoding space for each coprocessorcan be dedicated to either the external coprocessor or the customizable ISA extension, with mutual exclusion.Base
19、d on the configuration file you provided, Arm configures the instruction decoder and provides all control logic to drive your custom datapath. Arm also verifies all the control logic interlocks and forwarding. You design and verify the custom datapath.Arm provides a set of assertions and properties
20、to check compliance with the custom datapath interface protocol.Arm provides a testbench to verify the integration of the custom instructions into the customizable CPU. You develop the integration test suites to execute the custom instructions and check correctness.Which Custom Instructions are Now
21、Enabled?Arm introduces 2 3 classes of instruction extension in the coprocessor instruction space:Three classes operate on the general-purpose register file, including the condition code flags APSR_nzcv.Three classes operate on the floating-point/Single Instruction Multiple Data (SIMD) register file
22、only. The three classes are defined by the following instruction patterns: , , , The destination register or the destination register pair of an instruction might be read, as well as written (non-accumulator and accumulator variants).The operation code can be split between a true operation code in t
23、he custom datapath and an immediate value used in the custom datapath.Immediate consequences of the above are:No operations on the floating-point registers can set condition codes. There are no operations using registers from both register files.Operations on the general-purpose register file operat
24、e on 32-bit registers, or a dual-register consisting of a 64-bit value constructed from an even-numbered, general-purpose register and its immediately following odd pair.Table 1. General-purposeregisters and NZCV flagsTable 2. FPU/M- Profile Vector Extension (MVE) registersFor each class, the imm fi
25、eld can be partitioned between what operation is being requested and a constant immediate value that the operation can consume.Example in PracticeConsider a population count popcount(uint32_t x) int n = 0;for (int i = 0; i i) & 1;return n;A standard, hand-optimized Arm implementation wo
26、uld look like this:r0, LSR #4#0 xF0F0F0F0r0#24r0,r0,r1,r0,#0 x55555555r1, r0, LSR #1r0, r1#0 x33333333 r1, r0, LSR #2r0, #0 xCCCCCCCCr1#0 x01010101r1,r1,r0,r1,r1,r0,r0,r1,r0,r0,r0,r0,MOV.WAND.W SUBS MOV.W AND.W BIC ADD MOV.W ADD.W BIC MULS LSRSThis could be replaced with a single user-defined instru
27、ction that can be implemented in one cycle:CX1A p0, r0, #0 / population in r0, return r0Such an instruction can be used either directly in a specialized library or be added as an intrinsic instruction within C code. All standard compilers conform to a future release of Arm C Language Extension (ACLE
28、) will support intrinsic functions for user- defined instructions.What are the Benefits of Arm Custom Instructions for Armv8-M?The possibility to add built-in instructions into the CPU provides significant benefits for some use cases.Arm introduces an extended datapath and instructions pre-decoded b
29、y the decode function. The CPU manages Read/Write registers, as well as control and dependencies.The set of custom instructions is defined in the Arm architecture, therefore, all major compilers support it. Also, because the encoding of the custom instructions is integrated with tools, such as compi
30、lers and debuggers, you do not need to develop or distribute any special version of the tools.Arm Custom Instructions for Armv8-M bring three main benefits: Performance boost for low-latency instructions.Easy integration with the existing software ecosystem. Scalability across Arm Cortex-M CPUs.Although a strictly compatible interface across multiple CPUs might not be achievable, Arm aims to maintain consistency between different implementations to simplify porting accelerators across the Arm Cortex-M portfolio.Innovate w
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 特殊作业监护人培训考试卷含答案
- 2025年村级工作人员笔试题目及答案
- 2026年山东文化产业职业学院单招综合素质考试模拟试题附答案详解
- 高三英语语法 词性(名词、动词、形容词与副词)讲义-2026届高三英语上学期一轮复习专项
- 2026年智能三明治机项目投资计划书
- 人教版2025中考英语中考一轮复习八上Units 6-7 (共45张含内嵌视频)
- 设计口号题目及答案
- 2026年湖南机电职业技术学院单招职业技能考试备考试题带答案解析
- 2026年江西应用科技学院高职单招职业适应性测试模拟试题带答案解析
- 租赁合同更换名字协议书
- 2026年黑龙江农业经济职业学院高职单招职业适应性测试模拟试题及答案详解
- 2025-2026学年上海市行知实验中学高二上册期中考试语文试题 含答案
- 2026年广东省佛山市六年级数学上册期末考试试卷及答案
- 2026届吉林省长春六中、八中、十一中等省重点中学高二生物第一学期期末联考试题含解析
- 2025年低压电工操作证理论全国考试题库(含答案)
- 2026届浙江省学军中学英语高三第一学期期末达标检测试题含解析
- 2025北京市公共资源交易中心招聘8人(公共基础知识)测试题带答案解析
- 工会女工培训课件
- 雨课堂学堂在线学堂云《临床伦理与科研道德(山东大学)》单元测试考核答案
- 2025新疆和田地区“才聚和田·智汇玉都”招才引智招聘工作人员204人(公共基础知识)综合能力测试题附答案解析
- 2026年医疗机构人力资源配置降本增效项目分析方案
评论
0/150
提交评论