This article covers the basics of the floating-point unit (FPU) and the points a verification engineer should pay attention to.
Terminology and Abbreviations
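| Abbreviation | Full name | Description |
| --- | --- | --- |
| FPU | Floating-point Unit | Hardware unit that performs floating-point arithmetic |
| LSB | least significant bit | Lowest-order bit |
| MSB | most significant bit | Highest-order bit |
| NaN | not a number | Encoding for a value that is not a number |
| qNaN | quiet NaN | Usually represents the result of an undefined arithmetic operation |
| sNaN | signaling NaN | Usually used to mark uninitialized values |
| FMA | fused Multiply Add | Multiply and add with a single rounding at the end |
| RM | Rounding Mode | Rounding mode |
| | Sign | Sign bit (a field of the floating-point format; 0 means positive, 1 means negative) |
| | trailing significand field | Trailing significand field (a field of the floating-point format; all significand digits except the leading one) |
| | biased exponent | Biased exponent (a field of the floating-point format; the exponent plus a constant bias, chosen so that the biased exponent is non-negative) |
| | Mantissa | Mantissa |
| | radix | Radix (number base) |
| | precision | Precision |
| | infinity | Infinity |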
References
| Name | Author | Source |
| --- | --- | --- |
| DDI0487D_b_armv8_arm (ARM Architecture Reference Manual, ARMv8) | ARM | ARM website |
| IEEE-754-2008 | IEEE | IEEE website |
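IEEE-754 Standard
Overview
- The set of floating-point numbers can be viewed as a finite subset of the infinite continuum of real numbers, plus some extra members such as signed infinities and NaNs (qNaN and sNaN).
- For a given format (single precision, double precision or another), rounding maps each real number to a floating-point number representable in that format.
- Floating-point data comprise: signed zeros, finite nonzero numbers, signed infinities, and NaN (not a number).
- Mapping these data onto a fixed number of bits gives the binary encoding of floating-point numbers, together with the corresponding arithmetic rules.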
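Binary Representation
- 1-bit sign S
- w-bit biased exponent E = e + bias
- (t = p − 1)-bit trailing significand field T = d1 d2 … d(p−1). The leading significand bit d0 is not stored but implied by the biased exponent E: if E is 0, d0 = 0; if E is neither 0 nor all ones, d0 = 1.
- The values of k (storage width), p (precision), t, w and bias for each binary interchange format defined by IEEE 754-2008 are:

| Parameter | binary16 (half) | binary32 (single) | binary64 (double) | binary128 (quad) |
| --- | --- | --- | --- | --- |
| k, storage width in bits | 16 | 32 | 64 | 128 |
| p, precision in bits | 11 | 24 | 53 | 113 |
| emax | 15 | 127 | 1023 | 16383 |
| emin = 1 − emax | −14 | −126 | −1022 | −16382 |
| bias = emax | 15 | 127 | 1023 | 16383 |
| w, exponent field width | 5 | 8 | 11 | 15 |
| t = p − 1, trailing significand width | 10 | 23 | 52 | 112 |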
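The field rules above can be made concrete with a short Python sketch that decodes a binary16 bit pattern using only S, E and T:

- value = (−1)^S × 2^(E − bias) × (1 + T/2^t) for a normal number;
- value = (−1)^S × 2^emin × (T/2^t) when E = 0.

The function names are illustrative. Python's `struct` module, whose `'e'` format is IEEE 754 half precision, is used only as an independent cross-check.

```python
import struct

# binary16 parameters: t = 10 trailing significand bits, w = 5 exponent bits, bias = 15
T_BITS, W_BITS, BIAS = 10, 5, 15
EMIN = 1 - BIAS                      # -14
E_ALL_ONES = (1 << W_BITS) - 1       # 0b11111


def decode_half(bits: int) -> float:
    """Decode a 16-bit pattern into its value from the S, E and T fields."""
    s = (bits >> 15) & 0x1
    e = (bits >> T_BITS) & E_ALL_ONES          # biased exponent E
    t = bits & ((1 << T_BITS) - 1)             # trailing significand field T
    sign = -1.0 if s else 1.0
    if e == E_ALL_ONES:                        # E all ones: infinity (T == 0) or NaN
        return sign * float("inf") if t == 0 else float("nan")
    if e == 0:                                 # E == 0: implicit d0 = 0 (zero or subnormal)
        return sign * (t / 2**T_BITS) * 2.0**EMIN
    return sign * (1 + t / 2**T_BITS) * 2.0**(e - BIAS)   # implicit d0 = 1 (normal)


def struct_half(bits: int) -> float:
    """Reference decoding via Python's struct 'e' (IEEE 754 binary16) format."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


for pattern in (0x3C00, 0x7BFF, 0x0400, 0x0001, 0x8000):
    print(f"{pattern:016b}", decode_half(pattern), struct_half(pattern))
```

Looping the comparison over all 65536 patterns (skipping NaNs, which never compare equal) is a cheap sanity check for any reference model built on these rules.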
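Rounding Modes
- Rounding modes:
  - roundTiesToEven: round to nearest, ties to even. It resembles the familiar round-half-up, except that an exact tie goes to the neighbor whose last retained digit is even ("round half to even"). This is the default rounding mode of the standard.
  - roundTowardPositive: round toward +∞. For a positive number, if any discarded bit is nonzero, add 1 to the last retained bit; for a negative number, simply discard the extra bits.
  - roundTowardNegative: round toward −∞. For a positive number, simply discard the extra bits; for a negative number, if any discarded bit is nonzero, add 1 to the last retained bit (increasing the magnitude).
  - roundTowardZero: round toward zero, i.e. discard the extra bits regardless of sign (truncation).
  - roundTiesToAway: round to nearest, ties away from zero, i.e. the familiar round-half-up applied to the magnitude. The standard requires roundTiesToAway only for decimal formats; binary implementations are not required to provide it.
- Examples (keeping 1 fractional bit):
  - Guard bit: when keeping 1 fractional bit, the guard bit is the first fractional bit, i.e. the last bit that is kept.
  - Round bit: the bit after the guard bit, i.e. the second fractional bit.
  - Midpoint: the value equidistant from the two nearest representable values. Keeping 1 fractional digit, the midpoint between decimal 2.3 and 2.4 is 2.35; in binary, the midpoint between 1.0 and 1.1 is 1.01.
  - Round to even ("round half to even"): when the original value is exactly the midpoint, round up if the guard bit is odd (1) and truncate if it is even (0).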
| Original value | Midpoint | roundTiesToEven | roundTowardPositive | roundTowardNegative | roundTowardZero |
| --- | --- | --- | --- | --- | --- |
| +1.1110 | +1.11 | +10.0 | +10.0 | +1.1 | +1.1 |
| +1.0101 | +1.01 | +1.1 | +1.1 | +1.0 | +1.0 |
| +1.0010 | +1.01 | +1.0 | +1.1 | +1.0 | +1.0 |
| +1.1000 | +1.11/+1.01 | +1.1 | +1.1 | +1.1 | +1.1 |
| +1.1100 | +1.11 | +10.0 | +10.0 | +1.1 | +1.1 |
| +1.0100 | +1.01 | +1.0 | +1.1 | +1.0 | +1.0 |
| -1.1110 | -1.11 | -10.0 | -1.1 | -10.0 | -1.1 |
| -1.0101 | -1.01 | -1.1 | -1.0 | -1.1 | -1.0 |
| -1.0010 | -1.01 | -1.0 | -1.0 | -1.1 | -1.0 |
| -1.1000 | -1.11/-1.01 | -1.1 | -1.1 | -1.1 | -1.1 |
| -1.1100 | -1.11 | -10.0 | -1.1 | -10.0 | -1.1 |
| -1.0100 | -1.01 | -1.0 | -1.0 | -1.1 | -1.0 |
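The table can be regenerated mechanically, which is also a handy way to build expected values for a rounding checker. The sketch below uses hypothetical helpers that are not part of any specification:

- `parse_bin` parses each original value exactly with `fractions.Fraction`;
- `round_bits` rounds it to one fractional bit under the four modes in the table;
- `to_bin` prints the result back in the same binary notation.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def parse_bin(s: str) -> Fraction:
    """Parse a signed binary fixed-point string such as '-1.0101' exactly."""
    sign = -1 if s.startswith("-") else 1
    int_part, _, frac_part = s.lstrip("+-").partition(".")
    value = Fraction(int(int_part, 2))
    for i, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2**i)
    return sign * value


def round_bits(x: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Round x to frac_bits fractional bits; mode is 'RNE', 'RU', 'RD' or 'RZ'."""
    scaled = x * 2**frac_bits
    lo, hi = math.floor(scaled), math.ceil(scaled)   # the two nearest candidates
    if lo == hi:                                     # already representable: exact
        n = lo
    elif mode == "RU":                               # roundTowardPositive
        n = hi
    elif mode == "RD":                               # roundTowardNegative
        n = lo
    elif mode == "RZ":                               # roundTowardZero
        n = lo if x > 0 else hi
    elif mode == "RNE":                              # roundTiesToEven
        diff = scaled - lo
        if diff != HALF:
            n = lo if diff < HALF else hi
        else:                                        # exact tie: pick the even candidate
            n = lo if lo % 2 == 0 else hi
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return Fraction(n, 2**frac_bits)


def to_bin(x: Fraction, frac_bits: int) -> str:
    """Format a value that has frac_bits fractional bits, e.g. '+10.0'."""
    sign = "-" if x < 0 else "+"
    n = int(abs(x) * 2**frac_bits)
    int_part, frac_part = divmod(n, 2**frac_bits)
    return f"{sign}{int_part:b}.{frac_part:0{frac_bits}b}"


for raw in ("+1.1110", "+1.0101", "+1.0010", "+1.1000", "+1.1100", "+1.0100",
            "-1.1110", "-1.0101", "-1.0010", "-1.1000", "-1.1100", "-1.0100"):
    x = parse_bin(raw)
    print(raw, *(to_bin(round_bits(x, 1, m), 1) for m in ("RNE", "RU", "RD", "RZ")))
```

Exact rational arithmetic is used deliberately, so the model itself never rounds and cannot mask an off-by-one-ulp bug in the design under test.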
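- Why round to even?
  - Round half up: in decimal, the first discarded digit can be 1 to 9. Rounding 1, 2, 3, 4 down and 6, 7, 8, 9 up is uncontroversial, but if 5 is always rounded up, a large amount of summed data accumulates a sizeable upward bias.
  - Round half to even: in most cases a 5 is rounded down or up with equal probability, so the bias accumulated in statistics is correspondingly smaller.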
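The bias argument can be checked with Python's standard `decimal` module, which works in decimal digits and so matches the discussion above. The sketch sums ten exact ties, 0.5, 1.5, …, 9.5. It rounds each one to an integer first with `ROUND_HALF_UP` and then with `ROUND_HALF_EVEN`.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Ten ties: 0.5, 1.5, ..., 9.5 (each value sits exactly halfway between two integers)
ties = [Decimal(k) / 10 for k in range(5, 100, 10)]


def rounded_sum(values, mode):
    """Round each value to an integer with the given rounding mode, then sum."""
    return sum(v.quantize(Decimal("1"), rounding=mode) for v in values)


exact = sum(ties)
half_up = rounded_sum(ties, ROUND_HALF_UP)
half_even = rounded_sum(ties, ROUND_HALF_EVEN)
print(exact, half_up, half_even)
```

With round-half-up every tie adds +0.5, so the sum is 55 against an exact 50. With round-half-to-even the ties alternate between rounding down and up, and the sum stays at 50.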
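Special Values
- Operations involving special values follow dedicated rules in the standard and deserve particular attention in FPU verification.
- Normal numbers are not special values, but the largest, most negative and smallest-magnitude normal values also deserve attention in FPU verification.
- Operations involving special values such as 0, infinity and NaN may raise floating-point exceptions; see the Floating-Point Exceptions section.
- Only half precision is shown here; single and double precision are analogous.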
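| Special value | Half precision (S_EEEEE_TTTTTTTTTT) |
| --- | --- |
| +0 | 0_00000_0000000000 |
| -0 | 1_00000_0000000000 |
| +infinity | 0_11111_0000000000 |
| -infinity | 1_11111_0000000000 |
| qNaN | x_11111_1xxxxxxxxx |
| sNaN | x_11111_0xxxxxxxxx (trailing bits not all zero) |
| Largest subnormal | 0_00000_1111111111 |
| Most negative subnormal | 1_00000_1111111111 |
| Smallest positive subnormal | 0_00000_0000000001 |
| Smallest-magnitude negative subnormal | 1_00000_0000000001 |
| Largest normal | 0_11110_1111111111 |
| Most negative normal | 1_11110_1111111111 |
| Smallest positive normal | 0_00001_0000000000 |
| Smallest-magnitude negative normal | 1_00001_0000000000 |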
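When generating or checking stimuli, it helps to classify raw patterns the same way the table does. Below is a minimal binary16 sketch. It assumes the convention used in the table: the most significant trailing-significand bit is 1 for a qNaN and 0 for an sNaN. The function names and the example NaN payloads are illustrative.

```python
def classify_half(bits: int) -> str:
    """Classify a binary16 bit pattern (S_EEEEE_TTTTTTTTTT)."""
    sign = "-" if bits >> 15 else "+"
    e = (bits >> 10) & 0x1F          # biased exponent
    t = bits & 0x3FF                 # trailing significand field
    if e == 0x1F:
        if t == 0:
            return sign + "inf"
        return "qNaN" if t >> 9 else "sNaN"   # MSB of T: 1 = quiet, 0 = signaling
    if e == 0:
        return sign + ("zero" if t == 0 else "subnormal")
    return sign + "normal"


def fmt(bits: int) -> str:
    """Render a pattern in the S_EEEEE_TTTTTTTTTT style used by the table."""
    b = f"{bits:016b}"
    return f"{b[0]}_{b[1:6]}_{b[6:]}"


boundaries = {
    "largest subnormal": 0x03FF,
    "smallest positive subnormal": 0x0001,
    "smallest positive normal": 0x0400,
    "largest normal": 0x7BFF,
    "quiet NaN example": 0x7E00,
    "signaling NaN example": 0x7C01,
}
for name, bits in boundaries.items():
    print(f"{name:28} {fmt(bits)} {classify_half(bits)}")

# How many of the 65536 patterns are signaling / quiet NaNs
print(sum(classify_half(b) == "sNaN" for b in range(1 << 16)),
      sum(classify_half(b) == "qNaN" for b in range(1 << 16)))
```

The counts reflect the sNaN constraint noted in the table. With T's MSB = 0, the remaining 9 bits must not all be zero (otherwise the pattern is an infinity), so there are 2 × 511 sNaN patterns against 2 × 512 qNaN patterns.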
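- Special-value operations that raise no exception:
  - infinity +/- normal/subnormal/0
  - infinity × normal/subnormal
  - infinity ÷ normal/subnormal
  - square root of +infinity
  - remainder of a normal/subnormal value divided by infinity
  - format conversion of infinity (e.g. between single and double precision)
  - 0 +/-/× normal/subnormal/0
  - 0 ÷ normal/subnormal/infinity
  - square root of 0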
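The no-exception cases above can be spot-checked against Python floats. Python floats are IEEE 754 binary64 and, for these particular operations, return the IEEE result without raising a Python exception. This is only a value check: Python does not expose the IEEE status flags, so it cannot confirm that no flag is raised. The case names and operand choices are illustrative, and the sign of zero is compared explicitly.

```python
import math
import struct

INF = math.inf
SUB = 5e-324                       # smallest positive binary64 subnormal

cases = {
    "inf + normal":      (INF + 1.0, INF),
    "inf - subnormal":   (INF - SUB, INF),
    "inf + 0":           (INF + 0.0, INF),
    "inf * normal":      (INF * -2.0, -INF),
    "inf / subnormal":   (INF / SUB, INF),
    "sqrt(+inf)":        (math.sqrt(INF), INF),
    "remainder(x, inf)": (math.remainder(1.5, INF), 1.5),
    "inf -> binary16":   (struct.unpack("<e", struct.pack("<e", INF))[0], INF),
    "0 * normal":        (0.0 * -3.0, -0.0),
    "0 / inf":           (0.0 / INF, 0.0),
    "sqrt(-0)":          (math.sqrt(-0.0), -0.0),
}
for name, (got, want) in cases.items():
    same = got == want and math.copysign(1.0, got) == math.copysign(1.0, want)
    print(f"{name:18} {got!r:8} {'ok' if same else 'MISMATCH'}")
```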
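Floating-Point Exceptions
- Invalid Operation: the default result is a qNaN. It is raised by:
  - any arithmetic operation with a signaling NaN (sNaN) operand (IEEE 754 exempts some conversions, and non-arithmetic operations such as copy, negate and absolute value do not signal); a quiet NaN operand normally propagates without raising Invalid Operation
  - 0 × infinity, including a fused multiply-add whose product term is 0 × infinity
  - +infinity + (-infinity) (including the equivalent subtraction), including a fused multiply-add whose final addition adds opposite-signed infinities
  - 0 ÷ 0 and infinity ÷ infinity
  - remainder where the dividend is normal/subnormal and the divisor is 0, or the dividend is infinity and the divisor is normal/subnormal
  - square root of a value less than zero (the square root of -0 is -0)
  - conversion of NaN or infinity to an integer
  - logB of NaN, infinity or 0 when the result format is an integer
  - ordered (signaling) comparisons such as <, <=, >, >= when either operand is a NaN
- Division by zero: the result is an infinity.
  - the dividend is normal/subnormal and the divisor is 0 (the sign of the infinity is the XOR of the operand signs)
  - logB(0) with a floating-point result format, which returns -infinity
- Overflow: the operands are normal/subnormal values and the result, once rounded according to the rounding mode, exceeds the largest finite value in magnitude.
  - roundTiesToEven/roundTiesToAway: positive overflow gives +infinity, negative overflow gives -infinity.
  - roundTowardZero: positive overflow gives the largest finite value, negative overflow gives the most negative finite value.
  - roundTowardNegative: positive overflow gives the largest finite value, negative overflow gives -infinity.
  - roundTowardPositive: positive overflow gives +infinity, negative overflow gives the most negative finite value.
  - Besides delivering the result, both Overflow and Inexact are signaled.
- Underflow: the operands are normal/subnormal values and the result is tiny, i.e. nonzero with magnitude below 2^emin (the subnormal range).
  - If the tiny result is exact (no rounding needed), default exception handling raises no flag; Underflow is signaled only when the underflow trap is enabled (in ARMv8, FPCR.UFE = 1).
  - If the tiny result must be rounded, both Underflow and Inexact are signaled.
  - The final result is rounded according to the rounding mode and may be 0, 2^emin (the smallest normal value) or a subnormal value.
- Inexact: signaled whenever the result has to be rounded according to the rounding mode.
  - When the precision of the format cannot represent the exact result, the result is rounded to an approximation according to the rounding mode, and Inexact is reported.
  - The delivered result is the rounded value.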
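The overflow rules and the flag rules can be turned into a small reference model. The sketch below is illustrative only:

- `overflow_result` encodes the per-mode overflow results for binary16.
- `to_half_with_flags` rounds a binary64 value to binary16 with Python's `struct` `'e'` format (round to nearest, ties to even) and derives Overflow, Underflow and Inexact under default exception handling.

It assumes tininess is detected before rounding, which IEEE 754 leaves to the implementation. It does not model sNaN inputs, which Python floats cannot represent reliably.

```python
import math
import struct

HALF_MAX = struct.unpack("<e", bytes.fromhex("ff7b"))[0]   # 0_11110_1111111111 = 65504.0
HALF_MIN_NORMAL = 2.0 ** -14                                # 2^emin for binary16


def overflow_result(negative: bool, mode: str) -> float:
    """Default binary16 result on overflow (Overflow and Inexact are also signaled)."""
    if mode in ("RNE", "RNA"):      # roundTiesToEven / roundTiesToAway
        return -math.inf if negative else math.inf
    if mode == "RZ":                # roundTowardZero: largest finite magnitude
        return -HALF_MAX if negative else HALF_MAX
    if mode == "RD":                # roundTowardNegative
        return -math.inf if negative else HALF_MAX
    if mode == "RU":                # roundTowardPositive
        return -HALF_MAX if negative else math.inf
    raise ValueError(f"unknown rounding mode {mode!r}")


def to_half_with_flags(x: float):
    """Convert binary64 -> binary16 (ties to even) and report default-mode flags."""
    if math.isinf(x) or math.isnan(x):
        return x, set()                            # inf converts exactly; qNaN raises nothing
    try:
        h = struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):          # struct refuses values that overflow
        return overflow_result(x < 0, "RNE"), {"overflow", "inexact"}
    flags = set()
    if h != x:
        flags.add("inexact")
        if 0 < abs(x) < HALF_MIN_NORMAL:           # tiny and inexact -> underflow
            flags.add("underflow")
    return h, flags


for value in (1.0, 0.1, 1e6, 2.0 ** -24, 1e-10):
    print(value, *to_half_with_flags(value))
```

Note the `2.0 ** -24` case. It is the smallest positive subnormal, so it converts exactly and raises no flag under default handling, even though it lies in the underflow range.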
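Floating-Point Instructions (ARMv8 AArch64)
Floating-point register data transfer
- FMOV (general): Floating-point Move to or from general-purpose register without conversion. (Designs often have no direct datapath between the floating-point and general-purpose registers, so this move is typically implemented by borrowing the load/store path.)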
FMOV <Wd>, <Hn>
FMOV <Xd>, <Hn>
FMOV <Hd>, <Wn>
FMOV <Sd>, <Wn>
FMOV <Wd>, <Sn>
FMOV <Hd>, <Xn>
FMOV <Dd>, <Xn>
FMOV <Vd>.D[1], <Xn>
FMOV <Xd>, <Dn>
FMOV <Xd>, <Vn>.D[1]
- FMOV(register):Floating-point Move register without conversion.
FMOV <Hd>, <Hn>
FMOV <Sd>, <Sn>
FMOV <Dd>, <Dn>
Floating-point immediate data transfer
- FMOV (scalar, immediate): Floating-point move immediate (scalar). See below for the representable range and encoding of the immediate.
FMOV <Hd>, #<imm>
FMOV <Sd>, #<imm>
FMOV <Dd>, #<imm>
- FMOV (vector, immediate): Floating-point move immediate (vector). See below for the representable range and encoding of the immediate.
FMOV <Vd>.<T>, #<imm> //<T>: 4H/8H, 2S/4S
FMOV <Vd>.2D, #<imm>
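- Range and encoding of imm: the immediate is an 8-bit value {abcdefgh}. a is the sign bit, bcd encodes the exponent as ({!b, c, d} − 3), and efgh is the fraction. The value is therefore (−1)^a × (16 + efgh)/16 × 2^({!b, c, d} − 3), which gives magnitudes from 0.125 to 31.0.
[Figure: the 8-bit floating-point immediate and its expansion to half/single/double precision]
[Figure: decimal values representable by the 8-bit floating-point immediate]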
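The immediate encoding can be checked exhaustively. The sketch below assumes the expansion rule the ARM ARM describes as `VFPExpandImm`:

- sign = a;
- exponent = NOT(b), then b replicated to fill the field, then cd;
- fraction = efgh followed by zeros.

The Python names are illustrative. The sketch expands all 256 immediates to half, single and double precision and compares each decoded pattern against the closed-form value given above.

```python
import struct

EXP_BITS = {16: 5, 32: 8, 64: 11}
STRUCT_FMT = {16: "<e", 32: "<f", 64: "<d"}


def expand_imm8(imm8: int, n: int) -> int:
    """Expand imm8 = {abcdefgh} to an N-bit IEEE pattern (N = 16, 32 or 64)."""
    e = EXP_BITS[n]
    f = n - e - 1                               # trailing significand width
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    cd = (imm8 >> 4) & 0b11
    efgh = imm8 & 0xF
    replicated_b = ((1 << (e - 3)) - 1) if b else 0
    exp = ((b ^ 1) << (e - 1)) | (replicated_b << 2) | cd   # NOT(b) : b...b : cd
    return (a << (n - 1)) | (exp << f) | (efgh << (f - 4))  # fraction = efgh : 0...0


def imm8_value(imm8: int) -> float:
    """Closed form: (-1)^a * (16 + efgh)/16 * 2^({NOT b, c, d} - 3)."""
    a = (imm8 >> 7) & 1
    n3 = ((((imm8 >> 6) & 1) ^ 1) << 2) | ((imm8 >> 4) & 0b11)
    magnitude = (16 + (imm8 & 0xF)) / 16 * 2.0 ** (n3 - 3)
    return -magnitude if a else magnitude


def bits_to_float(bits: int, n: int) -> float:
    return struct.unpack(STRUCT_FMT[n], bits.to_bytes(n // 8, "little"))[0]


for imm8 in (0x70, 0x00, 0x40, 0x3F, 0xF0):
    print(f"imm8=0x{imm8:02X}  value={imm8_value(imm8)}  half=0x{expand_imm8(imm8, 16):04X}")
```

For example, imm8 = 0x70 is 1.0 (half 0x3C00, single 0x3F800000). The extremes are 0x40 (0.125) and 0x3F (31.0).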
Floating-point conversion instructions
Scalar forms
- FCVT:Floating-point Convert precision (scalar)
FCVT <Sd>, <Hn>
FCVT <Dd>, <Hn>
FCVT <Hd>, <Sn>
FCVT <Dd>, <Sn>
FCVT <Hd>, <Dn>
FCVT <Sd>, <Dn>
- FCVTAS (scalar):Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).
FCVTAS <Wd>, <Hn>
FCVTAS <Xd>, <Hn>
FCVTAS <Wd>, <Sn>
FCVTAS <Xd>, <Sn>
FCVTAS <Wd>, <Dn>
FCVTAS <Xd>, <Dn>
- FCVTAU (scalar):Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).
FCVTAU <Wd>, <Hn>
FCVTAU <Xd>, <Hn>
FCVTAU <Wd>, <Sn>
FCVTAU <Xd>, <Sn>
FCVTAU <Wd>, <Dn>
FCVTAU <Xd>, <Dn>
- FCVTMS (scalar):Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).
FCVTMS <Wd>, <Hn>
FCVTMS <Xd>, <Hn>
FCVTMS <Wd>, <Sn>
FCVTMS <Xd>, <Sn>
FCVTMS <Wd>, <Dn>
FCVTMS <Xd>, <Dn>
- FCVTMU (scalar):Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).
FCVTMU <Wd>, <Hn>
FCVTMU <Xd>, <Hn>
FCVTMU <Wd>, <Sn>
FCVTMU <Xd>, <Sn>
FCVTMU <Wd>, <Dn>
FCVTMU <Xd>, <Dn>
- FCVTNS (scalar):Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).
FCVTNS <Wd>, <Hn>
FCVTNS <Xd>, <Hn>
FCVTNS <Wd>, <Sn>
FCVTNS <Xd>, <Sn>
FCVTNS <Wd>, <Dn>
FCVTNS <Xd>, <Dn>
- FCVTNU (scalar):Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).
FCVTNU <Wd>, <Hn>
FCVTNU <Xd>, <Hn>
FCVTNU <Wd>, <Sn>
FCVTNU <Xd>, <Sn>
FCVTNU <Wd>, <Dn>
FCVTNU <Xd>, <Dn>
- FCVTPS (scalar):Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).
FCVTPS <Wd>, <Hn>
FCVTPS <Xd>, <Hn>
FCVTPS <Wd>, <Sn>
FCVTPS <Xd>, <Sn>
FCVTPS <Wd>, <Dn>
FCVTPS <Xd>, <Dn>
- FCVTPU (scalar):Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).
FCVTPU <Wd>, <Hn>
FCVTPU <Xd>, <Hn>
FCVTPU <Wd>, <Sn>
FCVTPU <Xd>, <Sn>
FCVTPU <Wd>, <Dn>
FCVTPU <Xd>, <Dn>
- FCVTZS (scalar, fixed-point):Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
FCVTZS <Wd>, <Hn>, #<fbits>
FCVTZS <Xd>, <Hn>, #<fbits>
FCVTZS <Wd>, <Sn>, #<fbits>
FCVTZS <Xd>, <Sn>, #<fbits>
FCVTZS <Wd>, <Dn>, #<fbits>
FCVTZS <Xd>, <Dn>, #<fbits>
- FCVTZS (scalar, integer):Floating-point Convert to Signed integer, rounding toward Zero (scalar).
FCVTZS <Wd>, <Hn>
FCVTZS <Xd>, <Hn>
FCVTZS <Wd>, <Sn>
FCVTZS <Xd>, <Sn>
FCVTZS <Wd>, <Dn>
FCVTZS <Xd>, <Dn>
- FCVTZU (scalar, fixed-point):Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
FCVTZU <Wd>, <Hn>, #<fbits>
FCVTZU <Xd>, <Hn>, #<fbits>
FCVTZU <Wd>, <Sn>, #<fbits>
FCVTZU <Xd>, <Sn>, #<fbits>
FCVTZU <Wd>, <Dn>, #<fbits>
FCVTZU <Xd>, <Dn>, #<fbits>
- FCVTZU (scalar, integer):Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).
FCVTZU <Wd>, <Hn>
FCVTZU <Xd>, <Hn>
FCVTZU <Wd>, <Sn>
FCVTZU <Xd>, <Sn>
FCVTZU <Wd>, <Dn>
FCVTZU <Xd>, <Dn>
- SCVTF (scalar, fixed-point):Signed fixed-point Convert to Floating-point (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
SCVTF <Hd>, <Wn>, #<fbits>
SCVTF <Sd>, <Wn>, #<fbits>
SCVTF <Dd>, <Wn>, #<fbits>
SCVTF <Hd>, <Xn>, #<fbits>
SCVTF <Sd>, <Xn>, #<fbits>
SCVTF <Dd>, <Xn>, #<fbits>
- SCVTF (scalar, integer):Signed integer Convert to Floating-point (scalar).
SCVTF <Hd>, <Wn>
SCVTF <Sd>, <Wn>
SCVTF <Dd>, <Wn>
SCVTF <Hd>, <Xn>
SCVTF <Sd>, <Xn>
SCVTF <Dd>, <Xn>
- UCVTF (scalar, fixed-point):Unsigned fixed-point Convert to Floating-point (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
UCVTF <Hd>, <Wn>, #<fbits>
UCVTF <Sd>, <Wn>, #<fbits>
UCVTF <Dd>, <Wn>, #<fbits>
UCVTF <Hd>, <Xn>, #<fbits>
UCVTF <Sd>, <Xn>, #<fbits>
UCVTF <Dd>, <Xn>, #<fbits>
- UCVTF (scalar, integer):Unsigned integer Convert to Floating-point (scalar).
UCVTF <Hd>, <Wn>
UCVTF <Sd>, <Wn>
UCVTF <Dd>, <Wn>
UCVTF <Hd>, <Xn>
UCVTF <Sd>, <Xn>
UCVTF <Dd>, <Xn>
Vector forms
- FCVTAS (vector):Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).
FCVTAS <Hd>, <Hn>
FCVTAS <V><d>, <V><n> // <V>: S,D
FCVTAS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTAU (vector):Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).
FCVTAU <Hd>, <Hn>
FCVTAU <V><d>, <V><n> // <V>: S,D
FCVTAU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTMS (vector):Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).
FCVTMS <Hd>, <Hn>
FCVTMS <V><d>, <V><n> // <V>: S,D
FCVTMS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTMU (vector):Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).
FCVTMU <Hd>, <Hn>
FCVTMU <V><d>, <V><n> // <V>: S,D
FCVTMU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.73 FCVTNS (vector):Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).
FCVTNS <Hd>, <Hn>
FCVTNS <V><d>, <V><n> // <V>: S,D
FCVTNS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTNU (vector):Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).
FCVTNU <Hd>, <Hn>
FCVTNU <V><d>, <V><n> // <V>: S,D
FCVTNU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTPS (vector):Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).
FCVTPS <Hd>, <Hn>
FCVTPS <V><d>, <V><n> // <V>: S,D
FCVTPS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTPU (vector):Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).
FCVTPU <Hd>, <Hn>
FCVTPU <V><d>, <V><n> // <V>: S,D
FCVTPU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTZS (vector, fixed-point):Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).
// <fbits>: number of fractional bits, in the range 1 to the operand width (scalar variant) or 1 to the element width (vector variant)
FCVTZS <V><d>, <V><n>, #<fbits> // <V>: S,D
FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
- FCVTZS (vector, integer):Floating-point Convert to Signed integer, rounding toward Zero (vector).
FCVTZS <Hd>, <Hn>
FCVTZS <V><d>, <V><n> // <V>: S,D
FCVTZS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- FCVTZU (vector, fixed-point):Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).
// <fbits>: number of fractional bits, in the range 1 to the operand width (scalar variant) or 1 to the element width (vector variant)
FCVTZU <V><d>, <V><n>, #<fbits> // <V>: S,D
FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
- FCVTZU (vector, integer):Floating-point Convert to Unsigned integer, rounding toward Zero (vector).
FCVTZU <Hd>, <Hn>
FCVTZU <V><d>, <V><n> // <V>: S,D
FCVTZU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- SCVTF (vector, fixed-point):Signed fixed-point Convert to Floating-point (vector).
// <fbits>: number of fractional bits, in the range 1 to the operand width (scalar variant) or 1 to the element width (vector variant)
SCVTF <V><d>, <V><n>, #<fbits> // <V>: H,S,D
SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
- SCVTF (vector, integer):Signed integer Convert to Floating-point (vector).
SCVTF <Hd>, <Hn>
SCVTF <V><d>, <V><n> // <V>: S,D
SCVTF <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- UCVTF (vector, fixed-point):Unsigned fixed-point Convert to Floating-point (vector).
// <fbits>: number of fractional bits, in the range 1 to the operand width (scalar variant) or 1 to the element width (vector variant)
UCVTF <V><d>, <V><n>, #<fbits> // <V>: H,S,D
UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
- UCVTF (vector, integer):Unsigned integer Convert to Floating-point (vector).
UCVTF <Hd>, <Hn>
UCVTF <V><d>, <V><n> // <V>: S,D
UCVTF <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
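For verification, the interesting parts of the FCVT{A,M,N,P,Z}{S,U} conversions are the rounding direction implied by the mnemonic letter and the out-of-range handling. The sketch below models the integer (non-fixed-point) forms on a binary64 input; the fixed-point forms would first scale by 2^fbits and are not modeled. It assumes the behavior of the ARM ARM's FPToFixed pseudocode:

- a NaN converts to 0 and raises Invalid Operation;
- an out-of-range value, including infinity, saturates and raises Invalid Operation;
- any other inexact result raises Inexact.

The function name and flag strings are illustrative.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def fp_to_int(x: float, mode: str, width: int = 32, signed: bool = True):
    """Model FCVT<mode>S / FCVT<mode>U (integer form) on a binary64 input.

    mode: 'N' nearest/ties-to-even, 'A' nearest/ties-away, 'M' toward -inf,
          'P' toward +inf, 'Z' toward zero.  Returns (integer, flags).
    """
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if math.isnan(x):
        return 0, {"invalid"}                          # NaN -> 0, Invalid Operation
    if math.isinf(x):
        return (hi if x > 0 else lo), {"invalid"}      # saturate, Invalid Operation
    v = Fraction(x)                                    # exact value of the input
    fl = math.floor(v)
    frac = v - fl                                      # 0 <= frac < 1
    if mode == "M":
        n = fl
    elif mode == "P":
        n = fl + (frac != 0)
    elif mode == "Z":
        n = math.trunc(v)
    elif mode == "N":
        n = fl + (frac > HALF or (frac == HALF and fl % 2 == 1))
    elif mode == "A":
        n = fl + (frac > HALF or (frac == HALF and v > 0))
    else:
        raise ValueError(f"unknown mode {mode!r}")
    if not lo <= n <= hi:
        return (hi if n > hi else lo), {"invalid"}     # out of range: saturate
    return n, ({"inexact"} if n != v else set())


for m in "NAMPZ":
    print(m, fp_to_int(2.5, m)[0], fp_to_int(-2.5, m)[0])
```

Inputs of ±2.5 are a good directed test because every mnemonic letter gives a distinguishable pair of results.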
Floating-point round to integral
Scalar forms
- C7.2.141 FRINTA (scalar):Floating-point Round to Integral, to nearest with ties to Away (scalar).
FRINTA <Hd>, <Hn>
FRINTA <Sd>, <Sn>
FRINTA <Dd>, <Dn>
- C7.2.143 FRINTI (scalar):Floating-point Round to Integral, using current rounding mode (scalar).
FRINTI <Hd>, <Hn>
FRINTI <Sd>, <Sn>
FRINTI <Dd>, <Dn>
- C7.2.145 FRINTM (scalar):Floating-point Round to Integral, toward Minus infinity (scalar).
FRINTM <Hd>, <Hn>
FRINTM <Sd>, <Sn>
FRINTM <Dd>, <Dn>
- C7.2.147 FRINTN (scalar):Floating-point Round to Integral, to nearest with ties to even (scalar).
FRINTN <Hd>, <Hn>
FRINTN <Sd>, <Sn>
FRINTN <Dd>, <Dn>
- C7.2.149 FRINTP (scalar):Floating-point Round to Integral, toward Plus infinity (scalar).
FRINTP <Hd>, <Hn>
FRINTP <Sd>, <Sn>
FRINTP <Dd>, <Dn>
- C7.2.151 FRINTX (scalar):Floating-point Round to Integral exact, using current rounding mode (scalar).
FRINTX <Hd>, <Hn>
FRINTX <Sd>, <Sn>
FRINTX <Dd>, <Dn>
- C7.2.153 FRINTZ (scalar):Floating-point Round to Integral, toward Zero (scalar).
FRINTZ <Hd>, <Hn>
FRINTZ <Sd>, <Sn>
FRINTZ <Dd>, <Dn>
Vector forms
- C7.2.140 FRINTA (vector):Floating-point Round to Integral, to nearest with ties to Away (vector).
FRINTA <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.142 FRINTI (vector):Floating-point Round to Integral, using current rounding mode (vector).
FRINTI <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.144 FRINTM (vector):Floating-point Round to Integral, toward Minus infinity (vector).
FRINTM <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.146 FRINTN (vector):Floating-point Round to Integral, to nearest with ties to even (vector).
FRINTN <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.148 FRINTP (vector):Floating-point Round to Integral, toward Plus infinity (vector).
FRINTP <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.150 FRINTX (vector):Floating-point Round to Integral exact, using current rounding mode (vector).
FRINTX <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.152 FRINTZ (vector):Floating-point Round to Integral, toward Zero (vector).
FRINTZ <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
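The FRINT* variants differ only in rounding direction and in whether Inexact is reported. Below is a minimal Python model on binary64 values:

- 'I' and 'X' read a hypothetical `fpcr_mode` argument standing in for the FPCR rounding mode.
- Following the instruction descriptions, only FRINTX ("Round to Integral exact") reports Inexact.
- NaN handling is not modeled.

```python
import math


def frint(x: float, mode: str, fpcr_mode: str = "N"):
    """Model the FRINT<mode> family on a binary64 value. Returns (result, inexact).

    mode: 'A' ties away, 'N' ties even, 'M' toward -inf, 'P' toward +inf,
    'Z' toward zero, 'I'/'X' use fpcr_mode (hypothetical stand-in for the
    FPCR rounding mode: one of 'N', 'P', 'M', 'Z'). Only FRINTX reports Inexact.
    """
    if not math.isfinite(x):
        return x, False                          # infinities pass through; NaNs not modeled
    rmode = fpcr_mode if mode in ("I", "X") else mode
    if rmode == "M":
        r = math.floor(x)
    elif rmode == "P":
        r = math.ceil(x)
    elif rmode == "Z":
        r = math.trunc(x)
    elif rmode == "N":
        r = round(x)                             # Python's round() breaks ties to even
    elif rmode == "A":
        mag = abs(x)
        fl = math.floor(mag)
        r = fl + 1 if mag - fl >= 0.5 else fl    # mag - fl is exact in binary64
    else:
        raise ValueError(f"unknown mode {mode!r}")
    result = math.copysign(float(r), x)          # a zero result keeps the sign of x
    return result, (mode == "X" and result != x)


for m in ("A", "N", "M", "P", "Z", "I", "X"):
    print(m, frint(2.5, m), frint(-0.5, m))
```

The `copysign` step matters for verification. For example, FRINTP of -0.5 must produce -0.0, not +0.0.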
Floating-point (fused) multiply-add instructions
- C7.2.93 FMADD:Floating-point fused Multiply-Add (scalar).
FMADD <Hd>, <Hn>, <Hm>, <Ha>
FMADD <Sd>, <Sn>, <Sm>, <Sa>
FMADD <Dd>, <Dn>, <Dm>, <Da>
- C7.2.126 FMSUB:Floating-point Fused Multiply-Subtract (scalar).
FMSUB <Hd>, <Hn>, <Hm>, <Ha>
FMSUB <Sd>, <Sn>, <Sm>, <Sa>
FMSUB <Dd>, <Dn>, <Dm>, <Da>
- C7.2.134 FNMADD:Floating-point Negated fused Multiply-Add (scalar).
FNMADD <Hd>, <Hn>, <Hm>, <Ha>
FNMADD <Sd>, <Sn>, <Sm>, <Sa>
FNMADD <Dd>, <Dn>, <Dm>, <Da>
- C7.2.135 FNMSUB:Floating-point Negated fused Multiply-Subtract (scalar).
FNMSUB <Hd>, <Hn>, <Hm>, <Ha>
FNMSUB <Sd>, <Sn>, <Sm>, <Sa>
FNMSUB <Dd>, <Dn>, <Dm>, <Da>
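ARM defines the four variants with a single rounding at the end of each:

- FMADD: a + n×m
- FMSUB: a − n×m
- FNMADD: −a − n×m
- FNMSUB: −a + n×m

The sketch below shows why the single rounding matters. It computes the exact value with `fractions.Fraction` and rounds once when converting back to `float`. It handles only finite, non-overflowing operands and ignores the sign of exact-zero results (covered by the FPMulAdd pseudocode later). The function names simply mirror the mnemonics.

```python
from fractions import Fraction


def fused(addend: float, op1: float, op2: float) -> float:
    """addend + op1*op2 evaluated exactly, then rounded once (ties to even)."""
    return float(Fraction(addend) + Fraction(op1) * Fraction(op2))


# Operand order follows the assembly syntax: FMADD d, n, m, a
def fmadd(n: float, m: float, a: float) -> float:
    return fused(a, n, m)        # a + n*m


def fmsub(n: float, m: float, a: float) -> float:
    return fused(a, -n, m)       # a - n*m


def fnmadd(n: float, m: float, a: float) -> float:
    return fused(-a, -n, m)      # -a - n*m


def fnmsub(n: float, m: float, a: float) -> float:
    return fused(-a, n, m)       # -a + n*m


x = 1.0 + 2.0 ** -27             # exact product x*x = 1 + 2^-26 + 2^-54
c = -(1.0 + 2.0 ** -26)
print(x * x + c)                 # separate multiply and add: 2^-54 is lost in the product
print(fmadd(x, x, c))            # fused: the 2^-54 term survives
```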
Floating-point one-source arithmetic instructions
Scalar forms
- C7.2.39 FABS (scalar):Floating-point Absolute value (scalar).
FABS <Hd>, <Hn>
FABS <Sd>, <Sn>
FABS <Dd>, <Dn>
- C7.2.133 FNEG (scalar):Floating-point Negate (scalar).
FNEG <Hd>, <Hn>
FNEG <Sd>, <Sn>
FNEG <Dd>, <Dn>
- C7.2.157 FSQRT (scalar):Floating-point Square Root (scalar).
FSQRT <Hd>, <Hn>
FSQRT <Sd>, <Sn>
FSQRT <Dd>, <Dn>
Vector forms
- C7.2.38 FABS (vector):Floating-point Absolute value (vector).
FABS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.132 FNEG (vector):Floating-point Negate (vector).
FNEG <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.156 FSQRT (vector):Floating-point Square Root (vector).
FSQRT <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
Floating-point two-source arithmetic instructions
Scalar forms
- C7.2.43 FADD (scalar):Floating-point Add (scalar).
FADD <Hd>, <Hn>, <Hm>
FADD <Sd>, <Sn>, <Sm>
FADD <Dd>, <Dn>, <Dm>
- C7.2.159 FSUB (scalar):Floating-point Subtract (scalar).
FSUB <Hd>, <Hn>, <Hm>
FSUB <Sd>, <Sn>, <Sm>
FSUB <Dd>, <Dn>, <Dm>
- C7.2.91 FDIV (scalar):Floating-point Divide (scalar).
FDIV <Hd>, <Hn>, <Hm>
FDIV <Sd>, <Sn>, <Sm>
FDIV <Dd>, <Dn>, <Dm>
- C7.2.129 FMUL (scalar):Floating-point Multiply (scalar).
FMUL <Hd>, <Hn>, <Hm>
FMUL <Sd>, <Sn>, <Sm>
FMUL <Dd>, <Dn>, <Dm>
- C7.2.136 FNMUL (scalar):Floating-point Multiply-Negate (scalar).
FNMUL <Hd>, <Hn>, <Hm>
FNMUL <Sd>, <Sn>, <Sm>
FNMUL <Dd>, <Dn>, <Dm>
Vector forms
- C7.2.42 FADD (vector):Floating-point Add (vector).
FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.158 FSUB (vector):Floating-point Subtract (vector).
FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.90 FDIV (vector):Floating-point Divide (vector).
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.128 FMUL (vector):Floating-point Multiply (vector).
FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
Floating-point maximum and minimum
Scalar forms
- C7.2.95 FMAX (scalar):Floating-point Maximum (scalar).
FMAX <Hd>, <Hn>, <Hm>
FMAX <Sd>, <Sn>, <Sm>
FMAX <Dd>, <Dn>, <Dm>
- C7.2.101 FMAXP (scalar):Floating-point Maximum of Pair of elements (scalar).
FMAXP <V><d>, <Vn>.<T> //<V>: H/S/D; <T>:2H/2S/2D
- C7.2.97 FMAXNM (scalar):Floating-point Maximum Number (scalar). If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
FMAXNM <Hd>, <Hn>, <Hm>
FMAXNM <Sd>, <Sn>, <Sm>
FMAXNM <Dd>, <Dn>, <Dm>
- C7.2.98 FMAXNMP (scalar):Floating-point Maximum Number of Pair of elements (scalar).
FMAXNMP <V><d>, <Vn>.<T> //<T>: 2H, 2S, 2D
- C7.2.105 FMIN (scalar):Floating-point Minimum (scalar).
FMIN <Hd>, <Hn>, <Hm>
FMIN <Sd>, <Sn>, <Sm>
FMIN <Dd>, <Dn>, <Dm>
- C7.2.111 FMINP (scalar):Floating-point Minimum of Pair of elements (scalar).
FMINP <V><d>, <Vn>.<T> //<V>: H/S/D; <T>:2H/2S/2D
- C7.2.107 FMINNM (scalar):Floating-point Minimum Number (scalar). If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
FMINNM <Hd>, <Hn>, <Hm>
FMINNM <Sd>, <Sn>, <Sm>
FMINNM <Dd>, <Dn>, <Dm>
- C7.2.108 FMINNMP (scalar):Floating-point Minimum Number of Pair of elements (scalar).
FMINNMP <V><d>, <Vn>.<T> //<T>: 2H, 2S, 2D
Vector forms
- C7.2.94 FMAX (vector):Floating-point Maximum (vector).
FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.102 FMAXP (vector):Floating-point Maximum Pairwise (vector).
FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.96 FMAXNM (vector):Floating-point Maximum Number (vector). If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.99 FMAXNMP (vector):Floating-point Maximum Number Pairwise (vector).
FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.100 FMAXNMV:Floating-point Maximum Number across Vector.
FMAXNMV <V><d>, <Vn>.<T> //<T>: 4S, 4H/8H <V>: H, S
- C7.2.104 FMIN (vector):Floating-point minimum (vector).
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.112 FMINP (vector):Floating-point Minimum Pairwise (vector).
FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.106 FMINNM (vector):Floating-point Minimum Number (vector). If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.109 FMINNMP (vector):Floating-point Minimum Number Pairwise (vector).
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.110 FMINNMV:Floating-point Minimum Number across Vector.
FMINNMV <V><d>, <Vn>.<T> //<T>: 4S, 4H/8H <V>: H, S
Floating-point compare instructions
Scalar forms
- C7.2.59 FCMP: Floating-point quiet Compare (scalar). It raises an Invalid Operation exception only if either operand is a signaling NaN.
FCMP <Hn>, <Hm>
FCMP <Hn>, #0.0
FCMP <Sn>, <Sm>
FCMP <Sn>, #0.0
FCMP <Dn>, <Dm>
FCMP <Dn>, #0.0
- C7.2.60 FCMPE: Floating-point signaling Compare (scalar). If either operand is any type of NaN (quiet or signaling), the instruction raises an Invalid Operation exception.
FCMPE <Hn>, <Hm>
FCMPE <Hn>, #0.0
FCMPE <Sn>, <Sm>
FCMPE <Sn>, #0.0
FCMPE <Dn>, <Dm>
FCMPE <Dn>, #0.0
- C7.2.47 FCCMP:Floating-point Conditional quiet Compare (scalar). It raises an Invalid Operation exception only if either operand is a signaling NaN.
FCCMP <Hn>, <Hm>, #<nzcv>, <cond>
FCCMP <Sn>, <Sm>, #<nzcv>, <cond>
FCCMP <Dn>, <Dm>, #<nzcv>, <cond>
- C7.2.48 FCCMPE: Floating-point Conditional signaling Compare (scalar). It raises an Invalid Operation exception if either operand is any type of NaN.
FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>
FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>
FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>
Vector forms
- C7.2.49 FCMEQ (register):Floating-point Compare Equal (vector).
FCMEQ <Hd>, <Hn>, <Hm>
FCMEQ <Sd>, <Sn>, <Sm>
FCMEQ <Dd>, <Dn>, <Dm>
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.50 FCMEQ (zero):Floating-point Compare Equal to zero (vector).
FCMEQ <Hd>, <Hn>, #0.0
FCMEQ <Sd>, <Sn>, #0.0
FCMEQ <Dd>, <Dn>, #0.0
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
- C7.2.51 FCMGE (register):Floating-point Compare Greater than or Equal (vector).
FCMGE <Hd>, <Hn>, <Hm>
FCMGE <Sd>, <Sn>, <Sm>
FCMGE <Dd>, <Dn>, <Dm>
FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.52 FCMGE (zero):Floating-point Compare Greater than or Equal to zero (vector).
FCMGE <Hd>, <Hn>, #0.0
FCMGE <Sd>, <Sn>, #0.0
FCMGE <Dd>, <Dn>, #0.0
FCMGE <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
- C7.2.53 FCMGT (register):Floating-point Compare Greater than (vector).
FCMGT <Hd>, <Hn>, <Hm>
FCMGT <Sd>, <Sn>, <Sm>
FCMGT <Dd>, <Dn>, <Dm>
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.54 FCMGT (zero): Floating-point Compare Greater than zero (vector).
FCMGT <Hd>, <Hn>, #0.0
FCMGT <Sd>, <Sn>, #0.0
FCMGT <Dd>, <Dn>, #0.0
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
- C7.2.57 FCMLE (zero):Floating-point Compare Less than or Equal to zero (vector).
FCMLE <Hd>, <Hn>, #0.0
FCMLE <Sd>, <Sn>, #0.0
FCMLE <Dd>, <Dn>, #0.0
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
- C7.2.58 FCMLT (zero):Floating-point Compare Less than zero (vector).
FCMLT <Hd>, <Hn>, #0.0
FCMLT <Sd>, <Sn>, #0.0
FCMLT <Dd>, <Dn>, #0.0
FCMLT <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
- C7.2.40 FACGE:Floating-point Absolute Compare Greater than or Equal (vector).
FACGE <Hd>, <Hn>, <Hm>
FACGE <Sd>, <Sn>, <Sm>
FACGE <Dd>, <Dn>, <Dm>
FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.41 FACGT:Floating-point Absolute Compare Greater than (vector).
FACGT <Hd>, <Hn>, <Hm>
FACGT <Sd>, <Sn>, <Sm>
FACGT <Dd>, <Dn>, <Dm>
FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
Floating-point conditional select instruction
- C7.2.61 FCSEL:Floating-point Conditional Select (scalar).
FCSEL <Hd>, <Hn>, <Hm>, <cond>
FCSEL <Sd>, <Sn>, <Sm>, <cond>
FCSEL <Dd>, <Dn>, <Dm>, <cond>
Other instructions
- C7.2.137 FRECPE:Floating-point Reciprocal Estimate.
FRECPE <Hd>, <Hn>
FRECPE <Sd>, <Sn>
FRECPE <Dd>, <Dn>
FRECPE <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.138 FRECPS:Floating-point Reciprocal Step.
FRECPS <Hd>, <Hn>, <Hm>
FRECPS <Sd>, <Sn>, <Sm>
FRECPS <Dd>, <Dn>, <Dm>
FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.139 FRECPX:Floating-point Reciprocal exponent (scalar).
FRECPX <Hd>, <Hn>
FRECPX <Sd>, <Sn>
FRECPX <Dd>, <Dn>
- C7.2.154 FRSQRTE:Floating-point Reciprocal Square Root Estimate.
FRSQRTE <Hd>, <Hn>
FRSQRTE <Sd>, <Sn>
FRSQRTE <Dd>, <Dn>
FRSQRTE <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
- C7.2.155 FRSQRTS:Floating-point Reciprocal Square Root Step.
FRSQRTS <Hd>, <Hn>, <Hm>
FRSQRTS <Sd>, <Sn>, <Sm>
FRSQRTS <Dd>, <Dn>, <Dm>
FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
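FRECPE/FRSQRTE only give a low-precision starting estimate. FRECPS/FRSQRTS supply the correction factor for a Newton-Raphson step: FRECPS returns 2.0 − a×b, and FRSQRTS returns (3.0 − a×b)/2.0. The Python sketch below shows the iteration, with two simplifications:

- The estimate functions are hypothetical stand-ins; the real instructions use an architected lookup table.
- The hardware steps are fused, whereas this sketch rounds the product before the subtraction.

```python
import math


def _truncate_to_8_bits(v: float) -> float:
    """Keep only 8 significant bits of v (a crude, hypothetical estimate)."""
    m, e = math.frexp(v)                      # v = m * 2**e with 0.5 <= m < 1
    return math.ldexp(math.floor(m * 256) / 256, e)


def frecpe_estimate(d: float) -> float:       # hypothetical stand-in for FRECPE
    return _truncate_to_8_bits(1.0 / d)


def frsqrte_estimate(d: float) -> float:      # hypothetical stand-in for FRSQRTE
    return _truncate_to_8_bits(1.0 / math.sqrt(d))


def frecps(a: float, b: float) -> float:
    """FRECPS step value: 2.0 - a*b (rounded twice here, fused in hardware)."""
    return 2.0 - a * b


def frsqrts(a: float, b: float) -> float:
    """FRSQRTS step value: (3.0 - a*b) / 2.0 (rounded twice here, fused in hardware)."""
    return (3.0 - a * b) / 2.0


def reciprocal(d: float, steps: int = 3) -> float:
    x = frecpe_estimate(d)
    for _ in range(steps):
        x = x * frecps(d, x)                  # x <- x * (2 - d*x)
    return x


def reciprocal_sqrt(d: float, steps: int = 3) -> float:
    x = frsqrte_estimate(d)
    for _ in range(steps):
        x = x * frsqrts(d, x * x)             # x <- x * (3 - d*x*x) / 2
    return x


print(reciprocal(3.0), 1 / 3)
print(reciprocal_sqrt(2.0), 1 / math.sqrt(2.0))
```

Each step roughly doubles the number of correct bits. This is why a low-precision estimate followed by two or three step instructions is the usual software sequence.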
Typical Floating-Point Operations (ARMv8 AArch64)
FPAdd
// Pseudocode for FPAdd, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPAdd(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    rounding = FPRoundingMode(fpcr);
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        inf1 = (type1 == FPType_Infinity);
        inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);
        zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == NOT(sign2) then
            result = FPDefaultNaN();
            FPProcessException(FPExc_InvalidOp, fpcr);
        else if (inf1 && sign1 == '0') || (inf2 && sign2 == '0') then
            result = FPInfinity('0');
        else if (inf1 && sign1 == '1') || (inf2 && sign2 == '1') then
            result = FPInfinity('1');
        else if zero1 && zero2 && sign1 == sign2 then
            result = FPZero(sign1);
        else
            result_value = value1 + value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
                result = FPZero(result_sign);
            else
                result = FPRound(result_value, fpcr, rounding);
    return result;
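The decision order of FPAdd can be mirrored in a reference model. The sketch below applies it to Python (binary64) floats, with these simplifications:

- Every NaN is treated as quiet.
- The finite sum uses Python's round-to-nearest-even addition, so the hypothetical `rounding` argument ("RN", or "RM" for FPRounding_NEGINF) only selects the sign of an exact zero result.
- Overflow and inexact flags are not modeled.

```python
import math


def is_neg(x: float) -> bool:
    """Sign bit of x (also distinguishes -0.0 from +0.0)."""
    return math.copysign(1.0, x) < 0


def fp_add(a: float, b: float, rounding: str = "RN"):
    """Follow the FPAdd special-case order on binary64 operands. Returns (result, flags)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan, set()                    # FPProcessNaNs: propagate
    inf1, inf2 = math.isinf(a), math.isinf(b)
    if inf1 and inf2 and is_neg(a) != is_neg(b):
        return math.nan, {"invalid"}              # +inf + -inf -> default NaN
    if inf1 or inf2:
        return (a if inf1 else b), set()          # an infinity keeps its sign
    if a == 0 and b == 0 and is_neg(a) == is_neg(b):
        return a, set()                           # same-signed zeros keep their sign
    if a == -b:                                   # with gradual underflow, a + b == 0 only here
        return (-0.0 if rounding == "RM" else 0.0), set()
    return a + b, set()


print(fp_add(1.0, -1.0), fp_add(1.0, -1.0, "RM"), fp_add(-0.0, -0.0))
```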
FPSub
// Pseudocode for FPSub, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPSub(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    rounding = FPRoundingMode(fpcr);
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        inf1 = (type1 == FPType_Infinity);
        inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);
        zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == sign2 then
            result = FPDefaultNaN();
            FPProcessException(FPExc_InvalidOp, fpcr);
        else if (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
            result = FPInfinity('0');
        else if (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
            result = FPInfinity('1');
        else if zero1 && zero2 && sign1 == NOT(sign2) then
            result = FPZero(sign1);
        else
            result_value = value1 - value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
                result = FPZero(result_sign);
            else
                result = FPRound(result_value, fpcr, rounding);
    return result;
FPMul
// Pseudocode for FPMul, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPMul(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        inf1 = (type1 == FPType_Infinity);
        inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);
        zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            result = FPDefaultNaN();
            FPProcessException(FPExc_InvalidOp, fpcr);
        else if inf1 || inf2 then
            result = FPInfinity(sign1 EOR sign2);
        else if zero1 || zero2 then
            result = FPZero(sign1 EOR sign2);
        else
            result = FPRound(value1*value2, fpcr);
    return result;
FPDiv
// Pseudocode for FPDiv, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPDiv(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        inf1 = (type1 == FPType_Infinity);
        inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);
        zero2 = (type2 == FPType_Zero);
        if (inf1 && inf2) || (zero1 && zero2) then
            result = FPDefaultNaN();
            FPProcessException(FPExc_InvalidOp, fpcr);
        else if inf1 || zero2 then
            result = FPInfinity(sign1 EOR sign2);
            if !inf1 then FPProcessException(FPExc_DivideByZero, fpcr);
        else if zero1 || inf2 then
            result = FPZero(sign1 EOR sign2);
        else
            result = FPRound(value1/value2, fpcr);
    return result;
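One detail of FPDiv deserves a directed test: infinity ÷ 0 returns an infinity but does not raise Division by zero, because the `if !inf1` guard skips the exception. The sketch below mirrors the FPDiv decision order on Python floats. Python's own `/` raises `ZeroDivisionError` for a zero divisor, so the special cases have to be handled explicitly. All NaNs are treated as quiet, and the finite quotient uses Python's round-to-nearest-even division.

```python
import math


def fp_div(a: float, b: float):
    """Follow the FPDiv special-case order on binary64 operands. Returns (result, flags)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan, set()
    negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)  # sign1 EOR sign2
    sign = -1.0 if negative else 1.0
    inf1, inf2 = math.isinf(a), math.isinf(b)
    zero1, zero2 = a == 0, b == 0
    if (inf1 and inf2) or (zero1 and zero2):
        return math.nan, {"invalid"}              # inf/inf and 0/0
    if inf1 or zero2:
        # x/0 signals Division by zero; inf/x is an exact infinity and signals nothing
        flags = set() if inf1 else {"divide_by_zero"}
        return sign * math.inf, flags
    if zero1 or inf2:
        return sign * 0.0, set()                  # 0/x and x/inf
    return a / b, set()


print(fp_div(1.0, -0.0), fp_div(math.inf, 0.0), fp_div(0.0, 0.0))
```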
FPSqrt
// Pseudocode for FPSqrt, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPSqrt(bits(N) op, FPCRType fpcr)
    assert N IN {16,32,64};
    (fptype,sign,value) = FPUnpack(op, fpcr);
    if fptype == FPType_SNaN || fptype == FPType_QNaN then
        result = FPProcessNaN(fptype, op, fpcr);
    else if fptype == FPType_Zero then
        result = FPZero(sign);
    else if fptype == FPType_Infinity && sign == '0' then
        result = FPInfinity(sign);
    else if sign == '1' then
        result = FPDefaultNaN();
        FPProcessException(FPExc_InvalidOp, fpcr);
    else
        result = FPRound(Sqrt(value), fpcr);
    return result;
FPMulAdd
// Pseudocode for FPMulAdd, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPMulAdd(bits(N) addend, bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    rounding = FPRoundingMode(fpcr);
    (typeA,signA,valueA) = FPUnpack(addend, fpcr);
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    inf1 = (type1 == FPType_Infinity); zero1 = (type1 == FPType_Zero);
    inf2 = (type2 == FPType_Infinity); zero2 = (type2 == FPType_Zero);
    (done,result) = FPProcessNaNs3(typeA, type1, type2, addend, op1, op2, fpcr);
    if typeA == FPType_QNaN && ((inf1 && zero2) || (zero1 && inf2)) then
        result = FPDefaultNaN();
        FPProcessException(FPExc_InvalidOp, fpcr);
    if !done then
        infA = (typeA == FPType_Infinity);
        zeroA = (typeA == FPType_Zero);
        // Determine sign and type product will have if it does not cause an Invalid
        // Operation.
        signP = sign1 EOR sign2;
        infP = inf1 || inf2;
        zeroP = zero1 || zero2;
        // Non SNaN-generated Invalid Operation cases are multiplies of zero by infinity and
        // additions of opposite-signed infinities.
        if (inf1 && zero2) || (zero1 && inf2) || (infA && infP && signA != signP) then
            result = FPDefaultNaN();
            FPProcessException(FPExc_InvalidOp, fpcr);
        // Other cases involving infinities produce an infinity of the same sign.
        else if (infA && signA == '0') || (infP && signP == '0') then
            result = FPInfinity('0');
        else if (infA && signA == '1') || (infP && signP == '1') then
            result = FPInfinity('1');
        // Cases where the result is exactly zero and its sign is not determined by the
        // rounding mode are additions of same-signed zeros.
        else if zeroA && zeroP && signA == signP then
            result = FPZero(signA);
        // Otherwise calculate numerical result and round it.
        else
            result_value = valueA + (value1 * value2);
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
                result = FPZero(result_sign);
            else
                result = FPRound(result_value, fpcr);
    return result;
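The ordering in FPMulAdd has one detail worth a directed test. When the addend is a quiet NaN and the product is 0 × infinity, the pseudocode returns the default NaN and raises Invalid Operation, instead of simply propagating the qNaN. The helper below predicts whether FPMulAdd raises Invalid Operation for a reason other than an sNaN input. The name is illustrative, and all Python NaNs are treated as quiet because a float cannot reliably carry an sNaN.

```python
import math


def fma_invalid(addend: float, op1: float, op2: float) -> bool:
    """Does FPMulAdd(addend, op1, op2) raise Invalid Operation (ignoring sNaN inputs)?"""
    zero_times_inf = ((op1 == 0 and math.isinf(op2)) or
                      (math.isinf(op1) and op2 == 0))
    if math.isnan(op1) or math.isnan(op2):
        return False                  # NaN product operand: FPProcessNaNs3 propagates it
    if math.isnan(addend):
        return zero_times_inf         # qNaN addend with 0 * inf: default NaN plus Invalid
    if zero_times_inf:
        return True
    product_inf = math.isinf(op1) or math.isinf(op2)
    product_neg = (math.copysign(1.0, op1) < 0) != (math.copysign(1.0, op2) < 0)
    addend_neg = math.copysign(1.0, addend) < 0
    return math.isinf(addend) and product_inf and addend_neg != product_neg


print(fma_invalid(math.nan, 0.0, math.inf), fma_invalid(math.nan, 1.0, math.inf))
```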
FPMax
// Pseudocode for FPMax, taken from the ARM ARM to illustrate the operation rules. It is ARM pseudocode, not Verilog.
bits(N) FPMax(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        if value1 > value2 then
            (fptype,sign,value) = (type1,sign1,value1);
        else
            (fptype,sign,value) = (type2,sign2,value2);
        if fptype == FPType_Infinity then
            result = FPInfinity(sign);
        else if fptype == FPType_Zero then
            sign = sign1 AND sign2; // Use most positive sign
            result = FPZero(sign);
        else
            // The use of FPRound() covers the case where there is a trapped underflow exception
            // for a denormalized number even though the result is exact.
            result = FPRound(value, fpcr);
    return result;
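A minimal Python sketch of the FPMax selection rule, mainly to make the signed-zero behaviour concrete. It assumes non-NaN operands, because FPProcessNaNs is not modelled. It works on Python floats and skips the final FPRound step that the pseudocode uses for flushing and trapped underflow of denormals. `fp_max` is a hypothetical name.

```python
import math


def fp_max(op1: float, op2: float) -> float:
    """Sketch of the FPMax selection rule for non-NaN operands."""
    if math.isnan(op1) or math.isnan(op2):
        raise ValueError("NaN handling is not modelled in this sketch")
    # Same comparison as the pseudocode: on a tie, op2 is selected.
    result = op1 if op1 > op2 else op2
    if result == 0.0:
        # AND of the sign bits: +0 is preferred over -0.
        negative = math.copysign(1.0, op1) < 0 and math.copysign(1.0, op2) < 0
        return -0.0 if negative else 0.0
    return result
```

Because `-0.0 > 0.0` is false, the comparison alone would return op2 for any pair of zeros. The AND of the sign bits is what makes `fp_max(-0.0, 0.0)` and `fp_max(0.0, -0.0)` both return `+0.0`.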
FPMin
// The following is the FPMin pseudocode, excerpted from the ARM ARM to show the operation rules. Verilog syntax highlighting was borrowed for display only; the pseudocode does not follow Verilog syntax.
bits(N) FPMin(bits(N) op1, bits(N) op2, FPCRType fpcr)
    assert N IN {16,32,64};
    (type1,sign1,value1) = FPUnpack(op1, fpcr);
    (type2,sign2,value2) = FPUnpack(op2, fpcr);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
    if !done then
        if value1 < value2 then
            (fptype,sign,value) = (type1,sign1,value1);
        else
            (fptype,sign,value) = (type2,sign2,value2);
        if fptype == FPType_Infinity then
            result = FPInfinity(sign);
        else if fptype == FPType_Zero then
            sign = sign1 OR sign2; // Use most negative sign
            result = FPZero(sign);
        else
            // The use of FPRound() covers the case where there is a trapped underflow exception
            // for a denormalized number even though the result is exact.
            result = FPRound(value, fpcr);
    return result;
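FPMin is the mirror image of FPMax: the comparison is reversed, and OR replaces AND when choosing the sign of a zero result. In Python, as in IEEE 754 comparison, `-0.0 == 0.0`. The sketch therefore compares results by their bit patterns through `struct`, which is also how a checker should compare them. `fp_min` and `same_bits` are hypothetical names, and NaN handling is again left out.

```python
import math
import struct


def same_bits(x: float, y: float) -> bool:
    """Compare two doubles bit for bit, so that +0.0 and -0.0 differ."""
    return struct.pack("<d", x) == struct.pack("<d", y)


def fp_min(op1: float, op2: float) -> float:
    """Sketch of the FPMin selection rule for non-NaN operands."""
    if math.isnan(op1) or math.isnan(op2):
        raise ValueError("NaN handling is not modelled in this sketch")
    # Same comparison as the pseudocode: on a tie, op2 is selected.
    result = op1 if op1 < op2 else op2
    if result == 0.0:
        # OR of the sign bits: -0 is preferred over +0.
        negative = math.copysign(1.0, op1) < 0 or math.copysign(1.0, op2) < 0
        return -0.0 if negative else 0.0
    return result
```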
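Floating-point operation feature points
Operands of interest
- The operands of interest are mainly the special values, plus the maximum, minimum, positive/negative typical values, and positive/negative minimum-exponent values with the LSB set, for both normal and subnormal numbers. These values often trigger special arithmetic rules and deserve extra attention.
- Binary forms use half precision (binary16) as the example, written as sign_exponent_trailing significand. A NaN's trailing significand is never all zeros. In the sNaN pattern, the x bits must not all be 0; otherwise, the encoding is an infinity.
- A typical value is an ordinary, representative value. Several typical values can be added to broaden operand coverage.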
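Operand type | Binary form (binary16: sign_exponent_trailing significand) |
+0 | 0_00000_0000000000 |
-0 | 1_00000_0000000000 |
+Infinity | 0_11111_0000000000 |
-Infinity | 1_11111_0000000000 |
qNaN | x_11111_1xxxxxxxxx |
sNaN | x_11111_0xxxxxxxxx (low 9 bits not all 0) |
Subnormal maximum (largest positive subnormal) | 0_00000_1111111111 |
Subnormal minimum (most negative subnormal) | 1_00000_1111111111 |
Subnormal, positive, LSB only (smallest positive subnormal) | 0_00000_0000000001 |
Subnormal, negative, LSB only | 1_00000_0000000001 |
Subnormal, positive typical value | 0_00000_1001011010 |
Subnormal, negative typical value | 1_00000_0110100101 |
Normal maximum (+MAX, largest finite value) | 0_11110_1111111111 |
Normal minimum (-MAX, most negative finite value) | 1_11110_1111111111 |
Normal, positive, minimum exponent with LSB set | 0_00001_0000000001 |
Normal, negative, minimum exponent with LSB set | 1_00001_0000000001 |
Normal, positive typical value | 0_10110_1001011010 |
Normal, negative typical value | 1_01001_0110100101 |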
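To turn the table into stimulus or checker input, the patterns can be decoded mechanically. The following Python sketch assumes the `sign_exponent_trailing` string layout used in the table. The `x` don't-care bits in the NaN rows must be replaced with concrete bits first. `decode_fp16` is a hypothetical helper. The standard-library `struct` module's `e` format is its IEEE 754 binary16 codec.

```python
import struct


def decode_fp16(pattern: str) -> tuple[str, float]:
    """Classify a binary16 pattern written as sign_exponent_trailing."""
    sign, exp, frac = (int(field, 2) for field in pattern.split("_"))
    bits = (sign << 15) | (exp << 10) | frac
    value = struct.unpack("<e", struct.pack("<H", bits))[0]
    if exp == 0x1F:                 # all-ones exponent: infinity or NaN
        if frac == 0:
            kind = "infinity"
        elif frac & 0x200:          # MSB of the trailing significand set
            kind = "qNaN"
        else:
            kind = "sNaN"
    elif exp == 0:                  # all-zeros exponent: zero or subnormal
        kind = "zero" if frac == 0 else "subnormal"
    else:
        kind = "normal"
    return kind, value
```

For example, `decode_fp16("0_11110_1111111111")` gives `("normal", 65504.0)`, which is the binary16 +MAX.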
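Add/subtract instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |
 | The rows below supplement these feature points from the perspective of the result. |
Result is subnormal | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result is zero | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result overflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result underflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result inexact | Overflow and inexact |
 | Underflow and inexact |
 | Positive inexact result |
 | Negative inexact result |
Result is the largest finite value (+MAX) | opa positive normal, opb positive normal |
 | opa positive normal, opb negative normal |
Result is the most negative finite value (-MAX) | opa negative normal, opb negative normal |
 | opa negative normal, opb positive normal |
Result is a positive/negative normal value | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |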
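The result-side bins above can be observed with a small reference model. This sketch makes four assumptions: binary16 operands given as bit patterns, round to nearest even, finite inputs only, and hypothetical function names and bin labels. The exact sum of two binary16 values always fits in a double, so a single rounding through `struct`'s `e` format gives the correctly rounded binary16 result. `struct.pack` raises `OverflowError` instead of returning infinity, so the sketch maps that case to a signed infinity, which is the round-to-nearest overflow result.

```python
import struct

FP16_MIN_NORMAL = 2.0 ** -14


def fp16_value(bits: int) -> float:
    """Interpret a 16-bit integer as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def to_fp16_bits(x: float) -> int:
    """Round a double to binary16 (ties to even) and return the bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # Round to nearest sends an overflowing value to a signed infinity.
        return 0xFC00 if x < 0 else 0x7C00


def fp16_addsub(a_bits: int, b_bits: int, subtract: bool = False):
    """Return (result_bits, bins) for a finite binary16 add or subtract."""
    a, b = fp16_value(a_bits), fp16_value(b_bits)
    # Exact in a double, so the only rounding is the step back to binary16.
    exact = a - b if subtract else a + b
    r_bits = to_fp16_bits(exact)
    r = fp16_value(r_bits)
    bins = set()
    if r_bits & 0x7FFF == 0x7C00:
        bins.add("overflow")
    elif r == 0.0:
        bins.add("zero")
    elif abs(r) < FP16_MIN_NORMAL:
        bins.add("subnormal")
    else:
        bins.add("normal")
    if r_bits == 0x7BFF:
        bins.add("+MAX")
    elif r_bits == 0xFBFF:
        bins.add("-MAX")
    if r != exact:
        bins.add("inexact")
    return r_bits, bins
```

For example, `fp16_addsub(0x0401, 0x0400, subtract=True)` subtracts the smallest normal value from its successor and returns `(0x0001, {"subnormal"})`. The result is an exact subnormal.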
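Multiply instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |
 | The rows below supplement these feature points from the perspective of the result. |
Result is subnormal | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result overflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result underflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result inexact | Overflow and inexact |
 | Underflow and inexact |
 | Positive inexact result |
 | Negative inexact result |
Result is the largest finite value (+MAX) | opa positive normal, opb positive normal |
 | opa negative normal, opb negative normal |
Result is the most negative finite value (-MAX) | opa positive normal, opb negative normal |
 | opa negative normal, opb positive normal |
Result is a positive/negative normal value | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |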
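The first Sub_Feature, traversing every combination of the operands of interest, can be automated, with the result bins collected per sign combination. The following is a sketch for binary16 multiply. It uses the finite operands from the operand table as bit patterns and assumes round to nearest even. For the underflow bin, it assumes tininess is detected before rounding. IEEE 754 lets the implementer choose before or after rounding, so this choice has to match the design under test. All names are hypothetical.

```python
import itertools
import math
import struct

FP16_MIN_NORMAL = 2.0 ** -14

# Finite operands of interest from the operand table, as binary16 bits.
OPERANDS = [0x7BFF, 0xFBFF, 0x0401, 0x8401, 0x5A5A, 0xA5A5,
            0x03FF, 0x83FF, 0x0001, 0x8001, 0x025A, 0x81A5]


def fp16_value(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_to_fp16(x: float) -> float:
    """Round a double to binary16, ties to even; overflow becomes infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.inf if x > 0 else -math.inf


def mul_bins(a_bits: int, b_bits: int) -> set:
    """Coverage bins hit by one binary16 multiply."""
    # An 11-bit by 11-bit product is exact in a double.
    exact = fp16_value(a_bits) * fp16_value(b_bits)
    r = round_to_fp16(exact)
    bins = set()
    if math.isinf(r):
        bins.add("overflow")
    elif r == 0.0:
        bins.add("zero")
    elif abs(r) < FP16_MIN_NORMAL:
        bins.add("subnormal")
    else:
        bins.add("normal")
    inexact = r != exact
    if inexact:
        bins.add("inexact")
    # Assumption: tininess is detected before rounding.
    if inexact and exact != 0.0 and abs(exact) < FP16_MIN_NORMAL:
        bins.add("underflow")
    return bins


def traverse(operands):
    """Collect the bins reached for each (sign of opa, sign of opb) pair."""
    hit = {}
    for a, b in itertools.product(operands, repeat=2):
        key = ("-" if a & 0x8000 else "+", "-" if b & 0x8000 else "+")
        hit.setdefault(key, set()).update(mul_bins(a, b))
    return hit
```

Bins missing from `traverse(OPERANDS)` for some sign pair show which rows of the table still need directed operands beyond the listed values.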
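Divide instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |
 | The rows below supplement these feature points from the perspective of the result. |
Result is subnormal | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result overflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result underflows | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |
Result inexact | Overflow and inexact |
 | Underflow and inexact |
 | Positive inexact result |
 | Negative inexact result |
Result is the largest finite value (+MAX) | opa positive normal, opb positive normal |
 | opa negative normal, opb negative normal |
Result is the most negative finite value (-MAX) | opa positive normal, opb negative normal |
 | opa negative normal, opb positive normal |
Result is a positive/negative normal value | opa positive normal, opb positive normal |
 | opa negative normal, opb positive normal |
 | opa positive normal, opb negative normal |
 | opa negative normal, opb negative normal |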
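For division, the subtle bins are the inexact ones, because the exact quotient is usually not representable in binary16 or binary64. The sketch below compares the rounded binary16 result against the exact rational quotient from `fractions.Fraction`. It assumes round to nearest even and finite operands with a nonzero divisor. It also relies on the known property that a correctly rounded binary64 quotient, rounded again to binary16, equals the correctly rounded binary16 quotient, since 53 >= 2*11 + 2. Names and bin labels are hypothetical.

```python
import math
import struct
from fractions import Fraction

FP16_MIN_NORMAL = 2.0 ** -14


def fp16_value(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_to_fp16(x: float) -> float:
    """Round a double to binary16, ties to even; overflow becomes infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.inf if x > 0 else -math.inf


def div_bins(a_bits: int, b_bits: int) -> set:
    """Result bins for a finite binary16 division a / b with b != 0."""
    a, b = fp16_value(a_bits), fp16_value(b_bits)
    r = round_to_fp16(a / b)
    exact = Fraction(a) / Fraction(b)        # exact rational quotient
    bins = set()
    if math.isinf(r):
        bins.add("overflow")
        inexact = True
    else:
        inexact = Fraction(r) != exact
        if r == 0.0:
            bins.add("zero")
        elif abs(r) < FP16_MIN_NORMAL:
            bins.add("subnormal")
        else:
            bins.add("normal")
    if inexact:
        bins.add("inexact, positive" if exact > 0 else "inexact, negative")
    return bins
```

For example, `div_bins(0x3C00, 0x4200)` computes 1/3 and lands in `{"normal", "inexact, positive"}`. `div_bins(0x8001, 0x4000)` halves the smallest negative subnormal, and the tie rounds to even, giving -0 with a negative inexact result.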
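Compare instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |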
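Compare instructions produce flags rather than floating-point values. The interesting combinations are therefore unordered operands (any NaN) and +0 versus -0, which must compare equal. The sketch below models the NZCV encoding assigned by the ARM ARM's FPCompare pseudocode: equal is 0110, less than is 1000, greater than is 0010, and unordered is 0011. Invalid Operation signalling for NaN operands is not modelled, and `fp_compare_nzcv` is a hypothetical name.

```python
import math


def fp_compare_nzcv(op1: float, op2: float) -> str:
    """NZCV flag string for comparing op1 with op2, FPCompare-style."""
    if math.isnan(op1) or math.isnan(op2):
        return "0011"      # unordered
    if op1 == op2:
        return "0110"      # equal; +0 and -0 compare equal
    if op1 < op2:
        return "1000"      # less than
    return "0010"          # greater than
```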
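Square root instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, or infinity, as well as the priority rules among special values |
 | The rows below supplement these feature points from the perspective of the result. |
Result is subnormal | Radicand is a positive normal value (not reachable: the square root of any positive finite value is normal, e.g. in binary16 sqrt(2^-24) = 2^-12 > 2^-14, so this bin is better declared as an illegal or ignore bin) |
Result underflows | Radicand is a positive normal value (not reachable, for the same reason) |
Result inexact | Radicand is a positive normal value |
Result is a positive normal value | Radicand is a positive normal value |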
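The claim that the subnormal and underflow bins cannot be hit by square root can be checked exhaustively in binary16, since there are only 31743 positive finite inputs. The sketch below assumes round to nearest even. It also assumes `math.sqrt` is correctly rounded, as IEEE 754 requires of the platform's double square root, which makes a second rounding to binary16 safe (53 >= 2*11 + 2). Names are hypothetical.

```python
import math
import struct

FP16_MIN_NORMAL = 2.0 ** -14


def fp16_value(bits: int) -> float:
    """Interpret a 16-bit integer as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fp16_sqrt(bits: int) -> float:
    """Square root of a binary16 operand, rounded to nearest even."""
    root = math.sqrt(fp16_value(bits))
    return struct.unpack("<e", struct.pack("<e", root))[0]


# Every positive finite binary16 input, from 0x0001 (smallest subnormal)
# up to 0x7BFF (+MAX).
results = [fp16_sqrt(b) for b in range(0x0001, 0x7C00)]
smallest = min(results)
```

The smallest result, 2^-12, comes from the smallest subnormal input and is still well above the smallest normal value 2^-14.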
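Conversion instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |
 | The rows below supplement these feature points from the perspective of the result. |
Result overflows | Source operand is a positive normal value |
 | Source operand is a negative normal value |
Result underflows | Source operand is a positive normal value |
 | Source operand is a negative normal value |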
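For the conversion bins, here is a sketch of binary64 to binary16 narrowing. Overflow occurs when the rounded value exceeds the binary16 range. Underflow occurs when the source is tiny (magnitude below 2^-14) and the result is inexact. The sketch assumes finite inputs, round to nearest even (the only mode `struct` implements), and tininess judged before rounding. `convert_to_fp16` is a hypothetical name.

```python
import math
import struct

FP16_MIN_NORMAL = 2.0 ** -14


def convert_to_fp16(x: float):
    """Convert a finite double to binary16 and report (value, bins)."""
    try:
        r = struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Beyond the binary16 range: round to nearest gives a signed infinity.
        return (math.inf if x > 0 else -math.inf), {"overflow", "inexact"}
    bins = set()
    inexact = r != x
    if inexact:
        bins.add("inexact")
    # Assumption: tininess is judged on the source value, before rounding.
    if inexact and x != 0.0 and abs(x) < FP16_MIN_NORMAL:
        bins.add("underflow")
    return r, bins
```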
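FMOV instructions
Feature | Sub_Feature |
Operand type | Traverse all combinations of the operands of interest |
 | Traversal covers results that are NaN, zero, infinity, or subnormal, as well as the priority rules among special values |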
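Rounding modes
Feature | Sub_Feature | Result (Truncate: drop the bits below the LSB; Increment: add 1 at the LSB, increasing the magnitude) |
Round to nearest (ties to even) | Positive result, bit below the LSB is 0 | Truncate |
 | Positive result, bit below the LSB is 1, lower bits not all 0 | Increment |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Increment |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |
 | Negative result, bit below the LSB is 0 | Truncate |
 | Negative result, bit below the LSB is 1, lower bits not all 0 | Increment |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Increment |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |
Round toward +infinity | Positive result, bit below the LSB is 0, lower bits not all 0 | Increment |
 | Positive result, bit below the LSB is 1, lower bits not all 0 | Increment |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Increment |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Increment |
 | Negative result, bit below the LSB is 0, lower bits not all 0 | Truncate |
 | Negative result, bit below the LSB is 1, lower bits not all 0 | Truncate |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Truncate |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |
Round toward -infinity | Positive result, bit below the LSB is 0, lower bits not all 0 | Truncate |
 | Positive result, bit below the LSB is 1, lower bits not all 0 | Truncate |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Truncate |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |
 | Negative result, bit below the LSB is 0, lower bits not all 0 | Increment |
 | Negative result, bit below the LSB is 1, lower bits not all 0 | Increment |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Increment |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Increment |
Round toward zero | Positive result, bit below the LSB is 0, lower bits not all 0 | Truncate |
 | Positive result, bit below the LSB is 1, lower bits not all 0 | Truncate |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Truncate |
 | Positive result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |
 | Negative result, bit below the LSB is 0, lower bits not all 0 | Truncate |
 | Negative result, bit below the LSB is 1, lower bits not all 0 | Truncate |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 1 (odd) | Truncate |
 | Negative result, bit below the LSB is 1, lower bits all 0, LSB is 0 (even) | Truncate |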
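Every row of the rounding table reduces to three bits:

- the LSB,
- the bit just below it (the round bit),
- the OR of all lower bits (the sticky bit).

The following Python sketch implements that decision on an integer significand. It uses the ARM FPCR.RMode mnemonics RN/RP/RM/RZ as mode names. `round_increment` and `round_significand` are hypothetical helpers, and `extra_bits` must be at least 1.

```python
def round_increment(mode: str, negative: bool, lsb: int,
                    round_bit: int, sticky: int) -> bool:
    """True to increment the magnitude at the LSB, False to truncate."""
    inexact = bool(round_bit or sticky)
    if mode == "RN":      # round to nearest, ties to even
        return bool(round_bit and (sticky or lsb))
    if mode == "RP":      # toward +infinity: only positive results grow
        return inexact and not negative
    if mode == "RM":      # toward -infinity: only negative results grow
        return inexact and negative
    if mode == "RZ":      # toward zero: always truncate
        return False
    raise ValueError(f"unknown rounding mode: {mode}")


def round_significand(sig: int, extra_bits: int, mode: str,
                      negative: bool) -> int:
    """Drop the low extra_bits bits of a sign-magnitude significand."""
    kept = sig >> extra_bits
    round_bit = (sig >> (extra_bits - 1)) & 1
    sticky = int(sig & ((1 << (extra_bits - 1)) - 1) != 0)
    return kept + round_increment(mode, negative, kept & 1, round_bit, sticky)
```

This reproduces the earlier worked example. For instance, `round_significand(0b11110, 3, "RN", False)` returns `0b100`, meaning +1.1110 rounds to +10.0 when one fractional bit is kept. With `"RP"` and `negative=True`, the same significand truncates to `0b11` (-1.1).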
This article is original work and may contain errors; corrections are welcome. Contact: cao_arvin@163.com.