英文:
Can the x87 perform exact division on UNsigned QUADword integers?
问题
以下是您要翻译的内容:
"8086/8087/8088 MACRO ASSEMBLY LANGUAGE REFERENCE MANUAL"(c)1980年英特尔公司提到(重点在于):
"> ... 8087提供了对实数系统的非常好的近似。但是重要的是要记住,它不是精确的表示,实数上的算术 inherently 是近似的。
相反,同样重要的是,8087对其实数的整数子集执行精确算术。也就是说,对两个整数的操作返回一个精确的整数结果,前提是真实结果是整数且在范围内。
更近期的手册更加简洁(重点在于):
"> IA处理器...它们可以处理长达18位的十进制数,而不会出现四舍五入错误,对于最大为2^64(或10^18)的整数执行精确算术。
FPU支持的整数数据类型包括有符号字(16位)、有符号双字(32位)和有符号四字(64位)。关于FPU的一切都没有提到无符号。事实上,关于FPU的一切都透露出有符号性,以至于它们甚至支持有符号零(+0和-0)。
那么,是否可以使用FPU来除以无符号64位数字并获得确切的商和余数?
对于一对有符号64位数字的除法,我编写了以下代码。商看起来没问题,但余数总是返回为零。为什么?
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FiDiv: push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; 被除数
fild qword [edi+8] ; 除数
fld
fnstcw [edi]
or word [edi], 0C00h ; 截断向零
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fld ; 复制因为 'fistp' 弹出
fistp qword [edi] ; 商
fmulp ; st1 *= st0, 弹出st0
fsubp ; st1 -= st0, 弹出st0
fistp qword [edi+8] ; 余数
fnstsw ax
test ax, 101b ; #Z (除以零),#I (无效算术)
pop eax edx ebx ecx edi
jnz .NOK
ret ; CF=0
.NOK: stc ; 溢出在8000000000000000h / -1
ret ; 或者除以零
; ------------------------------
FPU的舍入模式设置为“截断向零”,也称为“截断”,以模拟ALU的idiv
指令的行为。"
英文:
The 8086/8087/8088 MACRO ASSEMBLY LANGUAGE REFERENCE MANUAL (c) 1980 Intel Corporation mentions (Emphasis mine):
> ... the 8087 provides a very good approximation of the real number system. It is important to remember, however, that it is not an exact representation, and that arithmetic on real numbers is inherently approximate.
> Conversely, and equally important, the 8087 does perform exact arithmetic on its integer subset of the reals. That is, an operation on two integers returns an exact integral result, provided that the true result is an integer and is in range.
A more recent manual is even more concise (Emphasis theirs):
> the IA processors ... They can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on integers as large as 2^64 (or 10^18).
The integer data types that the FPU supports include signed word (16 bit), signed dword (32 bit), and signed qword (64 bit). There is never any mention of UNsigned. In fact, everything about the FPU breathes signedness, to the extent that they even support signed zero (+0 and -0).
So, is it possible to use the FPU to divide a couple of unsigned 64-bit numbers and get an exact quotient and remainder for it?
For the division of a couple of signed 64-bit numbers I wrote next code. The quotient seems fine but the remainder is always returned zero. Why is this?
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FiDiv: push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; Dividend
fild qword [edi+8] ; Divisor
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
fnstsw ax
test ax, 101b ; #Z (Divide-By-Zero), #I (Invalid Arithmetic)
pop eax edx ebx ecx edi
jnz .NOK
ret ; CF=0
.NOK: stc ; Overflow on 8000000000000000h / -1
ret ; or Division by zero
; ------------------------------
The FPU rounding mode is set to 'Truncate Towards Zero' aka 'Chop', so as to mimic the behavior of the ALU idiv
instruction.
答案1
得分: 5
以下是代码的翻译部分:
零余数
-
> fdivr st2 ; st0 = st2 / st0
> fld ; 复制,因为 `fistp` 不会弹出
> fistp qword [edi] ; 商
> fmulp ; st1 *= st0,弹出st0
> fsubp ; st1 -= st0,弹出st0
> fistp qword [edi+8] ; 余数
这段代码计算了余数:
余数 = 被除数 - (商 * 除数)
因为舍入模式被设置为'截断向零',`fistp qword [edi]` 指令将在存储到内存之前将ST0中的(副本)商转换为整数。但是,仍然在(fpu)堆栈上保留的商的值是带有小数的实数。一旦与除数相乘,它将再次产生被除数,导致余数为零。
缺少的是将商舍入为整数并在(fpu)堆栈上执行的操作:
fdivr st2 ; st0 = st2 / st0
frndint
fld ; 复制,因为 `fistp` 不会弹出
fistp qword [edi] ; 商
fmulp ; st1 *= st0,弹出st0
fsubp ; st1 -= st0,弹出st0
fistp qword [edi+8] ; 余数
但更快的方法是简单地重新从内存中加载整数商:
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; 商
fild qword [edi]
fmulp ; st1 *= st0,弹出st0
fsubp ; st1 -= st0,弹出st0
fistp qword [edi+8] ; 余数
无符号除法
-
在内部,FPU为一个数字的尾数分配了64位,还有一个用于数字的符号的单独位。FPU可以表示范围从-18'446'744'073'709'551'616到18'446'744'073'709'551'616的每个整数。64位尾数允许我们使用范围从0到18'446'744'073'709'551'615的无符号64位整数。唯一的问题是如何加载和存储这些值,因为 `fild` 和 `fistp` 不能处理它们(因为它们只能在范围从-9'223'372'036'854'775'808到9'223'372'036'854'775'807的范围内操作)。
可以将无符号四字节和扩展实数格式之间相互转换,以便可以使用 `fld` 和 `fstp`。另一种方法是从它们的高半部分加载/存储无符号四字节。但是转换需要时间,因此我发现通过消除足够多的特殊情况,唯一仍然存在问题的操作是加载被除数。
特殊情况包括:
- 除以零:CF=1
- 除以一:商=被除数,余数=0
- 被除数等于除数的除法:商=1,余数=0
- 被除数小于除数的除法:商=0,余数=被除数
在实际需要 `fdiv` 的情况下,代码首先加载被除数的一半,将其返回到(fpu)堆栈上,并在真实被除数为奇数时条件性地加1。
```assembly
; 输入 (edx:eax,ecx:ebx) 输出 (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; 除数可能为0或1
.a: cmp edx, ecx
jb .LT ; 被除数 < 除数
ja .b ; 被除数 > 除数
cmp eax, ebx
jb .LT ; 被除数 < 除数
je .GE ; 被除数 = 除数
.b: test ecx, ecx
js .GE ; 被除数 > 除数 > 7FFFFFFFFFFFFFFFh
shr edx, 1 ; 将无符号64位被除数减半
rcr eax, 1 ; (CF) 为了使用 `fild` 而将其带入范围
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(被除数 / 2)
fadd st0 ; st0 = {被除数 - 1, 被除数}
jnc .c ; (CF)
fld1
faddp ; st0 = 被除数 [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; 除数为 [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; 向零截断
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; 商
fild qword [edi]
fmulp ; st1 *= st0,弹出st0
fsubp ; st1 -= st0,弹出st0
fistp qword [edi+8] ; 余数
pop eax edx ebx ecx edi
ret ; CF=0
.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; 除数为0或1
<details>
<summary>英文:</summary>
Zero remainder
-
> fdivr st2 ; st0 = st2 / st0
> fld ; Duplicate because `fistp` does pop
> fistp qword [edi] ; Quotient
> fmulp ; st1 *= st0, pop st0
> fsubp ; st1 -= st0, pop st0
> fistp qword [edi+8] ; Remainder
This code calculates the remainder from:
Remainder = Dividend - (Quotient * Divisor)
Because the Rounding Mode got set to 'Truncate Towards Zero', the `fistp qword [edi]` instruction will convert the (copy of the) quotient held in ST0 to an integer right before storing to memory. However the value for the quotient that remains on the (fpu) stack is still a real number with a fraction. Once multiplied with the divisor it will produce the dividend again, resulting in a zero remainder.
What is missing is rounding the quotient to an integer and doing it on the (fpu) stack already:
fdivr st2 ; st0 = st2 / st0
frndint
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
But a faster way is to simply reload the integer quotient from memory:
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
Unsigned division
-
Internally, the FPU dedicates 64 bits to the significand of a number plus a separate bit for the sign of the number. The FPU can represent every whole number in the range from -18'446744073'709551616 to 18'446744073'709551616. The 64-bit significand allows us to work with the unsigned 64-bit integers that range from 0 to 18'446744073'709551615. The only trouble is how to load and store these values that `fild` and `fistp` can't deal with (because they are restricted to operate in the range from -9'223372036'854775808 to 9'223372036'854775807).
It would be possible to convert back and forth between unsigned quadword and the extended real format so we could use `fld` and `fstp` instead. Another approach would load/store the unsigned quadwords from/to their upper and lower halves. But conversions take time and so I found that by eliminating enough special cases, the only troublesome operation that remains was the loading of the dividend. Everything else can just use `fild` and `fistp` as usual.
The special cases include:
- Division by zero: CF=1
- Division by one: Quotient=Dividend and Remainder=0
- Divisions where the Dividend is equal to the Divisor: Quotient=1 and Remainder=0
- Divisions where the Dividend is smaller than the Divisor: Quotient=0 and Remainder=Dividend
Where an actual `fdiv` is needed, the code first loads half of the dividend, doubles it back on the (fpu) stack, and conditionally adds 1 if the true dividend was odd.
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for `fild`
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0
.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret
.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0
.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------
Faster results
-
Quoting from an [answer that uses a technique named Chunking to divide a couple of 64-bit integers](https://stackoverflow.com/questions/17217440/how-to-do-64-bit-by-64-bit-division-in-assembly/76332918#76332918):
> Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in `div` instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend's extension in EDX is less than the divisor in EBX. If these conditions are met, the normal `div` instruction does the job, and does it faster too. If for some reason (eg. school) using `div` is not allowed, then simply remove the prefixed code to be in the clear.
Today's code can benefit from the same prefix, but as it turns out, it is more beneficial to keep detecting the special cases first:
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
; - - - - - - - - - - - - - - -
jnz .fdiv
cmp edx, ebx
jnb .fdiv
.div: div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
.fdiv: shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for fild
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0
.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret
.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0
.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论