2023年6月11日 21:44:41go评论70阅读模式

英文:

Can the x87 perform exact division on UNsigned QUADword integers?

问题

以下是您要翻译的内容：

"8086/8087/8088 MACRO ASSEMBLY LANGUAGE REFERENCE MANUAL"（c）1980年英特尔公司提到（重点在于）：

"> ... 8087提供了对实数系统的非常好的近似。但是重要的是要记住，它不是精确的表示，实数上的算术 inherently 是近似的。

相反，同样重要的是，8087对其实数的整数子集执行精确算术。也就是说，对两个整数的操作返回一个精确的整数结果，前提是真实结果是整数且在范围内。

更近期的手册更加简洁（重点在于）：

"> IA处理器...它们可以处理长达18位的十进制数，而不会出现四舍五入错误，对于最大为2^64（或10^18）的整数执行精确算术。

FPU支持的整数数据类型包括有符号字（16位）、有符号双字（32位）和有符号四字（64位）。关于FPU的一切都没有提到无符号。事实上，关于FPU的一切都透露出有符号性，以至于它们甚至支持有符号零（+0和-0）。
那么，是否可以使用FPU来除以无符号64位数字并获得确切的商和余数？

对于一对有符号64位数字的除法，我编写了以下代码。商看起来没问题，但余数总是返回为零。为什么？

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FiDiv:  push    edi ecx ebx edx eax
        mov     edi, esp
        fninit
        fild    qword [edi]     ; 被除数
        fild    qword [edi+8]   ; 除数
        fld
        fnstcw  [edi]
        or      word [edi], 0C00h ; 截断向零
        fldcw   [edi]
        fdivr   st2             ; st0 = st2 / st0
        fld                     ; 复制因为 'fistp' 弹出
        fistp   qword [edi]     ; 商
        fmulp                   ; st1 *= st0, 弹出st0
        fsubp                   ; st1 -= st0, 弹出st0
        fistp   qword [edi+8]   ; 余数
        fnstsw  ax
        test    ax, 101b        ; #Z (除以零)，#I (无效算术)
        pop     eax edx ebx ecx edi
        jnz     .NOK
        ret                     ; CF=0
.NOK:   stc                     ; 溢出在8000000000000000h / -1
        ret                     ; 或者除以零
; ------------------------------

FPU的舍入模式设置为“截断向零”，也称为“截断”，以模拟ALU的idiv指令的行为。"

英文:

> ... the 8087 provides a very good approximation of the real number system. It is important to remember, however, that it is not an exact representation, and that arithmetic on real numbers is inherently approximate.
> Conversely, and equally important, the 8087 does perform exact arithmetic on its integer subset of the reals. That is, an operation on two integers returns an exact integral result, provided that the true result is an integer and is in range.

A more recent manual is even more concise (Emphasis theirs):

> the IA processors ... They can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on integers as large as 2^64 (or 10^18).

The integer data types that the FPU supports include signed word (16 bit), signed dword (32 bit), and signed qword (64 bit). There is never any mention of UNsigned. In fact, everything about the FPU breathes signedness, to the extent that they even support signed zero (+0 and -0).
So, is it possible to use the FPU to divide a couple of unsigned 64-bit numbers and get an exact quotient and remainder for it?

For the division of a couple of signed 64-bit numbers I wrote next code. The quotient seems fine but the remainder is always returned zero. Why is this?

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FiDiv:  push    edi ecx ebx edx eax
        mov     edi, esp
        fninit
        fild    qword [edi]     ; Dividend
        fild    qword [edi+8]   ; Divisor
        fld
        fnstcw  [edi]
        or      word [edi], 0C00h ; Truncate Towards Zero
        fldcw   [edi]
        fdivr   st2             ; st0 = st2 / st0
        fld                     ; Duplicate because `fistp` does pop
        fistp   qword [edi]     ; Quotient
        fmulp                   ; st1 *= st0, pop st0
        fsubp                   ; st1 -= st0, pop st0
        fistp   qword [edi+8]   ; Remainder
        fnstsw  ax
        test    ax, 101b        ; #Z (Divide-By-Zero), #I (Invalid Arithmetic)
        pop     eax edx ebx ecx edi
        jnz     .NOK
        ret                     ; CF=0
.NOK:   stc                     ; Overflow on 8000000000000000h / -1
        ret                     ; or Division by zero
; ------------------------------

The FPU rounding mode is set to 'Truncate Towards Zero' aka 'Chop', so as to mimic the behavior of the ALU idiv instruction.

答案1

得分: 5

以下是代码的翻译部分：

零余数
-

> fdivr st2             ; st0 = st2 / st0
> fld                     ; 复制，因为 `fistp` 不会弹出
> fistp qword [edi]     ; 商
> fmulp                   ; st1 *= st0，弹出st0
> fsubp                   ; st1 -= st0，弹出st0
> fistp qword [edi+8]   ; 余数

这段代码计算了余数：

    余数 = 被除数 - (商 * 除数)

因为舍入模式被设置为'截断向零'，`fistp qword [edi]` 指令将在存储到内存之前将ST0中的（副本）商转换为整数。但是，仍然在（fpu）堆栈上保留的商的值是带有小数的实数。一旦与除数相乘，它将再次产生被除数，导致余数为零。
缺少的是将商舍入为整数并在（fpu）堆栈上执行的操作：

    fdivr st2             ; st0 = st2 / st0
    frndint
    fld                     ; 复制，因为 `fistp` 不会弹出
    fistp qword [edi]     ; 商
    fmulp                   ; st1 *= st0，弹出st0
    fsubp                   ; st1 -= st0，弹出st0
    fistp qword [edi+8]   ; 余数

但更快的方法是简单地重新从内存中加载整数商：

    fdivr st2             ; st0 = st2 / st0
    fistp qword [edi]     ; 商
    fild qword [edi]
    fmulp                   ; st1 *= st0，弹出st0
    fsubp                   ; st1 -= st0，弹出st0
    fistp qword [edi+8]   ; 余数


无符号除法
-

在内部，FPU为一个数字的尾数分配了64位，还有一个用于数字的符号的单独位。FPU可以表示范围从-18'446'744'073'709'551'616到18'446'744'073'709'551'616的每个整数。64位尾数允许我们使用范围从0到18'446'744'073'709'551'615的无符号64位整数。唯一的问题是如何加载和存储这些值，因为 `fild` 和 `fistp` 不能处理它们（因为它们只能在范围从-9'223'372'036'854'775'808到9'223'372'036'854'775'807的范围内操作）。
可以将无符号四字节和扩展实数格式之间相互转换，以便可以使用 `fld` 和 `fstp`。另一种方法是从它们的高半部分加载/存储无符号四字节。但是转换需要时间，因此我发现通过消除足够多的特殊情况，唯一仍然存在问题的操作是加载被除数。

特殊情况包括：

- 除以零：CF=1
- 除以一：商=被除数，余数=0
- 被除数等于除数的除法：商=1，余数=0
- 被除数小于除数的除法：商=0，余数=被除数

在实际需要 `fdiv` 的情况下，代码首先加载被除数的一半，将其返回到（fpu）堆栈上，并在真实被除数为奇数时条件性地加1。

```assembly
; 输入 (edx:eax,ecx:ebx) 输出 (edx:eax,ecx:ebx,CF)
FuDiv:  cmp     ebx, 1
        jbe     .TST            ; 除数可能为0或1
.a:     cmp     edx, ecx
        jb      .LT             ; 被除数 < 除数
        ja      .b              ; 被除数 > 除数
        cmp     eax, ebx
        jb      .LT             ; 被除数 < 除数
        je      .GE             ; 被除数 = 除数
.b:     test    ecx, ecx
        js      .GE             ; 被除数 > 除数 > 7FFFFFFFFFFFFFFFh

        shr     edx, 1          ; 将无符号64位被除数减半
        rcr     eax, 1          ; (CF) 为了使用 `fild` 而将其带入范围
        push    edi ecx ebx edx eax
        mov     edi, esp
        fninit
        fild    qword [edi]     ; st0 = int(被除数 / 2)
        fadd    st0             ; st0 = {被除数 - 1, 被除数}
        jnc     .c              ; (CF)
        fld1
        faddp                   ; st0 = 被除数 [0, FFFFFFFFFFFFFFFFh]
.c:     fild    qword [edi+8]   ; 除数为 [2, 7FFFFFFFFFFFFFFFh]
        fld
        fnstcw  [edi]
        or      word [edi], 0C00h ; 向零截断
        fldcw   [edi]
        fdivr   st2             ; st0 = st2 / st0
        fistp   qword [edi]     ; 商
        fild    qword [edi]
        fmulp                   ; st1 *= st0，弹出st0
        fsubp                   ; st1 -= st0，弹出st0
        fistp   qword [edi+8]   ; 余数
        pop     eax edx ebx ecx edi
        ret                     ; CF=0

.TST:   test    ecx, ecx
        jnz     .a
        cmp     ebx, 1          ; 除数为0或1


<details>
<summary>英文:</summary>

Zero remainder
-

&gt;     fdivr   st2             ; st0 = st2 / st0
&gt;     fld                     ; Duplicate because `fistp` does pop
&gt;     fistp   qword [edi]     ; Quotient
&gt;     fmulp                   ; st1 *= st0, pop st0
&gt;     fsubp                   ; st1 -= st0, pop st0
&gt;     fistp   qword [edi+8]   ; Remainder

This code calculates the remainder from:

    Remainder = Dividend - (Quotient * Divisor)

Because the Rounding Mode got set to &#39;Truncate Towards Zero&#39;, the `fistp qword [edi]` instruction will convert the (copy of the) quotient held in ST0 to an integer right before storing to memory. However the value for the quotient that remains on the (fpu) stack is still a real number with a fraction. Once multiplied with the divisor it will produce the dividend again, resulting in a zero remainder.  
What is missing is rounding the quotient to an integer and doing it on the (fpu) stack already:

    fdivr   st2             ; st0 = st2 / st0
    frndint
    fld                     ; Duplicate because `fistp` does pop
    fistp   qword [edi]     ; Quotient
    fmulp                   ; st1 *= st0, pop st0
    fsubp                   ; st1 -= st0, pop st0
    fistp   qword [edi+8]   ; Remainder

But a faster way is to simply reload the integer quotient from memory:

    fdivr   st2             ; st0 = st2 / st0
    fistp   qword [edi]     ; Quotient
    fild    qword [edi]
    fmulp                   ; st1 *= st0, pop st0
    fsubp                   ; st1 -= st0, pop st0
    fistp   qword [edi+8]   ; Remainder


Unsigned division
-

Internally, the FPU dedicates 64 bits to the significand of a number plus a separate bit for the sign of the number. The FPU can represent every whole number in the range from  -18&#39;446744073&#39;709551616 to 18&#39;446744073&#39;709551616. The 64-bit significand allows us to work with the unsigned 64-bit integers that range from 0 to 18&#39;446744073&#39;709551615. The only trouble is how to load and store these values that `fild` and `fistp` can&#39;t deal with (because they are restricted to operate in the range from -9&#39;223372036&#39;854775808 to 9&#39;223372036&#39;854775807).  
It would be possible to convert back and forth between unsigned quadword and the extended real format so we could use `fld` and `fstp` instead. Another approach would load/store the unsigned quadwords from/to their upper and lower halves. But conversions take time and so I found that by eliminating enough special cases, the only troublesome operation that remains was the loading of the dividend. Everything else can just use `fild` and `fistp` as usual.

The special cases include:

- Division by zero: CF=1
- Division by one: Quotient=Dividend and Remainder=0
- Divisions where the Dividend is equal to the Divisor: Quotient=1 and Remainder=0  
- Divisions where the Dividend is smaller than the Divisor: Quotient=0 and Remainder=Dividend

Where an actual `fdiv` is needed, the code first loads half of the dividend, doubles it back on the (fpu) stack, and conditionally adds 1 if the true dividend was odd.

    shr     edx, 1          ; Halving the unsigned 64-bit Dividend
    rcr     eax, 1          ; (CF)      to get in range for `fild`
    push    edi ecx ebx edx eax
    mov     edi, esp
    fninit
    fild    qword [edi]     ; st0 = int(Dividend / 2)
    fadd    st0             ; st0 = {Dividend - 1, Dividend}
    jnc     .c              ; (CF)
    fld1
    faddp                   ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]

.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0

.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret

.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0

.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------


Faster results
-

Quoting from an [answer that uses a technique named Chunking to divide a couple of 64-bit integers](https://stackoverflow.com/questions/17217440/how-to-do-64-bit-by-64-bit-division-in-assembly/76332918#76332918):

&gt; Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in `div` instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend&#39;s extension in EDX is less than the divisor in EBX. If these conditions are met, the normal `div` instruction does the job, and does it faster too. If for some reason (eg. school) using `div` is not allowed, then simply remove the prefixed code to be in the clear.

Today&#39;s code can benefit from the same prefix, but as it turns out, it is more beneficial to keep detecting the special cases first:

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
; - - - - - - - - - - - - - - -
jnz .fdiv
cmp edx, ebx
jnb .fdiv
.div: div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
.fdiv: shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for fild
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0

.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret

.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0

.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------



</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

x87能够对无符号QUADword整数执行精确的除法吗？

问题

答案1

舍入问题 – VFP

IorD 控制信号在 MIPS 中

使用192/256位整数计算无符号64位整数向量的点积和的最快方法是什么？

LLVM，预期指令？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论