gpt4 book ai didi

assembly - x87 可以对 UNsigned QUADword 整数执行精确除法吗?

转载 作者:行者123 更新时间:2023-12-03 07:54:17 24 4
gpt4 key购买 nike

8086/8087/8088 宏汇编语言引用手册 (c) 1980 Intel Corporation 提到(重点是我的):

... the 8087 provides a very good approximation of the real number system. It is important to remember, however, that it is not an exact representation, and that arithmetic on real numbers is inherently approximate.
Conversely, and equally important, the 8087 does perform exact arithmetic on its integer subset of the reals. That is, an operation on two integers returns an exact integral result, provided that the true result is an integer and is in range.

最近的手册更加简洁(强调他们的):

the IA processors ... They can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on integers as large as 2^64 (or 10^18).

FPU 支持的整数数据类型包括有符号字(16 位)、有符号双字(32 位)和有符号 qword(64 位)。从来没有提到过 UNsigned。事实上,关于 FPU 的一切都具有符号性,甚至支持有符号零(+0 和 -0)。
那么,是否可以使用 FPU 将几个无符号 64 位数字相除并获得精确的商和余数?

对于几个有符号 64 位数字的除法,我编写了下一个代码。商看起来不错,但余数总是返回零。这是为什么?

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FiDiv: push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; Dividend
fild qword [edi+8] ; Divisor
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
fnstsw ax
test ax, 101b ; #Z (Divide-By-Zero), #I (Invalid Arithmetic)
pop eax edx ebx ecx edi
jnz .NOK
ret ; CF=0
.NOK: stc ; Overflow on 8000000000000000h / -1
ret ; or Division by zero
; ------------------------------

FPU 舍入模式设置为“向零截断”又名“Chop”,以模仿 ALU idiv 指令的行为。

最佳答案

余数为零

fdivr   st2             ; st0 = st2 / st0
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder

此代码计算余数:

Remainder = Dividend - (Quotient * Divisor)

由于舍入模式设置为“向零截断”,fistp qword [edi] 指令将在存储到内存之前将 ST0 中保存的商(商的副本)转换为整数。然而,保留在 (fpu) 堆栈上的商值仍然是带分数的实数。一旦与除数相乘,它将再次产生被除数,导致余数为零。
缺少的是将商舍入为整数并已在 (fpu) 堆栈上执行此操作:

fdivr   st2             ; st0 = st2 / st0
frndint
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder

但更快的方法是简单地从内存中重新加载整数商:

fdivr   st2             ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder

无符号除法

在内部,FPU 将 64 位专用于数字的有效数,加上一个单独的位用于数字的符号。 FPU 可以表示从 -18'446744073'709551616 到 18'446744073'709551616 范围内的每个整数。 64 位有效数允许我们处理范围从 0 到 18'446744073'709551615 的无符号 64 位整数。唯一的麻烦是如何加载和存储这些 fildfistp 无法处理的值(因为它们被限制在 -9'223372036' 范围内操作) 854775808 至 9'223372036'854775807)。
可以在无符号四字和扩展实数格式之间来回转换,因此我们可以使用 fld 和 fstp 来代替。另一种方法是从上半部分和下半部分加载/存储无符号四字。但转换需要时间,所以我发现通过消除足够多的特殊情况,剩下的唯一麻烦的操作就是股息的加载。其他一切都可以照常使用 fildfistp

特殊情况包括:

  • 除以零:CF=1
  • 除以一:商=被除数,余数=0
  • 被除数等于除数的除法:商 = 1 且余数 = 0
  • 被除数小于除数的除法:商=0 且余数=被除数

在需要实际 fdiv 的地方,代码首先加载一半的被除数,将其加倍返回 (fpu) 堆栈,如果真正的被除数是奇数,则有条件地加 1。

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh

shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for `fild`
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0

.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret

.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0

.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------

更快的结果

引自answer that uses a technique named Chunking to divide a couple of 64-bit integers :

Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in div instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend's extension in EDX is less than the divisor in EBX. If these conditions are met, the normal div instruction does the job, and does it faster too. If for some reason (eg. school) using div is not allowed, then simply remove the prefixed code to be in the clear.

今天的代码可以受益于相同的前缀,但事实证明,首先继续检测特殊情况会更有利:

; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
; - - - - - - - - - - - - - - -
jnz .fdiv
cmp edx, ebx
jnb .fdiv
.div: div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
.fdiv: shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for `fild`
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0

.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret

.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0

.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------

关于assembly - x87 可以对 UNsigned QUADword 整数执行精确除法吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/76450759/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com