英文:
How do you manually calculate imul -1 * 3?
问题
我正在学习x86-64汇编语言,参加了一门在线课程,但是课程细节非常不清晰。我在网上搜索并阅读了几个Stack Overflow问题,但没有找到答案。
我尝试手工计算二进制乘法,但卡在了imul
。
以二进制乘法示例11111111 * 00000011
为例,它可以视为无符号的255 * 3
或有符号的-1 * 3
。
255*3
mov al, 255
mov bl, 3
mul bl
这很简单,以下是我手工计算的方法,就像十进制乘法一样:
11111111
x 00000011
--------------
11111111
11111111
--------------
1011111101
结果溢出,上半部分是ah
中的10
,下半部分是al
中的11111101
。我的手工计算与程序结果相符。
-1*3
当涉及到有符号数时,
mov al, -1
mov bl, 3
imul bl
程序的结果是ah
中的11111111
和al
中的11111101
。
我该如何手工计算这个结果?我被告知imul
中涉及符号扩展,但我真的不知道它在这里是如何工作的。
我正在使用SASM IDE和NASM汇编器。
英文:
I am learning x86-64 assembly with an online course and the course is horribly unclear in detail. I've searched online and read several SO questions but couldn't get an answer.
I tried to figure out how to calculate binary multiplication by hand, but I am stuck with imul
.
Given this example of binary multiplication, 11111111 * 00000011
, it can be viewed as 255 * 3
unsigned or -1 * 3
signed.
255*3
mov al, 255
mov bl, 3
mul bl
this is easy and here's how I calculate by hand, just like decimal multiplication:
11111111
x 00000011
--------------
11111111
11111111
--------------
1011111101
The result overflows, the upper half is 10
in ah
and the lower half is 11111101
in al
. My manual calculation matches the program result.
-1*3
when it comes to signed,
mov al, -1
mov bl, 3
imul bl
the program result is 11111111
in ah
and 11111101
in al
.
How can I calculate this result by hand? I was told that sign extension is involved in imul
, but I really don't know how it works here.
I am using SASM IDE and NASM Assembler.
答案1
得分: 2
技术上,你使用了长乘法的捷径。虽然这个捷径对于获取无符号位模式来说是相当合理的,但它在概念上忽略了一些内容。
无符号乘法的全面计算如下:
11111111
x 00000011
--------------
0000000011111111
+000000011111111
+00000000000000
+0000000000000
+000000000000
+00000000000
+0000000000
+000000000
------------------
00000001011111101 = 255 x 3 = 765
这些相加的结果是17位,尽管你只会从mul
得到16位,因为输入只有8位宽度,所以不会发生进位或无符号溢出。
在上面的计算中,我添加了前导零,但将尾随零留为空格,就像你的原文一样。
而在有符号算术中:
11111111
x 00000011
--------------
1111111111111111
+111111111111111
+00000000000000
+0000000000000
+000000000000
+00000000000
+0000000000
-000000000 // 注意,因为MSB的位置值为-2^7,所以是减法
------------------
11111111111111101 = -1 x 3 = -3,这个结果也是17位
这些相加的结果同样是17位,尽管你只会从`imul`得到16位,因为输入只有8位宽度,所以不会发生有符号溢出。
因此,我们通过将乘数(第一个操作数)的符号扩展为16位部分积来考虑它的符号。
我们通过减去而不是加上最高有效位的部分积来考虑乘法器(第二个操作数)的符号。这是因为这个部分积是通过乘以0或`-2^7`得到的,这个输入的符号位具有-2^7的位置值。对于无符号数来说,所有的位置值都是正的,但n位的[二进制补码][1]是通过给最高位赋予-2^(n-1)的位置值,而不是+2^(n-1)。例如,在8位二进制补码(`int8_t`)中,`0x80`是-128的位模式,但在无符号数(`uint8_t`)中是+128。
或者,我们可以对这个输入进行符号扩展,并进行16个部分积的计算,而不是8个,但这需要更多的工作。
[1]: https://en.wikipedia.org/wiki/Two%27s_complement
英文:
Technically, you took a short cut with the long multiplication. Though the short cut is quite reasonable for obtaining the unsigned bit pattern, it glosses over something conceptually.
11111111
x 00000011
--------------
11111111
11111111
--------------
1011111101
Done out fully we have for unsigned:
11111111
x 00000011
--------------
0000000011111111
+000000011111111
+00000000000000
+0000000000000
+000000000000
+00000000000
+0000000000
+000000000
------------------
00000001011111101 = 255 x 3 = 765,
The summation of those being 17 bits, though you would only get 16 from mul
, and carry/unsigned-overflow cannot happen because the inputs are only 8 bits wide.
In the above, have added leading zeros though left the trailing zeros as blanks the way you have it.
Whereas in signed arithmetic:
11111111
x 00000011
--------------
1111111111111111
+111111111111111
+00000000000000
+0000000000000
+000000000000
+00000000000
+0000000000
-000000000 // note, subtracting because MSB has place value -2^7
------------------
11111111111111101 = -1 x 3 = -3, the summation being 17 bits
The summation of those being 17 bits, though you would only get 16 from imul
, and signed overflow cannot happen because the inputs are only 8 bits wide.
So we account for the sign of the multiplicand (first operand) by sign-extending it into 16-bit partial products.
We account for the sign of the multiplier (second operand) by subtracting instead of adding the most-significant partial product. Because that's from multiplying by 0 or -2^7
, the place value of the sign bit in that input. For unsigned, all the place values are positive, but n-bit 2's complement works by giving the MSB a place value of -2^(n-1)
instead of +2^(n-1)
. e.g. 0x80
is the bit-pattern for -128
in 8-bit 2's complement (int8_t
), but +128 as unsigned (uint8_t
).
Alternatively, we could sign-extend that input and do 16 partial products instead of 8, but that's more work.
答案2
得分: 2
以下是翻译好的部分:
"n x n => 2n" 乘积的低位 "n" 位与有符号与无符号无关。但高位则有关。结果等同于将目标宽度的两个输入都进行符号扩展(imul
)或零扩展(mul
),然后进行非扩展乘法,尽管实际实现当然会更有效率。有关二进制表示,请参阅Erik的答案。
如果我只想手动计算imul
会产生什么结果,我会做-1 * 3 = -3
,然后计算 -3
作为2的补码16位整数的位模式。由于这是imul
,我知道我必须将8位位模式解释为有符号的-1
,而不是像mul
那样无符号的0xff
。
除非我正在检查某些位操作,使用乘法来移动和添加位字段(例如https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619),否则我不会手动使用二进制(即使在那种情况下,我实际上也会考虑十六进制或字节)。当然,了解如何以这种方式书写是很好的,但如果你愿意,也可以这样做。
在C语言中,imul r/m8
是 int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;
。 (C语言很奇怪,因为像*
这样的运算符已经将窄输入提升为int
,而int
至少与int16_t
一样宽。)
在汇编语言中,movsx ax, al
(或 cbw
);movsx cx, byte src
;imul ax, cx
,但不破坏CX。(如果你愿意,可以想象一个隐藏的临时寄存器。)
你很少需要扩展的mul
或imul
,因为它们对于16位或更宽的输入需要多于1个uop(因为它们必须将乘积的两半写入多个寄存器,如EDX:EAX,而不仅仅是RAX的低16位)。有趣的是,8位的imul bl
在现代CPU上运行为单个uop。请参见 https://uops.info/ 和 https://agner.org/optimize/
但 imul bl
只将其结果扩展到16位,没有扩展到方便的32位,因此你更可能希望将输入扩展到32位,以进行 imul ecx, ebx
等操作,因为在x86-64上,32位是大多数情况下最有效的操作数大小。
imul r, r/m
和 imul r, r/m, imm
形式在现代CPU上是单个uop且快速的。一些较旧或低功耗的CPU对于64位的imul
较慢。
英文:
The low n
bits of an n x n => 2n
product don't depend on signed vs. unsigned. But the high half does. The result is equivalent to sign-extending (imul
) or zero-extending (mul
) both inputs to the destination width and then doing a non-widening multiply, although of course the actual implementation would do something more efficient. See Erik's answer for what that looks like in terms of bits.
If I just wanted to manually work out what imul
would produce, I'd do -1 * 3 = -3
, and then work out the bit-pattern for -3
as a 2's complement 16-bit integer. Since it's imul
, I know I have to interpret the 8-bit bit-patterns as signed -1
not unsigned 0xff
like mul
would.
I wouldn't use binary by hand unless I was checking some bithack that was using multiplies to shift around and add bit-fields (like https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619 but even then I was actually thinking in hex or bytes.) It is of course good to understand how you could write it down that way if you wanted to, though.
In C terms, imul r/m8
is int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;
. (C is weird because operators like *
already promote narrow inputs to int
, and int
is at least as wide as int16_t
.)
In asm terms, movsx ax, al
(or cbw
); movsx cx, byte src
; imul ax, cx
, but without destroying CX. (Imagine a hidden temporary register if you want.)
You rarely actually want widening mul
or imul
since they run as more than 1 uop for 16-bit or wider inputs (since they have to write halves of the product to multiple registers like EDX:EAX, not just the low 16 bits of RAX). Interestingly, 8-bit imul bl
runs as a single uop on modern CPUs. See https://uops.info/ and https://agner.org/optimize/
But imul bl
only widens its result to 16-bit, not all the way to a convenient 32-bit, so you'd more often want to have your inputs widened to 32-bit for imul ecx, ebx
or something, since 32-bit is the most efficient operand-size most of the time on x86-64.
The imul r, r/m
and imul r, r/m, imm
forms are single-uop and fast on modern CPUs. Some older or low-power CPUs are slower for 64-bit imul
.
答案3
得分: 0
诚实地说,我无法完全理解其他两个答案。对我来说,它们过于复杂了。我只需要一个愚蠢、简单和通用的规则。
我想只是选择对我有效的方法。
结果等同于将目标宽度的两个输入进行符号扩展(imul)或零扩展(mul),然后执行非扩展乘法
@Peter Cordes
要手动计算,有几种方法:您可以将两个输入符号扩展为16位,然后进行16×16乘法,仅保留低16位
@Erik Eidt
我尝试并验证了这个规则对我有效。
-1*3
,
符号扩展
11111111 11111111
x00000000 00000011
-------------------
11111111 11111111
111111111 1111111
-------------------
1011111111 11111101
保留低16位,我得到正确的结果11111111(ah) 11111101(al)
。
尝试一个4位的例子:
-2 * 3
,1110
imul 0011
符号扩展
1111 1110
x0000 0011
-----------
1111 1110
11111 110
-----------
101111 1010
保留较低的8位,结果是1111 1010
,-6
。
之前我不确定imul
中的符号扩展是如何工作的,现在我明白了,很容易验证。顺便说一下,如果您觉得某些示例的手动计算很烦琐(例如4位-7 * -1
,1001 x 1111
,符号扩展1111 1001 x 1111 1111
,需要添加许多行),您可以使用Windows计算器(程序员模式)快速验证结果。
英文:
Honestly, I can't fully understand the other two answers. It's over complicated for me. I just need a dumb, simple and universal rule.
I'd like to just pick up what works for me.
> The result is equivalent to sign-extending (imul) or zero-extending
> (mul) both inputs to the destination width and then doing a
> non-widening multiply
>
> @Peter Cordes
> To manually compute there are several approaches: you can sign extend
> both inputs to 16-bits and do 16 × 16 keeping only the low 16-bits
>
> @Erik Eidt
I tried and verified and this rule works for me.
-1*3
,
sign extended
11111111 11111111
x00000000 00000011
-------------------
11111111 11111111
111111111 1111111
-------------------
1011111111 11111101
keep the low 16 bits, I get the correct result 11111111(ah) 11111101(al)
.
Try a 4 bit example:
-2 * 3
, 1110
imul 0011
sign extended
1111 1110
x0000 0011
-----------
1111 1110
11111 110
-----------
101111 1010
keep the lower 8 bits, the result is 1111 1010
, -6
.
Before I wasn't sure how sign extended works in imul
, now I get it and it's easy to verify. Btw, if you find manual calculation tedious for some examples (e.g. 4 bit -7 * -1
, 1001 x 1111
, sign extended 1111 1001 x 1111 1111
, many lines to add), you can use the Windows Calculator (programmer mode) and verify the result very quickly.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论