uoy erA woH ycancalC lunami -1 * 3?

huangapple go评论58阅读模式
英文:

How do you manually calculate imul -1 * 3?

问题

我正在学习x86-64汇编语言,参加了一门在线课程,但是课程细节非常不清晰。我在网上搜索并阅读了几个Stack Overflow问题,但没有找到答案。

我尝试手工计算二进制乘法,但卡在了imul

以二进制乘法示例11111111 * 00000011为例,它可以视为无符号的255 * 3或有符号的-1 * 3

255*3

   mov al, 255
   mov bl, 3
   mul bl

这很简单,以下是我手工计算的方法,就像十进制乘法一样:

     11111111
x    00000011 
--------------
     11111111       
    11111111
--------------
   1011111101  

结果溢出,上半部分是ah中的10,下半部分是al中的11111101。我的手工计算与程序结果相符。

-1*3

当涉及到有符号数时,

   mov al, -1
   mov bl, 3
   imul bl

程序的结果是ah中的11111111al中的11111101

我该如何手工计算这个结果?我被告知imul中涉及符号扩展,但我真的不知道它在这里是如何工作的。

我正在使用SASM IDE和NASM汇编器。

英文:

I am learning x86-64 assembly with an online course and the course is horribly unclear in detail. I've searched online and read several SO questions but couldn't get an answer.

I tried to figure out how to calculate binary multiplication by hand, but I am stuck with imul .

Given this example of binary multiplication, 11111111 * 00000011, it can be viewed as 255 * 3 unsigned or -1 * 3 signed.

255*3

   mov al, 255
   mov bl, 3
   mul bl

this is easy and here's how I calculate by hand, just like decimal multiplication:

     11111111
x    00000011 
--------------
     11111111       
    11111111
--------------
   1011111101  

The result overflows, the upper half is 10 in ah and the lower half is 11111101 in al. My manual calculation matches the program result.

-1*3

when it comes to signed,

   mov al, -1
   mov bl, 3
   imul bl

the program result is 11111111 in ah and 11111101 in al.

How can I calculate this result by hand? I was told that sign extension is involved in imul, but I really don't know how it works here.

I am using SASM IDE and NASM Assembler.

答案1

得分: 2

技术上,你使用了长乘法的捷径。虽然这个捷径对于获取无符号位模式来说是相当合理的,但它在概念上忽略了一些内容。

无符号乘法的全面计算如下:

             11111111
        x    00000011
        --------------
     0000000011111111
    +000000011111111 
    +00000000000000 
    +0000000000000 
    +000000000000 
    +00000000000 
    +0000000000 
    +000000000 
    ------------------   
    00000001011111101  = 255 x 3 = 765

这些相加的结果是17位,尽管你只会从mul得到16位,因为输入只有8位宽度,所以不会发生进位或无符号溢出。

在上面的计算中,我添加了前导零,但将尾随零留为空格,就像你的原文一样。

而在有符号算术中:

             11111111
        x    00000011
        --------------
     1111111111111111
    +111111111111111 
    +00000000000000 
    +0000000000000 
    +000000000000 
    +00000000000 
    +0000000000 
    -000000000         // 注意,因为MSB的位置值为-2^7,所以是减法
    ------------------   
    11111111111111101  = -1 x 3 = -3,这个结果也是17位

这些相加的结果同样是17位,尽管你只会从`imul`得到16位,因为输入只有8位宽度,所以不会发生有符号溢出。

因此,我们通过将乘数(第一个操作数)的符号扩展为16位部分积来考虑它的符号。

我们通过减去而不是加上最高有效位的部分积来考虑乘法器(第二个操作数)的符号。这是因为这个部分积是通过乘以0或`-2^7`得到的,这个输入的符号位具有-2^7的位置值。对于无符号数来说,所有的位置值都是正的,但n位的[二进制补码][1]是通过给最高位赋予-2^(n-1)的位置值,而不是+2^(n-1)。例如,在8位二进制补码(`int8_t`)中,`0x80`是-128的位模式,但在无符号数(`uint8_t`)中是+128。

或者,我们可以对这个输入进行符号扩展,并进行16个部分积的计算,而不是8个,但这需要更多的工作。

[1]: https://en.wikipedia.org/wiki/Two%27s_complement
英文:

Technically, you took a short cut with the long multiplication.  Though the short cut is quite reasonable for obtaining the unsigned bit pattern, it glosses over something conceptually.

     11111111
x    00000011 
--------------
     11111111
    11111111
--------------
   1011111101  

Done out fully we have for unsigned:

         11111111
    x    00000011
    --------------
 0000000011111111
+000000011111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
+000000000 
------------------   
00000001011111101  = 255 x 3 = 765, 

The summation of those being 17 bits, though you would only get 16 from mul, and carry/unsigned-overflow cannot happen because the inputs are only 8 bits wide.

In the above, have added leading zeros though left the trailing zeros as blanks the way you have it.

Whereas in signed arithmetic:

         11111111
    x    00000011
    --------------
 1111111111111111
+111111111111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
-000000000         // note, subtracting because MSB has place value -2^7
------------------   
11111111111111101  = -1 x 3 = -3, the summation being 17 bits

The summation of those being 17 bits, though you would only get 16 from imul, and signed overflow cannot happen because the inputs are only 8 bits wide.

So we account for the sign of the multiplicand (first operand) by sign-extending it into 16-bit partial products.

We account for the sign of the multiplier (second operand) by subtracting instead of adding the most-significant partial product. Because that's from multiplying by 0 or -2^7, the place value of the sign bit in that input. For unsigned, all the place values are positive, but n-bit 2's complement works by giving the MSB a place value of -2^(n-1) instead of +2^(n-1). e.g. 0x80 is the bit-pattern for -128 in 8-bit 2's complement (int8_t), but +128 as unsigned (uint8_t).

Alternatively, we could sign-extend that input and do 16 partial products instead of 8, but that's more work.

答案2

得分: 2

以下是翻译好的部分:

"n x n => 2n" 乘积的低位 "n" 位与有符号与无符号无关。但高位则有关。结果等同于将目标宽度的两个输入都进行符号扩展(imul)或零扩展(mul),然后进行非扩展乘法,尽管实际实现当然会更有效率。有关二进制表示,请参阅Erik的答案。

如果我只想手动计算imul会产生什么结果,我会做-1 * 3 = -3,然后计算 -3 作为2的补码16位整数的位模式。由于这是imul,我知道我必须将8位位模式解释为有符号的-1,而不是像mul那样无符号的0xff

除非我正在检查某些位操作,使用乘法来移动和添加位字段(例如https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619),否则我不会手动使用二进制(即使在那种情况下,我实际上也会考虑十六进制或字节)。当然,了解如何以这种方式书写是很好的,但如果你愿意,也可以这样做。

在C语言中,imul r/m8int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;。 (C语言很奇怪,因为像*这样的运算符已经将窄输入提升为int,而int至少与int16_t一样宽。)

在汇编语言中,movsx ax, al(或 cbw);movsx cx, byte srcimul ax, cx,但不破坏CX。(如果你愿意,可以想象一个隐藏的临时寄存器。)

你很少需要扩展的mulimul,因为它们对于16位或更宽的输入需要多于1个uop(因为它们必须将乘积的两半写入多个寄存器,如EDX:EAX,而不仅仅是RAX的低16位)。有趣的是,8位的imul bl 在现代CPU上运行为单个uop。请参见 https://uops.info/https://agner.org/optimize/

imul bl 只将其结果扩展到16位,没有扩展到方便的32位,因此你更可能希望将输入扩展到32位,以进行 imul ecx, ebx 等操作,因为在x86-64上,32位是大多数情况下最有效的操作数大小
imul r, r/mimul r, r/m, imm 形式在现代CPU上是单个uop且快速的。一些较旧或低功耗的CPU对于64位的imul 较慢。

英文:

The low n bits of an n x n => 2n product don't depend on signed vs. unsigned. But the high half does. The result is equivalent to sign-extending (imul) or zero-extending (mul) both inputs to the destination width and then doing a non-widening multiply, although of course the actual implementation would do something more efficient. See Erik's answer for what that looks like in terms of bits.

If I just wanted to manually work out what imul would produce, I'd do -1 * 3 = -3, and then work out the bit-pattern for -3 as a 2's complement 16-bit integer. Since it's imul, I know I have to interpret the 8-bit bit-patterns as signed -1 not unsigned 0xff like mul would.

I wouldn't use binary by hand unless I was checking some bithack that was using multiplies to shift around and add bit-fields (like https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619 but even then I was actually thinking in hex or bytes.) It is of course good to understand how you could write it down that way if you wanted to, though.


In C terms, imul r/m8 is int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;. (C is weird because operators like * already promote narrow inputs to int, and int is at least as wide as int16_t.)

In asm terms, movsx ax, al (or cbw); movsx cx, byte src ; imul ax, cx, but without destroying CX. (Imagine a hidden temporary register if you want.)

You rarely actually want widening mul or imul since they run as more than 1 uop for 16-bit or wider inputs (since they have to write halves of the product to multiple registers like EDX:EAX, not just the low 16 bits of RAX). Interestingly, 8-bit imul bl runs as a single uop on modern CPUs. See https://uops.info/ and https://agner.org/optimize/

But imul bl only widens its result to 16-bit, not all the way to a convenient 32-bit, so you'd more often want to have your inputs widened to 32-bit for imul ecx, ebx or something, since 32-bit is the most efficient operand-size most of the time on x86-64.
The imul r, r/m and imul r, r/m, imm forms are single-uop and fast on modern CPUs. Some older or low-power CPUs are slower for 64-bit imul.

答案3

得分: 0

诚实地说,我无法完全理解其他两个答案。对我来说,它们过于复杂了。我只需要一个愚蠢、简单和通用的规则。

我想只是选择对我有效的方法。

结果等同于将目标宽度的两个输入进行符号扩展(imul)或零扩展(mul),然后执行非扩展乘法

@Peter Cordes

要手动计算,有几种方法:您可以将两个输入符号扩展为16位,然后进行16×16乘法,仅保留低16位

@Erik Eidt

我尝试并验证了这个规则对我有效。

-1*3

符号扩展

  11111111 11111111
 x00000000 00000011
-------------------
  11111111 11111111
 111111111 1111111
-------------------
1011111111 11111101

保留低16位,我得到正确的结果11111111(ah) 11111101(al)

尝试一个4位的例子:

-2 * 31110 imul 0011

符号扩展

  1111 1110
 x0000 0011
-----------
  1111 1110
 11111 110
-----------
101111 1010

保留较低的8位,结果是1111 1010-6

之前我不确定imul中的符号扩展是如何工作的,现在我明白了,很容易验证。顺便说一下,如果您觉得某些示例的手动计算很烦琐(例如4位-7 * -11001 x 1111,符号扩展1111 1001 x 1111 1111,需要添加许多行),您可以使用Windows计算器(程序员模式)快速验证结果。

英文:

Honestly, I can't fully understand the other two answers. It's over complicated for me. I just need a dumb, simple and universal rule.

I'd like to just pick up what works for me.

> The result is equivalent to sign-extending (imul) or zero-extending
> (mul) both inputs to the destination width and then doing a
> non-widening multiply
>
> @Peter Cordes

> To manually compute there are several approaches: you can sign extend
> both inputs to 16-bits and do 16 × 16 keeping only the low 16-bits
>
> @Erik Eidt

I tried and verified and this rule works for me.

-1*3,

sign extended

  11111111 11111111
 x00000000 00000011
-------------------
  11111111 11111111
 111111111 1111111
-------------------
1011111111 11111101

keep the low 16 bits, I get the correct result 11111111(ah) 11111101(al).

Try a 4 bit example:

-2 * 3, 1110 imul 0011

sign extended

  1111 1110
 x0000 0011
-----------
  1111 1110
 11111 110
-----------
101111 1010

keep the lower 8 bits, the result is 1111 1010, -6.

Before I wasn't sure how sign extended works in imul, now I get it and it's easy to verify. Btw, if you find manual calculation tedious for some examples (e.g. 4 bit -7 * -1, 1001 x 1111, sign extended 1111 1001 x 1111 1111, many lines to add), you can use the Windows Calculator (programmer mode) and verify the result very quickly.

huangapple
  • 本文由 发表于 2023年6月1日 23:13:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/76383379.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定