2023年6月1日 23:13:18go评论130阅读模式

英文:

How do you manually calculate imul -1 * 3?

问题

我正在学习x86-64汇编语言，参加了一门在线课程，但是课程细节非常不清晰。我在网上搜索并阅读了几个Stack Overflow问题，但没有找到答案。

我尝试手工计算二进制乘法，但卡在了imul。

以二进制乘法示例11111111 * 00000011为例，它可以视为无符号的255 * 3或有符号的-1 * 3。

255*3

   mov al, 255
   mov bl, 3
   mul bl

这很简单，以下是我手工计算的方法，就像十进制乘法一样：

     11111111
x    00000011 
--------------
     11111111       
    11111111
--------------
   1011111101

结果溢出，上半部分是ah中的10，下半部分是al中的11111101。我的手工计算与程序结果相符。

-1*3

当涉及到有符号数时，

   mov al, -1
   mov bl, 3
   imul bl

程序的结果是ah中的11111111和al中的11111101。

我该如何手工计算这个结果？我被告知imul中涉及符号扩展，但我真的不知道它在这里是如何工作的。

我正在使用SASM IDE和NASM汇编器。

英文:

I am learning x86-64 assembly with an online course and the course is horribly unclear in detail. I've searched online and read several SO questions but couldn't get an answer.

I tried to figure out how to calculate binary multiplication by hand, but I am stuck with imul .

Given this example of binary multiplication, 11111111 * 00000011, it can be viewed as 255 * 3 unsigned or -1 * 3 signed.

255*3

   mov al, 255
   mov bl, 3
   mul bl

this is easy and here's how I calculate by hand, just like decimal multiplication:

     11111111
x    00000011 
--------------
     11111111       
    11111111
--------------
   1011111101

The result overflows, the upper half is 10 in ah and the lower half is 11111101 in al. My manual calculation matches the program result.

-1*3

when it comes to signed,

   mov al, -1
   mov bl, 3
   imul bl

the program result is 11111111 in ah and 11111101 in al.

How can I calculate this result by hand? I was told that sign extension is involved in imul, but I really don't know how it works here.

I am using SASM IDE and NASM Assembler.

答案1

得分: 2

技术上，你使用了长乘法的捷径。虽然这个捷径对于获取无符号位模式来说是相当合理的，但它在概念上忽略了一些内容。

无符号乘法的全面计算如下：

             11111111
        x    00000011
        --------------
     0000000011111111
    +000000011111111 
    +00000000000000 
    +0000000000000 
    +000000000000 
    +00000000000 
    +0000000000 
    +000000000 
    ------------------   
    00000001011111101  = 255 x 3 = 765

这些相加的结果是17位，尽管你只会从mul得到16位，因为输入只有8位宽度，所以不会发生进位或无符号溢出。

在上面的计算中，我添加了前导零，但将尾随零留为空格，就像你的原文一样。

而在有符号算术中：

             11111111
        x    00000011
        --------------
     1111111111111111
    +111111111111111 
    +00000000000000 
    +0000000000000 
    +000000000000 
    +00000000000 
    +0000000000 
    -000000000         // 注意，因为MSB的位置值为-2^7，所以是减法
    ------------------   
    11111111111111101  = -1 x 3 = -3，这个结果也是17位
这些相加的结果同样是17位，尽管你只会从`imul`得到16位，因为输入只有8位宽度，所以不会发生有符号溢出。
因此，我们通过将乘数（第一个操作数）的符号扩展为16位部分积来考虑它的符号。
我们通过减去而不是加上最高有效位的部分积来考虑乘法器（第二个操作数）的符号。这是因为这个部分积是通过乘以0或`-2^7`得到的，这个输入的符号位具有-2^7的位置值。对于无符号数来说，所有的位置值都是正的，但n位的[二进制补码][1]是通过给最高位赋予-2^(n-1)的位置值，而不是+2^(n-1)。例如，在8位二进制补码（`int8_t`）中，`0x80`是-128的位模式，但在无符号数（`uint8_t`）中是+128。
或者，我们可以对这个输入进行符号扩展，并进行16个部分积的计算，而不是8个，但这需要更多的工作。
[1]: https://en.wikipedia.org/wiki/Two%27s_complement

英文:

Technically, you took a short cut with the long multiplication.  Though the short cut is quite reasonable for obtaining the unsigned bit pattern, it glosses over something conceptually.

     11111111
x    00000011 
--------------
     11111111
    11111111
--------------
   1011111101

Done out fully we have for unsigned:

         11111111
    x    00000011
    --------------
 0000000011111111
+000000011111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
+000000000 
------------------   
00000001011111101  = 255 x 3 = 765,

The summation of those being 17 bits, though you would only get 16 from mul, and carry/unsigned-overflow cannot happen because the inputs are only 8 bits wide.

In the above, have added leading zeros though left the trailing zeros as blanks the way you have it.

Whereas in signed arithmetic:

         11111111
    x    00000011
    --------------
 1111111111111111
+111111111111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
-000000000         // note, subtracting because MSB has place value -2^7
------------------   
11111111111111101  = -1 x 3 = -3, the summation being 17 bits

The summation of those being 17 bits, though you would only get 16 from imul, and signed overflow cannot happen because the inputs are only 8 bits wide.

So we account for the sign of the multiplicand (first operand) by sign-extending it into 16-bit partial products.

We account for the sign of the multiplier (second operand) by subtracting instead of adding the most-significant partial product. Because that's from multiplying by 0 or -2^7, the place value of the sign bit in that input. For unsigned, all the place values are positive, but n-bit 2's complement works by giving the MSB a place value of -2^(n-1) instead of +2^(n-1). e.g. 0x80 is the bit-pattern for -128 in 8-bit 2's complement (int8_t), but +128 as unsigned (uint8_t).

Alternatively, we could sign-extend that input and do 16 partial products instead of 8, but that's more work.

答案2

得分: 2

以下是翻译好的部分：

"n x n => 2n" 乘积的低位 "n" 位与有符号与无符号无关。但高位则有关。结果等同于将目标宽度的两个输入都进行符号扩展（imul）或零扩展（mul），然后进行非扩展乘法，尽管实际实现当然会更有效率。有关二进制表示，请参阅Erik的答案。

如果我只想手动计算imul会产生什么结果，我会做-1 * 3 = -3，然后计算 -3 作为2的补码16位整数的位模式。由于这是imul，我知道我必须将8位位模式解释为有符号的-1，而不是像mul那样无符号的0xff。

除非我正在检查某些位操作，使用乘法来移动和添加位字段（例如https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619），否则我不会手动使用二进制（即使在那种情况下，我实际上也会考虑十六进制或字节）。当然，了解如何以这种方式书写是很好的，但如果你愿意，也可以这样做。

在C语言中，imul r/m8 是 int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;。（C语言很奇怪，因为像*这样的运算符已经将窄输入提升为int，而int至少与int16_t一样宽。）

在汇编语言中，movsx ax, al（或 cbw）；movsx cx, byte src；imul ax, cx，但不破坏CX。（如果你愿意，可以想象一个隐藏的临时寄存器。）

你很少需要扩展的mul或imul，因为它们对于16位或更宽的输入需要多于1个uop（因为它们必须将乘积的两半写入多个寄存器，如EDX:EAX，而不仅仅是RAX的低16位）。有趣的是，8位的imul bl 在现代CPU上运行为单个uop。请参见 https://uops.info/ 和 https://agner.org/optimize/

但 imul bl 只将其结果扩展到16位，没有扩展到方便的32位，因此你更可能希望将输入扩展到32位，以进行 imul ecx, ebx 等操作，因为在x86-64上，32位是大多数情况下最有效的操作数大小。
imul r, r/m 和 imul r, r/m, imm 形式在现代CPU上是单个uop且快速的。一些较旧或低功耗的CPU对于64位的imul 较慢。

英文:

The low n bits of an n x n => 2n product don't depend on signed vs. unsigned. But the high half does. The result is equivalent to sign-extending (imul) or zero-extending (mul) both inputs to the destination width and then doing a non-widening multiply, although of course the actual implementation would do something more efficient. See Erik's answer for what that looks like in terms of bits.

If I just wanted to manually work out what imul would produce, I'd do -1 * 3 = -3, and then work out the bit-pattern for -3 as a 2's complement 16-bit integer. Since it's imul, I know I have to interpret the 8-bit bit-patterns as signed -1 not unsigned 0xff like mul would.

I wouldn't use binary by hand unless I was checking some bithack that was using multiplies to shift around and add bit-fields (like https://stackoverflow.com/questions/76139951/popcount-assembly-sum-indexes-of-set-bits/76144619#76144619 but even then I was actually thinking in hex or bytes.) It is of course good to understand how you could write it down that way if you wanted to, though.

In C terms, imul r/m8 is int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;. (C is weird because operators like * already promote narrow inputs to int, and int is at least as wide as int16_t.)

In asm terms, movsx ax, al (or cbw); movsx cx, byte src ; imul ax, cx, but without destroying CX. (Imagine a hidden temporary register if you want.)

You rarely actually want widening mul or imul since they run as more than 1 uop for 16-bit or wider inputs (since they have to write halves of the product to multiple registers like EDX:EAX, not just the low 16 bits of RAX). Interestingly, 8-bit imul bl runs as a single uop on modern CPUs. See https://uops.info/ and https://agner.org/optimize/

But imul bl only widens its result to 16-bit, not all the way to a convenient 32-bit, so you'd more often want to have your inputs widened to 32-bit for imul ecx, ebx or something, since 32-bit is the most efficient operand-size most of the time on x86-64.
The imul r, r/m and imul r, r/m, imm forms are single-uop and fast on modern CPUs. Some older or low-power CPUs are slower for 64-bit imul.

答案3

得分: 0

诚实地说，我无法完全理解其他两个答案。对我来说，它们过于复杂了。我只需要一个愚蠢、简单和通用的规则。

我想只是选择对我有效的方法。

结果等同于将目标宽度的两个输入进行符号扩展（imul）或零扩展（mul），然后执行非扩展乘法

@Peter Cordes

要手动计算，有几种方法：您可以将两个输入符号扩展为16位，然后进行16×16乘法，仅保留低16位

@Erik Eidt

我尝试并验证了这个规则对我有效。

-1*3，

符号扩展

  11111111 11111111
 x00000000 00000011
-------------------
  11111111 11111111
 111111111 1111111
-------------------
1011111111 11111101

保留低16位，我得到正确的结果11111111(ah) 11111101(al)。

尝试一个4位的例子：

-2 * 3，1110 imul 0011

符号扩展

  1111 1110
 x0000 0011
-----------
  1111 1110
 11111 110
-----------
101111 1010

保留较低的8位，结果是1111 1010，-6。

之前我不确定imul中的符号扩展是如何工作的，现在我明白了，很容易验证。顺便说一下，如果您觉得某些示例的手动计算很烦琐（例如4位-7 * -1，1001 x 1111，符号扩展1111 1001 x 1111 1111，需要添加许多行），您可以使用Windows计算器（程序员模式）快速验证结果。

英文:

Honestly, I can't fully understand the other two answers. It's over complicated for me. I just need a dumb, simple and universal rule.

I'd like to just pick up what works for me.

> The result is equivalent to sign-extending (imul) or zero-extending
> (mul) both inputs to the destination width and then doing a
> non-widening multiply
>
> @Peter Cordes

> To manually compute there are several approaches: you can sign extend
> both inputs to 16-bits and do 16 × 16 keeping only the low 16-bits
>
> @Erik Eidt

I tried and verified and this rule works for me.

-1*3,

sign extended

  11111111 11111111
 x00000000 00000011
-------------------
  11111111 11111111
 111111111 1111111
-------------------
1011111111 11111101

keep the low 16 bits, I get the correct result 11111111(ah) 11111101(al).

Try a 4 bit example:

-2 * 3, 1110 imul 0011

sign extended

  1111 1110
 x0000 0011
-----------
  1111 1110
 11111 110
-----------
101111 1010

keep the lower 8 bits, the result is 1111 1010, -6.

Before I wasn't sure how sign extended works in imul, now I get it and it's easy to verify. Btw, if you find manual calculation tedious for some examples (e.g. 4 bit -7 * -1, 1001 x 1111, sign extended 1111 1001 x 1111 1111, many lines to add), you can use the Windows Calculator (programmer mode) and verify the result very quickly.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

uoy erA woH ycancalC lunami -1 * 3?

问题

答案1

答案2

答案3

需要保存在x86-64体系结构中，使用C函数返回时的上下文有哪些？

ASSEMBLY 8086 EXERCISE UNIVERSITY 汇编 8086 练习大学

启动加载程序在QEMU上运行正常，但在实际机器上无法工作。

x64 NASM Assembly 打印从函数返回的单个字符。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。