Multiplying two 64-bit numbers in RISC-V assembly

Question

I am confused about how to multiply two 64-bit numbers in RISC-V assembly. I want to load two 64-bit numbers, multiply them, store the product in a variable, and print the result.

From what I understand, you have to load the variables into registers, use mul and mulh to compute the lower and upper halves of the product, and then store them back in the result variable. This is what I understand about mul and mulh:

mul x12, x10, x11
mulh x13, x10, x11

I take x10 and x11 and multiply them, storing the lower bits in x12 and the upper bits in x13. I am confused about how to load the 64-bit values into x10 and x11. I tried using ld x10, 0(number1) and doing the same for x11, but it doesn't give correct results when multiplying x10 and x11. How would I go about doing this in RISC-V?

Answer 1

Score: 2

Mathematically speaking, a 64×64 multiplication produces a 128-bit result that is guaranteed not to overflow (any narrower result risks overflow for some inputs). This is called widening multiplication.

As Peter says, though, if you're going to limit your result back to 64 bits and assume no overflow, you can simplify the calculations.
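
To make that simplification concrete, here is a minimal C sketch (C rather than RISC-V assembly, so it can be checked on any machine) of the truncated 64-bit product built from 32-bit halves. The function name mul64_low and the parameter names are made up for illustration, and the comments only suggest which RISC-V instructions each step would roughly map to on a 32-bit machine.

```c
#include <stdint.h>

/* Low 64 bits of a 64x64 product, built only from 32-bit pieces.
   a_hi/a_lo and b_hi/b_lo are hypothetical names for the two register pairs. */
uint64_t mul64_low(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo)
{
    /* a_lo * b_lo is the only product whose upper half is needed
       (roughly mul for the low word, mulhu for the high word). */
    uint64_t low_product = (uint64_t)a_lo * b_lo;
    uint32_t res_lo = (uint32_t)low_product;
    uint32_t res_hi = (uint32_t)(low_product >> 32);

    /* The cross terms land at bit 32, so only their low 32 bits survive
       (a plain mul each, no mulh needed). */
    res_hi += a_hi * b_lo;
    res_hi += a_lo * b_hi;

    /* a_hi * b_hi would land at bit 64 and is dropped entirely. */
    return ((uint64_t)res_hi << 32) | res_lo;
}
```

Because only the low 64 bits are kept, the same sequence gives the right answer for both signed (two's complement) and unsigned operands.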


If you're on a 32-bit machine, 64-bit values will occupy two registers.

For arithmetic, there are two operands, so you need a pair of registers for each operand, for a total of 4 registers just for the source operands.

For each 64-bit operand, you need to load the high-order and low-order words. The layout depends on your storage format, but if you're using little endian, as would be expected on RISC-V, the low-order word comes first in memory and the high-order word 4 bytes further on. If you already have the values in registers, so much the better; just be clear about which is high order and which is low order.
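
As a rough illustration of that memory layout, the C sketch below writes a 64-bit value byte by byte in little-endian order and then reads the two 32-bit words back. The helper names store64_le and load32_le are invented for this example; load32_le stands in for what a 32-bit word load would return on a little-endian machine.

```c
#include <stdint.h>

/* Store a 64-bit value into memory in little-endian byte order. */
void store64_le(uint8_t *mem, uint64_t value)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

/* Read one little-endian 32-bit word starting at mem. */
uint32_t load32_le(const uint8_t *mem)
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

With these helpers, the low-order word is load32_le(mem) and the high-order word is load32_le(mem + 4), matching the "low first, high 4 bytes further on" rule.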

RISC-V has no double-register loads (or stores), so you get to pick the register numbers. For example, one pair could be x10,x11, or x11,x10 for the same pair with the high/low order reversed. This is totally up to you, and you don't even need to use consecutive register numbers for a register pair (though that would be expected and reduces cognitive load for programmers).


You then need to multiply the two pairs as in long multiplication, which sums shorter multiplications to make the full answer.

Let's be aware, then, that on a 32-bit machine we have 32×32 => 64-bit multiplication. So we break the 64×64 => 128 multiplication down into several 32×32 => 64-bit products, to be summed with appropriate scaling.

Let's say that one 64-bit number is AB, with A the high order 32 bits and B the low order 32 bits, while the other is CD (similarly two 32-bit halves).

AB × CD =
((A × C) << 64) + ((A × D) << 32) +
((B × C) << 32) +
(B × D)

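Each separate multiplication in the above requires a mul plus one of the mulh-family instructions, producing a 64-bit answer. For unsigned operands that is mulhu throughout. For signed operands only the high halves A and C are signed, so A × C uses mulh, the cross terms use mulhsu, and B × D uses mulhu. These four 64-bit answers need to be effectively shifted and then summed (the shift is either none, 32, or 64 bits, so no actual shifting is needed, just using the right registers in the right places).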

The summing must take place with carry taken into account, so that adds a few more instructions, but they are simple.
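
Here is one way the unsigned case could look, sketched in C with the same A, B, C, D naming as above. The function name umul64_wide and the out[] layout (least significant word first) are assumptions for this example. The 64-bit accumulators play the role of the carry handling; RISC-V has no carry flag, so in assembly each carry would typically be recovered with sltu after an add.

```c
#include <stdint.h>

/* Unsigned (A:B) x (C:D) -> 128 bits, where a/c are the high words and
   b/d the low words. out[0] is the least significant 32-bit word. */
void umul64_wide(uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t out[4])
{
    uint64_t bd = (uint64_t)b * d;   /* weight 2^0  */
    uint64_t ad = (uint64_t)a * d;   /* weight 2^32 */
    uint64_t bc = (uint64_t)b * c;   /* weight 2^32 */
    uint64_t ac = (uint64_t)a * c;   /* weight 2^64 */

    out[0] = (uint32_t)bd;

    /* Column at 2^32: high word of BD plus the low words of both cross terms. */
    uint64_t col1 = (bd >> 32) + (uint32_t)ad + (uint32_t)bc;
    out[1] = (uint32_t)col1;

    /* Column at 2^64: carry out of col1, the high words of the cross terms,
       and the low word of AC. */
    uint64_t col2 = (col1 >> 32) + (ad >> 32) + (bc >> 32) + (uint32_t)ac;
    out[2] = (uint32_t)col2;

    /* Column at 2^96: carry out of col2 plus the high word of AC. */
    out[3] = (uint32_t)((col2 >> 32) + (ac >> 32));
}
```

Each column sums at most four 32-bit quantities, so a 64-bit accumulator cannot overflow, and the bits above bit 31 of each column are exactly the carry into the next one.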

A C compiler in the right configuration (e.g. on godbolt) can show the basics. Since standard C doesn't support widening arithmetic, it can't show the true 64×64 => 128 form directly, though some compilers (e.g. gcc) may have extensions or builtins that support the widening forms.
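
One such extension is the unsigned __int128 type, which GCC and Clang provide on 64-bit targets (it is not standard C and is generally not available when compiling for 32-bit targets). A small sketch, with the function name umul64_wide_ext and the typedef u128 made up for illustration:

```c
#include <stdint.h>

/* Non-standard 128-bit type; __extension__ silences pedantic warnings. */
__extension__ typedef unsigned __int128 u128;

/* Unsigned widening multiply: 64 x 64 -> 128, split into two 64-bit halves. */
void umul64_wide_ext(uint64_t x, uint64_t y, uint64_t *lo, uint64_t *hi)
{
    u128 product = (u128)x * y;
    *lo = (uint64_t)product;          /* the half that mul returns on RV64   */
    *hi = (uint64_t)(product >> 64);  /* the half that mulhu returns on RV64 */
}
```

Compiling something like this for a 64-bit RISC-V target and looking at the generated assembly is one way to see the mul/mulhu pairing the compiler picks.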

huangapple
  • Posted on April 11, 2023, 05:38:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75980932.html