如何使用 ‘clmul’ 内置函数来改善在 RISCV 中计算 CRC 的过程?

huangapple go评论190阅读模式
英文:

How can 'clmul' intrinsics be used to improve computation of CRC in RISCV?

问题

My goal is to understand and improve the computation of CRC using clmul instructions in RISCV.
The way described in RISCV bitmanip documentation is following:

4.8 Cyclic redundancy checks (CRC)

There are special instructions for performing CRCs using the two most widespread 32-bit CRC
polynomials, CRC-32 and CRC-32C.
CRCs with other polynomials can be computed efficiently using CLMUL. The following examples
RISC-V Bitmanip Extension V0.93 83
are using CRC32Q.
The easiest way of implementing CRC32Q with clmul is using a Barrett reduction. On RV32:

uint32_t crc32q_simple(const uint32_t *data, int length) {
    uint32_t P = 0x814141AB;    // CRC polynomial (implicit x^32)
    uint32_t mu = 0xFEFF7F62;   // x^64 divided by CRC polynomial
    uint32_t mu1 = 0xFF7FBFB1;  // "mu" with leading 1, shifted right by 1 bit
    uint32_t crc = 0;
    for (int i = 0; i < length; i++) {
        crc ^= rev8(data[i]);
        crc = clmulr(crc, mu1);
        crc = clmul(crc, P);
    }
    return crc;
}

I already know how constants mu and mu1 are computed, but I still have questions:

  • Why can the original internal loop be replaced with clmul intrinsics? This is not stated in documentation. For reference, this is a generic CRC code sample for CRC-32/AIXM:
uint32_t crc32_bitwise_aixm(const uint8_t* data, size_t length) {   
  uint32_t crc = 0;
  for (int i = 0; i < length; i++) {
    uint8_t byte = data[i];
    for (int j = 0; j < 8; j++) {
      if ((crc ^ (byte << 24)) & 0x80000000)
        crc = (crc << 1) ^ 0x814141ab; // Here is the polynomial from which constants mu and mu1 are computed
      else
        crc = crc << 1;
      byte = byte << 1;
    }
  }
  return crc;
}
  • I know how these constants are computed, but the question is, WHY are they computed and why passing them to clmul makes a working CRC?

  • What is the 0x80000000 constant in the original CRC loop? I think it has something to do with masking a portion of number byte << 24.

英文:

My goal is to understand and improve the computation of CRC using clmul instructions in RISCV.
The way described in RISCV bitmanip documentation is following:

4.8 Cyclic redundency checks (CRC)

There are special instructions for performing CRCs using the two most widespread 32-bit CRC
polynomials, CRC-32 and CRC-32C.
CRCs with other polynomials can be computed efficiently using CLMUL. The following examples
RISC-V Bitmanip Extension V0.93 83
are using CRC32Q.
The easiest way of implementing CRC32Q with clmul is using a Barrett reduction. On RV32:

uint32_t crc32q_simple(const uint32_t *data, int length) {
    uint32_t P = 0x814141AB;    // CRC polynomial (implicit x^32)
    uint32_t mu = 0xFEFF7F62;   // x^64 divided by CRC polynomial
    uint32_t mu1 = 0xFF7FBFB1;  // &quot;mu&quot; with leading 1, shifted right by 1 bit
    uint32_t crc = 0;
    for (int i = 0; i &lt; length; i++) {
        crc ^= rev8(data[i]);
        crc = clmulr(crc, mu1);
        crc = clmul(crc, P);
    }
    return crc;
}

I already know how constants mu and mu1 are computed, but I still have questions:

  • Why can original internal loop be replaced with clmul intrinsics? This is not stated in documentation. For reference, this is a generic CRC code sample for CRC-32/AIXM:
uint32_t crc32_bitwise_aixm(const uint8_t* data, size_t length) {   
  uint32_t crc = 0;
  for (int i = 0; i &lt; length; i++) {
    uint8_t byte = data[i];
    for (int j = 0; j &lt; 8; j++) {
      if ((crc ^ (byte &lt;&lt; 24)) &amp; 0x80000000)
        crc = (crc &lt;&lt; 1) ^ 0x814141ab; // Here is the polynomial from which constants mu and mu1 are computed
      else
        crc = crc &lt;&lt; 1;
      byte = byte &lt;&lt; 1;
    }
  }
  return crc;
}
  • I know how this constants are computed, but the question is, WHY are they computed and why passing them to clmul makes working CRC?

  • What is 0x80000000 constant in original CRC loop? I think it has something to do with masking portion of number byte &lt;&lt; 24.

答案1

得分: 1

有太多内容需要在这里解释。由于您甚至不理解0x80000000,您首先需要了解至少循环冗余校验的基本数学以及它们的实际实现。我建议参考Ross William的指南。仅回答您特定问题的简单答案是,它实现了一个线性反馈移位寄存器,向上移位,并通过异或掩码读取顶部位以反馈到其他位。但是,您还需要了解这样的移位寄存器是如何在GF(2)上实现多项式除法的,其中CRC是此类除法的余数。

一旦您理解了所有这些,那么您就准备好了解如何使用乘法而不是除法来执行CRC。您已经提到了这个方法,称为Barrett's Reduction,最初是用于整数除法,但在这里应用于多项式除法。阅读那篇文章。总的来说,您正在将多项式除法替换为多项式的倒数乘法。您的mu是倒数。无进位乘法是多项式乘法,其中寄存器中的每个位都是GF(2)上多项式的系数。

英文:

There is too much to explain in an answer here. Since you don't even understand the 0x80000000, you need to first understand at least the basic mathematics of cyclic redundancy check and importantly their practical implementations. I recommend Ross William's guide. The simple answer for just your specific question is that it is implementing a linear feedback shift register that shifts up, and is reading the top bit to feed back into the other bits using an exclusive-or mask. However you also need to understand how it is that such a shift register implements a polynomial division over GF(2), where a CRC is the remainder of such a division.

Once you understand all that, then you're ready for how to do a CRC using multiplication instead of division. You already named the approach, called Barrett's Reduction, originally for integer division but applied here to polynomial division. Read that post. In gross terms, you are replacing division by a polynomial with multiplication by the reciprocal of that polynomial. Your mu is the reciprocal. Carry-less multiplication is polynomial multiplication, where each bit in a register is the coefficient of a polynomial over GF(2).

huangapple
  • 本文由 发表于 2023年5月28日 22:17:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76351936.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定