英文:
How can 'clmul' intrinsics be used to improve computation of CRC in RISCV?
问题
My goal is to understand and improve the computation of CRC using clmul
instructions in RISCV.
The way described in RISCV bitmanip documentation is following:
4.8 Cyclic redundancy checks (CRC)
There are special instructions for performing CRCs using the two most widespread 32-bit CRC
polynomials, CRC-32 and CRC-32C.
CRCs with other polynomials can be computed efficiently using CLMUL. The following examples
RISC-V Bitmanip Extension V0.93 83
are using CRC32Q.
The easiest way of implementing CRC32Q with clmul is using a Barrett reduction. On RV32:
uint32_t crc32q_simple(const uint32_t *data, int length) {
uint32_t P = 0x814141AB; // CRC polynomial (implicit x^32)
uint32_t mu = 0xFEFF7F62; // x^64 divided by CRC polynomial
uint32_t mu1 = 0xFF7FBFB1; // "mu" with leading 1, shifted right by 1 bit
uint32_t crc = 0;
for (int i = 0; i < length; i++) {
crc ^= rev8(data[i]);
crc = clmulr(crc, mu1);
crc = clmul(crc, P);
}
return crc;
}
I already know how constants mu
and mu1
are computed, but I still have questions:
- Why can the original internal loop be replaced with
clmul
intrinsics? This is not stated in documentation. For reference, this is a generic CRC code sample forCRC-32/AIXM
:
uint32_t crc32_bitwise_aixm(const uint8_t* data, size_t length) {
uint32_t crc = 0;
for (int i = 0; i < length; i++) {
uint8_t byte = data[i];
for (int j = 0; j < 8; j++) {
if ((crc ^ (byte << 24)) & 0x80000000)
crc = (crc << 1) ^ 0x814141ab; // Here is the polynomial from which constants mu and mu1 are computed
else
crc = crc << 1;
byte = byte << 1;
}
}
return crc;
}
-
I know how these constants are computed, but the question is, WHY are they computed and why passing them to
clmul
makes a working CRC? -
What is the
0x80000000
constant in the original CRC loop? I think it has something to do with masking a portion of numberbyte << 24
.
英文:
My goal is to understand and improve the computation of CRC using clmul
instructions in RISCV.
The way described in RISCV bitmanip documentation is following:
4.8 Cyclic redundency checks (CRC)
There are special instructions for performing CRCs using the two most widespread 32-bit CRC
polynomials, CRC-32 and CRC-32C.
CRCs with other polynomials can be computed efficiently using CLMUL. The following examples
RISC-V Bitmanip Extension V0.93 83
are using CRC32Q.
The easiest way of implementing CRC32Q with clmul is using a Barrett reduction. On RV32:
uint32_t crc32q_simple(const uint32_t *data, int length) {
uint32_t P = 0x814141AB; // CRC polynomial (implicit x^32)
uint32_t mu = 0xFEFF7F62; // x^64 divided by CRC polynomial
uint32_t mu1 = 0xFF7FBFB1; // "mu" with leading 1, shifted right by 1 bit
uint32_t crc = 0;
for (int i = 0; i < length; i++) {
crc ^= rev8(data[i]);
crc = clmulr(crc, mu1);
crc = clmul(crc, P);
}
return crc;
}
I already know how constants mu
and mu1
are computed, but I still have questions:
- Why can original internal loop be replaced with
clmul
intrinsics? This is not stated in documentation. For reference, this is a generic CRC code sample forCRC-32/AIXM
:
uint32_t crc32_bitwise_aixm(const uint8_t* data, size_t length) {
uint32_t crc = 0;
for (int i = 0; i < length; i++) {
uint8_t byte = data[i];
for (int j = 0; j < 8; j++) {
if ((crc ^ (byte << 24)) & 0x80000000)
crc = (crc << 1) ^ 0x814141ab; // Here is the polynomial from which constants mu and mu1 are computed
else
crc = crc << 1;
byte = byte << 1;
}
}
return crc;
}
-
I know how this constants are computed, but the question is, WHY are they computed and why passing them to
clmul
makes working CRC? -
What is
0x80000000
constant in original CRC loop? I think it has something to do with masking portion of numberbyte << 24
.
答案1
得分: 1
有太多内容需要在这里解释。由于您甚至不理解0x80000000
,您首先需要了解至少循环冗余校验的基本数学以及它们的实际实现。我建议参考Ross William的指南。仅回答您特定问题的简单答案是,它实现了一个线性反馈移位寄存器,向上移位,并通过异或掩码读取顶部位以反馈到其他位。但是,您还需要了解这样的移位寄存器是如何在GF(2)上实现多项式除法的,其中CRC是此类除法的余数。
一旦您理解了所有这些,那么您就准备好了解如何使用乘法而不是除法来执行CRC。您已经提到了这个方法,称为Barrett's Reduction,最初是用于整数除法,但在这里应用于多项式除法。阅读那篇文章。总的来说,您正在将多项式除法替换为多项式的倒数乘法。您的mu
是倒数。无进位乘法是多项式乘法,其中寄存器中的每个位都是GF(2)上多项式的系数。
英文:
There is too much to explain in an answer here. Since you don't even understand the 0x80000000
, you need to first understand at least the basic mathematics of cyclic redundancy check and importantly their practical implementations. I recommend Ross William's guide. The simple answer for just your specific question is that it is implementing a linear feedback shift register that shifts up, and is reading the top bit to feed back into the other bits using an exclusive-or mask. However you also need to understand how it is that such a shift register implements a polynomial division over GF(2), where a CRC is the remainder of such a division.
Once you understand all that, then you're ready for how to do a CRC using multiplication instead of division. You already named the approach, called Barrett's Reduction, originally for integer division but applied here to polynomial division. Read that post. In gross terms, you are replacing division by a polynomial with multiplication by the reciprocal of that polynomial. Your mu
is the reciprocal. Carry-less multiplication is polynomial multiplication, where each bit in a register is the coefficient of a polynomial over GF(2).
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论