java - How does BigInteger convert Strings to its internal representation?

Question

I am doing a little side project where I am delving into the implementation of some popular classes that we use in the Java Libraries.

The first on my list is BigInteger. I am interested in understanding the algorithm by which it converts a String into a number for its internal representation.

This is the piece of code I am interested in:

// Pre-allocate array of expected size. May be too large but can
// never be too small. Typically exact.
long numBits = ((numDigits * bitsPerDigit[radix]) >>> 10) + 1;
if (numBits + 31 >= (1L << 32)) {
    reportOverflow();
}
int numWords = (int) (numBits + 31) >>> 5;
int[] magnitude = new int[numWords];

// Process first (potentially short) digit group
int firstGroupLen = numDigits % digitsPerInt[radix];
if (firstGroupLen == 0)
    firstGroupLen = digitsPerInt[radix];
String group = val.substring(cursor, cursor += firstGroupLen);
magnitude[numWords - 1] = Integer.parseInt(group, radix);
if (magnitude[numWords - 1] < 0)
    throw new NumberFormatException("Illegal digit");

// Process remaining digit groups
int superRadix = intRadix[radix];
int groupVal = 0;
while (cursor < len) {
    group = val.substring(cursor, cursor += digitsPerInt[radix]);
    groupVal = Integer.parseInt(group, radix);
    if (groupVal < 0)
        throw new NumberFormatException("Illegal digit");
    destructiveMulAdd(magnitude, superRadix, groupVal);
}
// Required for cases where the array was overallocated.
mag = trustedStripLeadingZeroInts(magnitude);
if (mag.length >= MAX_MAG_LENGTH) {
    checkRange();
}

I more or less understand the lines that come before this. But this is where it stops making sense to me, because the math used is unfamiliar.

My questions are as follows:

  • What does bitsPerDigit signify? If you run a debugger, bitsPerDigit[radix] comes to 3402. Is that the number of bits used for a digit in an int in Java?
  • Why divide the result of (numDigits * bitsPerDigit[radix]) by 2^10 and add 1? (A logical right shift by 10 is a division by 2^10.)
  • Why add 31 to the numBits before dividing by 2^5 = 32? (I kind of understand 32 bits as that is the size of an int in Java).
  • What is destructiveMulAdd? How does it work?

Are these algorithms documented somewhere?

Can anyone please point me in the right direction?

Answer 1

Score: 2

bitsPerDigit[x] is a fixed-point approximation of log2(x), scaled by 1024 and rounded up. For x = 10, that gives an estimate of 3402 / 1024 ≈ 3.3223 bits per digit (the actual log2(10) is closer to 3.3219). To be clear, 3402 represents 3.3223, but the latter is never actually calculated; it is only worked with implicitly.
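As a rough check of that description, the sketch below recomputes the scaled value for radix 10 from floating-point logarithms. The class and method names are made up for illustration, and this is not how the JDK obtains its table; it only shows that "log2(10) times 1024, rounded up" lands on 3402.

```java
class BitsPerDigitSketch {
    // Hypothetical helper: log2(radix) in fixed point with 10 fractional bits,
    // rounded up. Floating-point rounding could push exact powers of two one
    // too high, which is acceptable for a sketch but not for a real table.
    static long scaledBitsPerDigit(int radix) {
        double log2 = Math.log(radix) / Math.log(2);
        return (long) Math.ceil(log2 * 1024);
    }

    public static void main(String[] args) {
        long scaled = scaledBitsPerDigit(10);
        System.out.println(scaled);           // the scaled integer
        System.out.println(scaled / 1024.0);  // the value it stands for
    }
}
```

Keeping the value as a scaled integer means the size estimate can be done with one multiplication and one shift, with no floating-point arithmetic at parse time.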

The right shift by 10 accounts for the scale of 1024, converting the result of the multiplication from a fixed-point format with 10 bits after the radix point to a plain integer. The right shift rounds down; adding 1 compensates for that in a way that ensures the result is never too low and never zero.
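To make the "never too low" claim concrete, here is a small sketch for radix 10 that compares the estimate with the bit length of the largest number having a given count of decimal digits (10^n - 1). The class, the constant name, and the helper methods are assumptions made for this example; only the formula itself comes from the snippet in the question.

```java
import java.math.BigInteger;

class NumBitsEstimateSketch {
    // Value reported in the question for radix 10.
    static final int BITS_PER_DIGIT_RADIX_10 = 3402;

    // Same shape as the expression in the question: multiply, shift by 10, add 1.
    static long estimateBits(int numDigits) {
        return (((long) numDigits * BITS_PER_DIGIT_RADIX_10) >>> 10) + 1;
    }

    // Bits needed by the largest n-digit decimal number, 10^n - 1.
    static int actualBits(int numDigits) {
        return BigInteger.TEN.pow(numDigits).subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 9, 20, 100, 1000}) {
            System.out.println(n + " digits: estimate=" + estimateBits(n)
                    + " actual=" + actualBits(n));
        }
    }
}
```

For short inputs the two numbers agree; for long inputs the small excess in 3402 / 1024 over log2(10) accumulates, so the estimate can exceed the true bit length but should not fall below it.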

The conversion from the number of bits to the number of words, (numBits + 31) >>> 5, is just a division by 32, rounded up.
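The "add 31, then shift by 5" idiom is the usual way to get a rounded-up division by 32 with integer arithmetic. A minimal sketch follows; the class and method names are hypothetical, and the expression is parenthesized slightly differently from the snippet in the question (shift first, then cast), which gives the same result for values below 2^32.

```java
class WordsForBitsSketch {
    // ceil(numBits / 32) using only integer operations.
    static int wordsForBits(long numBits) {
        return (int) ((numBits + 31) >>> 5);
    }

    public static void main(String[] args) {
        for (long bits : new long[] {1, 32, 33, 64, 65}) {
            System.out.println(bits + " bits -> " + wordsForBits(bits) + " word(s)");
        }
    }
}
```

Adding 31 pushes any bit count that is not an exact multiple of 32 over the next multiple, so the truncating shift ends up rounding up instead of down.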

At every step we must round up to avoid allocating an array that is too small, but the size estimate does not need to be extremely accurate: estimating a little too high only results in a little bit of wasted space in the internal array.
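To see how the size estimate feeds the rest of the snippet from the question, here is a simplified sketch for radix 10 only. It is not the JDK source: the class name, mulAdd, parseDecimal, and toBigInteger are hypothetical, the group size of 9 digits and the multiplier 10^9 are assumptions that hold for decimal input, and the input is assumed to be a non-empty string of digits with no sign and no validation. The mulAdd method is one plausible way to perform an in-place "multiply by y, add z" over a big-endian int array, which is what the call destructiveMulAdd(magnitude, superRadix, groupVal) appears to need.

```java
import java.math.BigInteger;
import java.util.Arrays;

class DecimalMagnitudeSketch {
    private static final long LONG_MASK = 0xffffffffL;
    // Assumptions for radix 10: 9 decimal digits always fit in an int,
    // and 10^9 is the matching multiplier per group.
    private static final int DIGITS_PER_GROUP = 9;
    private static final int SUPER_RADIX = 1_000_000_000;

    // Hypothetical multiply-add: x = x * y + z, in place.
    // x is a big-endian magnitude in base 2^32; y and z are treated as unsigned.
    static void mulAdd(int[] x, int y, int z) {
        long ylong = y & LONG_MASK;
        long carry = z & LONG_MASK;          // z enters as the initial carry
        for (int i = x.length - 1; i >= 0; i--) {
            long product = ylong * (x[i] & LONG_MASK) + carry;
            x[i] = (int) product;            // low 32 bits stay in this word
            carry = product >>> 32;          // high 32 bits go to the next word
        }
        // A correctly sized array leaves no carry here, so none is handled.
    }

    static int[] parseDecimal(String digits) {
        int numDigits = digits.length();
        long numBits = ((numDigits * 3402L) >>> 10) + 1;   // size estimate
        int numWords = (int) ((numBits + 31) >>> 5);       // round up to words
        int[] magnitude = new int[numWords];

        // First (possibly short) group goes straight into the lowest word.
        int cursor = 0;
        int firstGroupLen = numDigits % DIGITS_PER_GROUP;
        if (firstGroupLen == 0) {
            firstGroupLen = DIGITS_PER_GROUP;
        }
        magnitude[numWords - 1] =
                Integer.parseInt(digits.substring(cursor, cursor += firstGroupLen));

        // Each remaining group: shift the value left by 9 digits, add the group.
        while (cursor < numDigits) {
            int groupVal =
                    Integer.parseInt(digits.substring(cursor, cursor += DIGITS_PER_GROUP));
            mulAdd(magnitude, SUPER_RADIX, groupVal);
        }

        // Drop leading zero words left over from an overestimate.
        int first = 0;
        while (first < magnitude.length && magnitude[first] == 0) {
            first++;
        }
        return Arrays.copyOfRange(magnitude, first, magnitude.length);
    }

    // Helper used only to compare the sketch against java.math.BigInteger.
    static BigInteger toBigInteger(int[] mag) {
        BigInteger result = BigInteger.ZERO;
        for (int word : mag) {
            result = result.shiftLeft(32).or(BigInteger.valueOf(word & LONG_MASK));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "123456789012345678901234567890";
        System.out.println(toBigInteger(parseDecimal(s)).equals(new BigInteger(s)));
        System.out.println(parseDecimal("1000000000").length);
    }
}
```

The input "1000000000" is a case where the estimate is too high: ten digits are estimated at 34 bits, so two words are allocated, but the value fits in one word and the stripping step removes the leading zero word.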

huangapple
  • Posted on June 5, 2023, 21:18:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76406817.html