Inner workings of `rand.Intn` function - GoLang


Question

I happened to look at the Go source code to see how it implements its random function when passed the length of a slice.

Here's the calling code:

func randomFormat() string {
	formats := []string{
		"Hi, %v. Welcome!",
		"Great to see you, %v!",
		"Hail, %v! Well met!",
	}
	return formats[rand.Intn(len(formats))]
}

Go source code, main part:

func (r *Rand) Intn(n int) int {
	if n <= 0 {
		panic("invalid argument to Intn")
	}
	if n <= 1<<31-1 {
		return int(r.Int31n(int32(n)))
	}
	return int(r.Int63n(int64(n)))
}

Go source code, supporting part (most developers already have this on their machines or in the Go repo):

// Int31n returns, as an int32, a non-negative pseudo-random number in [0,n).
// It panics if n <= 0.
func (r *Rand) Int31n(n int32) int32 {
	if n <= 0 {
		panic("invalid argument to Int31n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int31() & (n - 1)
	}
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

// Int63n returns, as an int64, a non-negative pseudo-random number in [0,n).
// It panics if n <= 0.
func (r *Rand) Int63n(n int64) int64 {
	if n <= 0 {
		panic("invalid argument to Int63n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int63() & (n - 1)
	}
	max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
	v := r.Int63()
	for v > max {
		v = r.Int63()
	}
	return v % n
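}

// Int31 returns a non-negative pseudo-random 31-bit integer as an int32.
func (r *Rand) Int31() int32 { return int32(r.Int63() >> 32) }

// Int63 returns a non-negative pseudo-random 63-bit integer as an int64.
func (r *Rand) Int63() int64 { return r.src.Int63() }

// A Source represents a source of uniformly-distributed
// pseudo-random int64 values in the range [0, 1<<63).
type Source interface {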
	Int63() int64
	Seed(seed int64)
}

I want to understand how the random function works, including all the inner functions it calls. I am overwhelmed by the code. If someone were to lay out the steps in plain English, what would they be?

For example, I don't understand the logic behind subtracting 1 in:

if n <= 1<<31-1

I also can't make head or tail of the Int31n function:

    if n&(n-1) == 0 { // n is power of two, can mask
        return r.Int31() & (n - 1)
    }
    max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
    v := r.Int31()
    for v > max {
        v = r.Int31()
    }
    return v % n

Answer 1

Score: 5

This is more of a question about algorithms than it is about Go, but there are some Go parts. In any case I'll start with the algorithm issues.

Shrinking the range of a uniform random number generator

Suppose that we have a uniform-distribution random number generator that returns a number between, say, 0 and 7 inclusive. That is, it will, over time, return about the same number of 0s, 1s, 2s, ..., 7s, but with no apparent pattern between them.

Now, if we want a uniformly distributed random number between 0 and 7, this thing is perfect. That's what it returns. We just use it. But what if we want a uniformly distributed random number between 0 and 6 instead?

We could write:

func randMod7() int {
    return generate() % 7
}

so that if generate() returns 7 (which it has a 1 out of 8 chance of doing), we convert that value to zero. But then we'll get zero back 2 out of 8 times, instead of 1 out of 8 times. We'll get 1, 2, 3, 4, 5, and 6 back 1 out of 8 times, and zero 2 out of 8 times, on average: once for each actual zero, and once for each 7.
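To see the bias directly, the sketch below walks once through every output of a hypothetical 3-bit generator (each of 0..7 exactly once, which is what a uniform generator averages out to) and tallies what % 7 turns each value into. The helper tallyMod is a made-up name for illustration, not part of math/rand.

```go
package main

import "fmt"

// tallyMod counts, for a hypothetical generator that returns each of the
// values 0..size-1 exactly once, how often each result of v % m occurs.
func tallyMod(size, m int) []int {
	counts := make([]int, m)
	for v := 0; v < size; v++ {
		counts[v%m]++
	}
	return counts
}

func main() {
	// A 3-bit generator has 8 equally likely outputs: 0..7.
	fmt.Println(tallyMod(8, 7)) // [2 1 1 1 1 1 1]: zero shows up twice
}
```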

What we need to do, then, is throw away any occurrences of 7:

func randMod7() int {
    for {
        if i := generate(); i < 7 {
            return i
        }
        // oops, got 7, try again
    }
}
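Here is a runnable version of the rejection loop. The stand-in generate() is hypothetical: it produces 3-bit values by keeping the low three bits of rand.Int63(), on the assumption that those bits are uniform. The short variable declaration and the comparison are written as separate clauses of the if statement, which is what Go requires.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generate is a hypothetical uniform generator of the values 0..7,
// built here by keeping the low 3 bits of rand.Int63().
func generate() int {
	return int(rand.Int63() & 7)
}

// randMod7 returns a uniformly distributed value in 0..6 by
// rejecting (re-drawing) any 7 that generate produces.
func randMod7() int {
	for {
		if i := generate(); i < 7 {
			return i
		}
		// got 7: throw it away and try again
	}
}

func main() {
	counts := make([]int, 7)
	for i := 0; i < 70000; i++ {
		counts[randMod7()]++
	}
	fmt.Println(counts) // each entry should be roughly 10000
}
```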

Now, if we had a uniform random number generator named generate() that returned a value between 0 and (say) 11 (12 possible values) and we wanted a value between 0 and 3 (four possible values), we could just use generate() % 4, because the 12 possible results would fall into 3 groups of four with equal probability. If we wanted a value between 0 and 5 inclusive, we could use generate() % 6, because the 12 possible results would fall into two groups of 6 with equal probability. In fact, all we need to do is examine the prime factorization of the number of values our uniform generator can produce to see which moduli work: a modulus works exactly when it divides that count evenly. The prime factors of 12 are 2, 2, 3, so 2, 3, 4, and 6 all work here. Any other modulus, such as generate() % 10, produces a biased result: 0 and 1 occur 2 out of 12 times, but 2 through 9 occur 1 out of 12 times. (Note: generate() % 12 also works, but is kind of pointless.)
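The "divides evenly" rule is easy to check mechanically. This sketch, with the illustrative helper unbiasedModuli, lists every modulus from 2 up to the generator's range size that splits the outputs into equal groups.

```go
package main

import "fmt"

// unbiasedModuli lists every modulus m in 2..size for which
// "generate() % m" stays uniform, i.e. m divides size evenly.
func unbiasedModuli(size int) []int {
	var ms []int
	for m := 2; m <= size; m++ {
		if size%m == 0 {
			ms = append(ms, m)
		}
	}
	return ms
}

func main() {
	fmt.Println(unbiasedModuli(12)) // [2 3 4 6 12]
}
```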

In our particular case, we have two different uniform random number generators available. One, Int31(), produces values between 0 and 0x7fffffff (2147483647 decimal, or 2^31 - 1, or 1<<31 - 1) inclusive. The other, Int63(), produces values between 0 and 0x7fffffffffffffff (9223372036854775807, or 2^63 - 1, or 1<<63 - 1) inclusive. These ranges hold 2^31 and 2^63 values respectively, so the prime factorization of their sizes is thirty-one 2s and sixty-three 2s.

What this means is that we can compute Int31() mod 2^k, for any integer k from 0 to 31 inclusive, without messing up our uniformity. With Int63(), we can do the same with k ranging all the way up to 63.

Introducing the computer

Now, mathematically-and-computer-ly speaking, given any nonnegative integer n in [0..0x7fffffff] or [0..0x7fffffffffffffff], and a nonnegative integer k in the right range (no more than 31 or 63 respectively), computing n mod 2^k produces the same result as masking n with a value that has its low k bits set. To get that mask, we take 1<<k and subtract 1. If k is, say, 4, we get 1<<4 or 16. Subtracting 1, we get 15, or 0xf, which has four 1 bits in it.

So:

n % (1 << k)

and:

n & (1<<k - 1)

produce the same result. Concretely, when k==4, this is n%16 or n&0xf. When k==5, this is n%32 or n&0x1f. Try it for k==0 and k==63.
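Following that suggestion, this sketch checks the identity for every k from 0 through 63 on a sample value. It uses uint64 rather than int64 so that 1<<63 is representable; modEqualsMask is an illustrative name.

```go
package main

import "fmt"

// modEqualsMask reports whether n % 2^k equals n & (2^k - 1) for the
// given n and every k in 0..63. uint64 is used so that 1<<63 does not
// overflow.
func modEqualsMask(n uint64) bool {
	for k := uint(0); k <= 63; k++ {
		p := uint64(1) << k // 2^k
		if n%p != n&(p-1) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(100%16, 100&0xf)                   // 4 4
	fmt.Println(modEqualsMask(0x7fffffffffffffff)) // true
}
```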

Introducing Go-the-language

We're now ready to consider doing all of this in Go. We note that int (plain, unadorned int) is guaranteed to be able to hold values between -2147483648 and +2147483647 (-0x80000000 through +0x7fffffff). It may extend all the way to -0x8000000000000000 through +0x7fffffffffffffff.

Meanwhile, int32 always handles the smaller range and int64 always handles the larger range. The plain int is a distinct type from these two, but has the same range as one of them. We just don't know which one.
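You can ask the compiler which case you are in. This sketch uses strconv.IntSize, a standard-library constant equal to the bit width of int (32 or 64); intBits is just a wrapper for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// intBits reports how wide plain int is on the platform running this
// program: strconv.IntSize is 32 or 64.
func intBits() int {
	return strconv.IntSize
}

func main() {
	fmt.Println("int is", intBits(), "bits wide here")
	// int32 and int64 have the same limits on every platform.
	fmt.Println(int32(math.MaxInt32), int64(math.MaxInt64)) // 2147483647 9223372036854775807
}
```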

Our Int31 implementation returns a uniformly distributed random number in the 0..0x7fffffff range. (It does this by shifting r.Int63() right by 32 bits, which keeps the top 31 of its 63 random bits, though this is an implementation detail.) Our Int63 implementation returns a uniformly distributed random number in the 0..0x7fffffffffffffff range.
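A small sketch of that shift, assuming only what the Int31 source in the question shows (int32(r.Int63() >> 32)); int31From is an illustrative name. Feeding it the largest 63-bit value gives the largest 31-bit value.

```go
package main

import "fmt"

// int31From mirrors how math/rand's Int31 is derived from a 63-bit value:
// shifting right by 32 keeps the top 31 of the 63 random bits.
func int31From(v int64) int32 {
	return int32(v >> 32)
}

func main() {
	fmt.Printf("%#x\n", int31From(0x7fffffffffffffff)) // 0x7fffffff
	fmt.Println(int31From(0))                           // 0
}
```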

The Intn function you show here:

func (r *Rand) Intn(n int) int {
    if n <= 0 {
        panic("invalid argument to Intn")
    }
    if n <= 1<<31-1 {
        return int(r.Int31n(int32(n)))
    }
    return int(r.Int63n(int64(n)))
}

just picks one of the two functions, based on the value of n: if it's less than or equal to 0x7fffffff (1<<31 - 1), n fits in int32, so it uses int32(n) to convert n to int32, calls r.Int31n, and converts the result back to int. This is also why the 1 is subtracted: 1<<31 - 1 = 2147483647 is the largest value an int32 can hold, while 1<<31 itself does not fit. Otherwise, the value of n exceeds 0x7fffffff, implying that int has the larger range and we must use the larger-range generator, r.Int63n. The rest is the same except for types.
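The dispatch, and the reason for the -1, can be checked in a few lines. whichPath is a hypothetical helper that only reports which function Intn would call; it takes an int64 so the sketch compiles on 32-bit platforms as well.

```go
package main

import (
	"fmt"
	"math"
)

// whichPath is a hypothetical helper that reports which generator
// Intn would delegate to for a given n, using the same test as the
// source: n <= 1<<31-1 means n fits in an int32.
func whichPath(n int64) string {
	if n <= 0 {
		return "panic"
	}
	if n <= 1<<31-1 {
		return "Int31n"
	}
	return "Int63n"
}

func main() {
	fmt.Println(1<<31-1 == math.MaxInt32) // true: the largest int32
	fmt.Println(whichPath(3))             // Int31n
	fmt.Println(whichPath(1 << 31))       // Int63n
}
```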

The code could just do:

return int(r.Int63n(int64(n)))

every time, but that could be slow on 32-bit machines, where 64-bit arithmetic is comparatively expensive. (There's a lot of "may" and "might" here, and if you were writing this yourself today, you should start by profiling or benchmarking the code. The Go authors did do this, though many years ago; at that time it was worth doing this fancy stuff.)

More bit-manipulation

The insides of both functions Int31n and Int63n are quite similar; the main difference is the types involved and, in a few places, the maximum values. Again, the reason for this is at least partly historical: on some (mostly old now) computers, the Int63n variant is significantly slower than the Int31n variant. (In some languages other than Go, we might write these as generics and then have the compiler generate a type-specific version automatically.) So let's just look at the Int63n variant:

func (r *Rand) Int63n(n int64) int64 {
    if n <= 0 {
        panic("invalid argument to Int63n")
    }
    if n&(n-1) == 0 { // n is power of two, can mask
        return r.Int63() & (n - 1)
    }
    max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
    v := r.Int63()
    for v > max {
        v = r.Int63()
    }
    return v % n
}

The argument n has type int64, so its value cannot exceed 2^63-1 or 0x7fffffffffffffff or 9223372036854775807. But it could be negative, and negative values won't work right, so the first thing we do is test for that and panic if so. We also panic if the input is zero (this is something of a choice, but it's useful to note it now).

Next we have the n&(n-1) == 0 test. This is a test for powers of two, with one slight flaw, and it works in many languages (those that have bit-masking):

  • A power of two is always represented as a single set bit in the binary representation of a number. For instance, 1 is 00000001₂, 2 is 00000010₂, 4 is 00000100₂, 8 is 00001000₂, and so on, through 128 being 10000000₂. (Since I only "drew" eight bits, this series maxes out at 128.)

  • Subtracting 1 from that number causes a borrow: that bit goes to zero, and all the lesser bits become 1. For instance, 10000000₂ - 1 is 01111111₂.

  • AND-ing these two together produces zero if there was just the single bit set initially. If not (for instance, if we start with the value 130, or 10000010₂, subtracting 1 produces 10000001₂), there's no borrow out of the top bit, so the top bit is set in both inputs and therefore is set in the AND-ed result.

The slight flaw is that if the initial value is zero, then we have 0-1, which produces all 1s; 0&0xffffffffffffffff is zero too, but zero is not an integer power of two. (2^0 is 1, not 0.) This minor flaw is not important for our purpose here, because we already made sure to panic for this case: it just doesn't happen.
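The power-of-two check and its zero flaw can be tried directly. In this sketch, passesMaskTest is exactly the n&(n-1) == 0 expression, and isPowerOfTwo (a hypothetical helper, not from math/rand) adds the n > 0 guard that Int63n effectively gets from its earlier panic.

```go
package main

import "fmt"

// passesMaskTest applies the n&(n-1) == 0 check used by Int31n/Int63n.
func passesMaskTest(n int64) bool {
	return n&(n-1) == 0
}

// isPowerOfTwo adds the n > 0 guard, which Int63n gets for free
// from its earlier panic on n <= 0.
func isPowerOfTwo(n int64) bool {
	return n > 0 && passesMaskTest(n)
}

func main() {
	for _, n := range []int64{0, 1, 2, 6, 8, 128, 130} {
		// columns: n, raw mask test, guarded power-of-two test
		fmt.Println(n, passesMaskTest(n), isPowerOfTwo(n))
	}
}
```

Zero is the only input in that list where the two columns disagree.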

Now we have the most complicated line of all:

    max := int64((1 << 63) - 1 - (1<<63)%uint64(n))

The recurring 63s here are because we have a value range going from zero to 2^63-1. 1<<63 - 1 is (still, again, always) 9223372036854775807 or 0x7fffffffffffffff. Meanwhile, 1<<63, without 1 subtracted from it, is 9223372036854775808 or 0x8000000000000000. This value does not fit into int64 but it does fit into uint64. So if we turn n into a uint64, we can compute uint64(9223372036854775808) % uint64(n), which is what the % expression does. By using uint64 for this calculation, we ensure that it doesn't overflow.
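A quick check of the overflow point, as a sketch: the helper remainderOf2To63 (a made-up name) computes the same (1<<63)%uint64(n) term as Int63n. The commented-out line shows the int64 version, which the compiler would reject as a constant overflow.

```go
package main

import "fmt"

// remainderOf2To63 computes (1<<63) % n the way Int63n does: 1<<63 does
// not fit in int64, so the arithmetic is done in uint64.
func remainderOf2To63(n int64) uint64 {
	return (1 << 63) % uint64(n)
}

func main() {
	const twoTo63 = uint64(1) << 63
	fmt.Println(twoTo63)             // 9223372036854775808
	fmt.Println(remainderOf2To63(3)) // 2
	// var bad int64 = 1 << 63 // would not compile: constant overflows int64
}
```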

But what is this calculation all about? Go back to our example with a generate() that produces values in [0..7]. If we wanted a number in [0..5] from it, we would have to discard both 6 and 7. That's what we're going for here: we want to find the value above which we should discard values.

If we were to take 8%6, we'd get 2. 8 is one bigger than the maximum that our 3-bit generate() would generate. 8%6 == 2 is the number of "high values" that we have to discard: 8-2 = 6 and we want to discard values that are 6 or more. Subtract 1 from this, and we get 7-2 = 5; we can accept numbers in this input range, from 0 to 5 inclusive.

So, this somewhat fancy calculation for setting max is just a way to find out what the maximum value we like is. Values that are greater than max need to be tossed out.

This particular calculation works nicely even if n is much smaller than the range our generator returns. For instance, suppose we had a four-bit generator returning values in the [0..15] range, and we wanted a number in [0..2]. Our n is therefore 3 (to indicate that we want a number in [0..2]). We compute 16%3 to get 1. We then take 15 (our maximum output value, one less than 16) and subtract that 1 to get 14 as our maximum acceptable value. That is, we would allow numbers in [0..14], but exclude 15.

With a 63-bit generator returning values in [0..9223372036854775807], and n==3, we would set max to 9223372036854775805. That's what we want: it throws out the two biasing values, 9223372036854775806 and 9223372036854775807.
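The same formula works for a generator of any width. The sketch below generalizes the max line to a hypothetical bits-bit generator (maxAccept is an illustrative name, and it assumes bits is at most 63 so that 1<<bits fits in a uint64), and reproduces the three worked examples above: the 3-bit case with n == 6, the 4-bit case with n == 3, and the 63-bit case with n == 3.

```go
package main

import "fmt"

// maxAccept is a hypothetical generalization of the line
//
//	max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
//
// to a generator that produces bits-bit values in [0, 2^bits).
// Values above the returned limit are the ones to throw away.
// It assumes bits <= 63.
func maxAccept(bits uint, n uint64) uint64 {
	size := uint64(1) << bits // number of possible outputs
	return size - 1 - size%n
}

func main() {
	fmt.Println(maxAccept(3, 6))  // 5: keep 0..5, discard 6 and 7
	fmt.Println(maxAccept(4, 3))  // 14: keep 0..14, discard 15
	fmt.Println(maxAccept(63, 3)) // 9223372036854775805
}
```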

The remainder of the code simply does that:

    v := r.Int63()
    for v > max {
        v = r.Int63()
    }
    return v % n

We pick one Int63-range number. If it exceeds max, we pick another one and check again, until we pick one that is in the [0..max] range, inclusive of max.

Once we get a number that is in range, we use % n to shrink the range if needed. For instance, if the range is [0..2], we use v % 3. If v is (say) 14, 14%3 is 2. Our actual max is, again, 9223372036854775805, and whatever v is, between 0 and that, v%3 is between 0 and 2 and remains uniformly distributed, with no slight bias to 0 and 1 (9223372036854775806 would give us that one extra 0, and 9223372036854775807 would give us that one extra 1).
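To check the uniformity claim exhaustively rather than by sampling, the sketch below enumerates every accepted value of a small hypothetical generator and counts the results of v % n. acceptedCounts is an illustrative helper that assumes the same limit formula as Int63n; equal counts in every bucket mean the rejection step removed exactly the biasing values.

```go
package main

import "fmt"

// acceptedCounts enumerates every output v of a hypothetical bits-bit
// generator, keeps only v <= limit (the same limit formula Int63n uses),
// and tallies v % n. Equal tallies mean the kept values map uniformly.
// It assumes bits is small enough to enumerate.
func acceptedCounts(bits uint, n uint64) []uint64 {
	size := uint64(1) << bits
	limit := size - 1 - size%n
	counts := make([]uint64, n)
	for v := uint64(0); v <= limit; v++ {
		counts[v%n]++
	}
	return counts
}

func main() {
	fmt.Println(acceptedCounts(4, 3)) // [5 5 5]
	fmt.Println(acceptedCounts(4, 6)) // [2 2 2 2 2 2]
}
```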

(Now repeat the above for int32, 31, and 1<<31, for the Int31n function.)
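As a starting point for that exercise, here is the Int31n limit computed on its own, copied from the Int31n source shown in the question and wrapped in a hypothetical helper int31Max. Because 2^31 mod 3 is 2, the 31-bit case with n == 3 also throws away exactly two values.

```go
package main

import "fmt"

// int31Max reproduces the limit computed in Int31n:
//
//	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
//
// 1<<31 does not fit in int32, so the remainder is taken in uint32.
func int31Max(n int32) int32 {
	return int32((1 << 31) - 1 - (1<<31)%uint32(n))
}

func main() {
	fmt.Println(int31Max(3))  // 2147483645
	fmt.Println(int31Max(10)) // 2147483639
}
```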

huangapple
  • Published on July 22, 2021 at 02:23:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68474676.html