在Swift中更改随机分布的函数?

huangapple go评论52阅读模式
英文:

Function to change random distribution in Swift?

问题

Swift是否有一些内置方法可以改变随机数的分布?我想使用线性方程来定义分布,类似于y = k * x + m。当k = 0时,所有数字应该是均匀分布的。当k = 1时,分布应该遵循这条线,因此较小的x值将非常罕见,而较大的x值将很常见。我在Excel中尝试了不同的策略,最终得出了这段代码,似乎可以工作 - 但肯定有更简洁的方法在Swift中实现这个吧

作为一种注意:首先,我使用了一个ClosedRange<Double>的数组,而不是元组的方法,然后使用.contains。然后我将它改为了元组的数组,因为我的代码没有按预期工作。可能是另一个错误,但由于代码现在能正常工作,所以我就一直使用元组了。

```swift
import Foundation

/* 函数,根据线性分布(y = k * x + m)创建带有上下限的元组数组 */
func createDistributions(numbers: ClosedRange<Int>, k: Double) -> [(Double, Double)] {
    var dist = [(Double, Double)]()
    let m: Double = 0.5
    let nVal: Int = numbers.count
    var pStop: Double = 0.0

    for x in numbers {
        let t = (Double(x) + 0.5) / Double(nVal)
        let y = (k * (t - 0.5) + m) * 2.0 / Double(nVal)
        let start = pStop
        let stop = y + start
        
        dist.append((start, stop))
        pStop = stop
    }
    
    return dist
}

// 基于k值为1.0创建分布
var result = createDistributions(numbers: 0...34, k: 1.0)


// 循环十次,每次创建一个双精度随机数
for _ in 0...9 {
    let ran = Double.random(in: 0...1)
    
    // 通过检查下限和上限来确定随机数属于哪个索引数组
    for i in 0..<result.count {
        
        // 随机数属于第i个元素,打印i
        if ran >= result[i].0 && ran <= result[i].1 {
            print(i)
        }
    }
}
英文:

Does Swift have some built in methods to change the distribution of random numbers? I want to use a linear equation to define the distribution, something like y = k * x + m. When k = 0, all numbers should be equally distributed. When k = 1 the distribution should follow the line, so low x-values would be very rare while high x-values would be common. I played around in Excel and tried different strategies and finally came up with this code, which seems to work - but there must be a neater way to do this in Swift?

As a note: First I did use an array of ClosedRange<Double> instead of the tuples-approach, then used .contains. Then I changed it to an array of tuples because my code didn't work as expected. Probably another bug, but I stayed with tuples as the code works now.

import Foundation

/* function to create an array of tuples with upper and lower
 limits based on a linear distribution (y = k * x + m) */
func createDistributions(numbers: ClosedRange&lt;Int&gt;, k: Double) -&gt; [(Double, Double)] {
    var dist = [(Double, Double)]()
    let m: Double = 0.5
    let nVal: Int = numbers.count
    var pStop: Double = 0.0

    for x in numbers {
        let t = (Double(x) + 0.5) / Double(nVal)
        let y = (k * (t - 0.5) + m) * 2.0 / Double(nVal)
        let start = pStop
        let stop = y + start
        
        dist.append((start, stop))
        pStop = stop
    }
    
    return dist
}

// create distributions based on k-value of 1.0
var result = createDistributions(numbers: 0...34, k: 1.0)


// loop ten times, creating a Double random number each time
for _ in 0...9 {
    let ran = Double.random(in: 0...1)
    
    // check in which indexed array the random number belongs to by checking lower and upper limit
    for i in 0..&lt;result.count {
        
        // the random number belongs to the i:th element, print i
        if ran &gt;= result[i].0 &amp;&amp; ran &lt;= result[i].1 {
            print(i)
        }
    }
}

答案1

得分: 4

Your y = kx+m is a probability density function (PDF). A very good way to apply this to random number generation is an inverse transform sampling function. I'll walk through how to develop that, step by step, so that you can adapt it to your specific needs. In the general case, this would be done with first-year calculus, but for the linear case it's easy enough to do with basic algebra and some grade-school geometry. For this example, I'll be generating a random value between 0 and 1.

(For other Americans reading along: this is the slope-intercept form we learn as y = mx+b. Please don't get confused that m here is the intercept, not the slope. I hopefully haven't mixed them up anywhere in this answer.)

To experiment with this answer, see the GeoGebra worksheet that the images come from.

The TL;DR of all of this is:

let u = Double.random(in: 0...1)
if k == 0 {
    return u
} else {
    return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
}

But learning why that's the answer is the real goal.

A PDF is a function whose area between two x-values is the probability that the value is between those values. This leads to the fact that the PDF must be positive for all values in its range, and the area under it must be exactly 1 (representing a 100% chance of selecting some value over the entire range).

But a quick look at an arbitrary version of this curve shows that it may not have the right area:

[![Graph of y=kx+m with k=0.4 and m=0.4][1]][1]

For a given value of k there is a specific value of m that is valid. We can work that out by computing the area in terms of k and m, setting it to 1, and then solving for m. The area of the graph is a rectangle with base 1 (the range 0-1 of values we'll select) and a height m, plus a triangle of base 1, and height k. So:

Area = Rectangle + Triangle = 1
       m + k/2 = 1
       m = 1 - k/2

And substituting back into F(x):

F(x) = kx + 1 - k/2

We are also constrained that m cannot be less than 0, which constrains k to the range [0,2]. When k is 0, then all values are equally likely. When k is 2, there is a linear relationship between a value and its likelihood.

[![Constrained PDF with a linear relationship][2]][2]

With a valid PDF, it is time to create a cumulative distribution function. This is a function that represents the likelihood that a randomly selected value is no greater than the given value. These functions are constrained for the same reasons as PDFs. They must monotonically increase from zero to one over the valid range.

[![CDF from 0 to 0.3 = 0.174][3]][3]

This area can be computed just like the full area, by summing a rectangle and a triangle:

CDF(x) = Rectangle + Triangle
       = mx + (x/2 * (F(x) - m))
       = ... some algebra later ...
       = (k*x^2)/2 + (1-k/2)*x

[![CDF function that intersects 0,0 and 1,1][4]][4]

Notice that this function correctly passes through (0,0) and (1,1) as it must, and is positive over the entire range. There is no chance of a value less than zero, and there is a 100% probability of a value less than or equal to one.

Almost there. An inverse transform sample applies the inverse of the CDF. That's not particularly complicated, but it is a lot of algebra, so let WolframAlpha do it:

solve y = (k*x^2)/2 + (1-k/2)*x for x
=>
x = y and k = 0
x = -(sqrt(k^2 + 8ky - 4k + 4) - k + 2)/(2k) and k!=0
x = (sqrt(k^2 + k(8y - 4) + 4) + k - 2)/(2k) and k!=0

For k=0, x=y. Elsewhere, there are two solutions. Only positive values make sense here, so ignore the negative one.

[![Inverse of CDF for k=1.5][5]][5]

The red line is the function you want (this is for k=1.5). Very long road to get here, but now the code is pretty easy:

// `k` ranges from 0 to 2, which is confusing. Map it to range 0...1
func randomValue(distribution d: Double) -> Double {
    assert((0...1).contains(d))
    let u = Double.random(in: 0...1)

    // k ranges from 0 to 2
    let k = d * 2

    if k == 0 {
        return u
    } else {
        return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
    }
}

And just to test it out:

func testRun(distribution d: Double) {
    print("Distribution for \(d)")
    let n = 10_000

    // How many results begin with a given digit after the decimal point?
    var h: [Substring:Int] = [:]
    for _ in 0..<n {
        let value = randomValue(distribution: d)
        let firstDigit = "\(value)".prefix(3).suffix(1)
        h[firstDigit, default: 0] += 1
    }

    for digit in h.keys.sorted() {
        let ratio = Double(h[digit]!)/Double(n)
        print("\(digit) -> \(ratio.formatted(.percent.precision(.fractionLength(0))))")
    }
}

testRun(distribution: 0)
testRun(distribution: 0.5)
testRun(distribution: 1)

The result would look something like this:

Distribution for 0.0
0 -> 10%
1 -> 10%
2 -> 10%
3 -> 11%
4 -> 10%


<details>
<summary>英文:</summary>

Your `y = kx+m` is a [probability density function](https://en.wikipedia.org/wiki/Probability_density_function) (PDF). A very good way to apply this to random number generation is an [inverse transform sampling function](https://en.wikipedia.org/wiki/Inverse_transform_sampling). I&#39;ll walk through how to develop that, step by step, so that you can adapt it to your specific needs. In the general case, this would be done with first-year calculus, but for the linear case it&#39;s easy enough to do with basic algebra and some grade-school geometry. For this example, I&#39;ll be generating a random value between 0 and 1.

(For other Americans reading along: this is the slope-intercept form we learn as `y = mx+b`. Please don&#39;t get confused that `m` here is the intercept, not the slope. I hopefully haven&#39;t mixed them up anywhere in this answer.)

To experiment with this answer, see the [GeoGebra worksheet](https://www.geogebra.org/calculator/xpy6mum5) that the images come from.

The TL;DR of all of this is:

    let u = Double.random(in: 0...1)
    if k == 0 {
        return u
    } else {
        return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
    }

But learning why that&#39;s the answer is the real goal.

A PDF is a function whose area between two x-values is the probability that the value is between those values. This leads to the fact that the PDF must be positive for all values in its range, and the area under it must be exactly 1 (representing a 100% chance of selecting some value over the entire range).

But a quick look at an arbitrary version of this curve shows that it may not have the right area:

[![Graph of y=kx+m with k=0.4 and m=0.4][1]][1]

For a given value of `k` there is a specific value of `m` that is valid. We can work that out by computing the area in terms of `k` and `m`, setting it to 1, and then solving for `m`. The area of the graph is a rectangle with base 1 (the range 0-1 of values we&#39;ll select) and a height `m`, plus a triangle of base 1, and height `k`. So:

    Area = Rectangle + Triangle = 1
           m + k/2 = 1
           m = 1 - k/2

And substituting back into F(x):

    F(x) = kx + 1 - k/2

We are also constrained that `m` cannot be less than 0, which constrains `k` to the range [0,2]. When `k` is 0, then all values are equally likely. When `k` is 2, there is a linear relationship between a value and its likelihood.

[![Constrained PDF with linear relationship][2]][2]

With a valid PDF, it is time to create a [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function). This is a function that represents the likelihood that a randomly selected value is no greater than the given value. These functions are constrained for the same reasons as PDFs. They must monotonically increase from zero to one over the valid range.

[![CDF from 0 to 0.3 = 0.174][3]][3]

This area can be computed just like the full area, by summing a rectangle and a triangle:

    CDF(x) = Rectangle + Triangle
           = mx + (x/2 * (F(x) - m))
           = ... some algebra later ...
           = (k*x^2)/2 + (1-k/2)*x

[![CDF function that intersects 0,0 and 1,1][4]][4]

Notice that this function correctly passes through (0,0) and (1,1) as it must, and is positive over the entire range. There is no chance of a value less than zero, and there is 100% probability of a value less than or equal to one.


Almost there. An inverse transform sample applies the inverse of the CDF. That&#39;s not particularly complicated, but it is a lot of algebra, so let [WolframAlpha](https://www.wolframalpha.com/input?i=solve+y+%3D+%28k*x%5E2%29%2F2+%2B+%281-k%2F2%29*x+for+x) do it:

    solve y = (k*x^2)/2 + (1-k/2)*x for x
    ==&gt;
    x = y and k = 0
    x = -(sqrt(k^2 + 8 k y - 4 k + 4) - k + 2)/(2 k) and k!=0
    x = (sqrt(k^2 + k (8 y - 4) + 4) + k - 2)/(2 k) and k!=0

For k=0, x=y. Elsewhere, there are two solutions. Only positive values make sense here, so ignore the negative one.

[![Inverse of CDF for k=1.5][5]][5]

The red line is the function you want (this is for k=1.5). Very long road to get here, but now the code is pretty easy:

// k ranges from 0 to 2, which is confusing. Map it to range 0...1
func randomValue(distribution d: Double) -> Double {
assert((0...1).contains(d))
let u = Double.random(in: 0...1)

// k ranges from 0 to 2
let k = d * 2

if k == 0 {
    return u
} else {
    return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
}

}


And just to test it out:

func testRun(distribution d: Double) {
print("Distribution for (d)")
let n = 10_000

// How many results begin with a given digit after the decimal point?
var h: [Substring:Int] = [:]
for _ in 0..&lt;n {
    let value = randomValue(distribution: d)
    let firstDigit = &quot;\(value)&quot;.prefix(3).suffix(1)
    h[firstDigit, default: 0] += 1
}

for digit in h.keys.sorted() {
    let ratio = Double(h[digit]!)/Double(n)
    print(&quot;\(digit) -&gt; \(ratio.formatted(.percent.precision(.fractionLength(0))))&quot;)
}

}

testRun(distribution: 0)
testRun(distribution: 0.5)
testRun(distribution: 1)

===>
Distribution for 0.0
0 -> 10%
1 -> 10%
2 -> 10%
3 -> 11%
4 -> 10%
5 -> 10%
6 -> 10%
7 -> 10%
8 -> 10%
9 -> 10%
Distribution for 0.5
0 -> 6%
1 -> 6%
2 -> 7%
3 -> 9%
4 -> 10%
5 -> 11%
6 -> 11%
7 -> 13%
8 -> 13%
9 -> 14%
Distribution for 1.0
0 -> 1%
1 -> 3%
2 -> 5%
3 -> 7%
4 -> 9%
5 -> 11%
6 -> 13%
7 -> 15%
8 -> 17%
9 -> 19%


A linear equation can only push this so far. I don&#39;t believe you can get a larger difference between low-probability and high-probability values with only a linear PDF (a better mathematician may correct me here; this isn&#39;t my speciality). If you want that, I would explore applying this to larger-order polynomials. Rather than `F(x) = kx + m`, you could do the same thing with `F(x) = kx^2 + m` or even higher powers. This will take some first-year calculus, but the overall approach should be similar.

  [1]: https://i.stack.imgur.com/HEN20.png
  [2]: https://i.stack.imgur.com/syvJH.png
  [3]: https://i.stack.imgur.com/WRHWw.png
  [4]: https://i.stack.imgur.com/dDzsw.png
  [5]: https://i.stack.imgur.com/flgZA.png


</details>



huangapple
  • 本文由 发表于 2023年3月7日 20:09:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/75661798.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定