Weighting a flat list to the normal distribution
Question
I have a list of string items of arbitrary length. I need to "normalize" this list so that each item is part of a normal distribution, appending the weight to the string.
What is a more effective and mathematically/statistically sound way to go about this than what I have below?
func normalizeAppend(in []string, shuffle bool) []string {
    var ret []string
    if shuffle {
        shuffleStrings(in)
    }
    l := len(in)
    switch {
    case remain(l, 3) == 0:
        l3 := (l / 3)
        var low, mid, high []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l3:
                low = append(low, v)
            case o > l3 && o <= l3*2:
                mid = append(mid, v)
            case o >= l3*2:
                high = append(high, v)
            }
        }
        q1 := 1600 / len(low)
        q2 := 6800 / len(mid)
        q3 := 1600 / len(high)
        for _, v := range low {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }
        for _, v := range mid {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }
        for _, v := range high {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }
    case remain(l, 2) == 0 && l >= 4:
        l4 := (l / 4)
        var first, second, third, fourth []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l4:
                first = append(first, v)
            case o > l4 && o <= l4*2:
                second = append(second, v)
            case o > l4*2 && o <= l4*3:
                third = append(third, v)
            case o > l4*3:
                fourth = append(fourth, v)
            }
        }
        q1 := 1600 / len(first)
        q2 := 3400 / len(second)
        q3 := 3400 / len(third)
        q4 := 1600 / len(fourth)
        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }
        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }
        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }
        for _, v := range fourth {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q4))
        }
    default:
        var first, second, third []string
        q1 := (1 + math.Floor(float64(l)*.16))
        q3 := (float64(l) - math.Floor(float64(l)*.16))
        var o float64
        for i, v := range in {
            o = float64(i + 1)
            switch {
            case o <= q1:
                first = append(first, v)
            case o > q1 && o < q3:
                second = append(second, v)
            case o >= q3:
                third = append(third, v)
            }
        }
        lq1 := 1600 / len(first)
        lq2 := 3400 / len(second)
        lq3 := 1600 / len(third)
        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq1))
        }
        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq2))
        }
        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq3))
        }
    }
    return ret
}
Some requested clarification:
I have a list of items that will be chosen from many times, one at a time, by weighted selection. To start with, I have a list with (implied) weights of 1:
[a_1, b_1, c_1, d_1, e_1, f_1, g_1, h_1, i_1, j_1, k_1]
I'm looking for a better way to turn that list into something that produces a more 'normal' distribution of weights for selection:
[a_1, b_2, c_3, d_5, e_14, f_30, g_14, h_5, i_3, j_2, k_1]
Or perhaps I need to change my methods to something more statistically grounded. The bottom line is that I want to control selection from a list of items in many ways, one of which is ensuring that items are returned in a way that approximates a normal curve.
Answer 1
Score: 0
If you just want to calculate the weights for a given list, then you need the following things:
- The mean of the normal distribution
- The variance of the normal distribution
- A discretizer for the values
The first one is quite simple. You want the mean to be in the center of the list. Therefore (assuming zero-based indexing):
mean = (list.size - 1) / 2
The second is somewhat arbitrary and depends on how steeply you want your weights to fall off. Weights of the normal distribution are practically zero beyond a distance of 3 * standard_deviation from the mean. So a good standard deviation in most cases is probably something between a fourth and a sixth of the list length:
standard_deviation = (1/4 .. 1/6) * list.size
variance = standard_deviation^2
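To get a feel for what the choice between 1/4 and 1/6 means, the following minimal Go sketch compares the weight of the first list element with the weight at the mean for an 11-item list. The function name edgeRatio and its parameters are made up for this illustration and are not part of the answer; the fraction is assumed to be greater than zero.

```go
package main

import (
	"fmt"
	"math"
)

// edgeRatio returns the unrounded weight of the element at index 0 relative
// to the weight at the mean, for a list of n items where
// standard_deviation = fraction * n.
// The name and parameters are hypothetical; fraction is assumed to be > 0.
func edgeRatio(n int, fraction float64) float64 {
	mean := float64(n-1) / 2
	sd := fraction * float64(n)
	variance := sd * sd
	d := 0 - mean // distance of index 0 from the mean
	return math.Exp(-(d * d) / (2 * variance))
}

func main() {
	// Compare the two ends of the suggested range for an 11-item list.
	for _, f := range []float64{1.0 / 4, 1.0 / 6} {
		fmt.Printf("fraction %.3f: edge/peak = %.3f\n", f, edgeRatio(11, f))
	}
}
```

A larger fraction gives a flatter curve, so the outermost items keep a noticeable share of the peak weight; a smaller fraction concentrates the weight around the center.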
Assuming that you want integer weights, you need to discretize the weights from the normal distribution. The easiest way to do this is by specifying the maximum weight (of the element at the mean position).
That's it. The weight for an element at position i is then:
weight[i] = round(max_weight * exp(-(i - mean)^2 / (2 * variance)))
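The formula above could be sketched in Go roughly as follows. This is only one possible reading of the answer: the names normalWeights, labelWeights, sdFraction and maxWeight are assumptions introduced for the example, sdFraction is assumed to be greater than zero, and the "item_weight" output format is taken from the question.

```go
package main

import (
	"fmt"
	"math"
)

// normalWeights returns one integer weight per position for a list of n items.
// sdFraction scales the list size into a standard deviation (the answer
// suggests a value between 1/6 and 1/4); maxWeight is the weight at the mean.
// All names here are hypothetical; sdFraction is assumed to be > 0.
func normalWeights(n int, sdFraction float64, maxWeight int) []int {
	weights := make([]int, n)
	if n == 0 {
		return weights
	}
	mean := float64(n-1) / 2 // center of the list, zero-based indexing
	sd := sdFraction * float64(n)
	variance := sd * sd
	for i := range weights {
		d := float64(i) - mean
		w := float64(maxWeight) * math.Exp(-(d*d)/(2*variance))
		weights[i] = int(math.Round(w))
	}
	return weights
}

// labelWeights appends each weight to its item in the "item_weight" form
// used in the question. It assumes both slices have the same length.
func labelWeights(items []string, weights []int) []string {
	out := make([]string, len(items))
	for i, item := range items {
		out[i] = fmt.Sprintf("%s_%d", item, weights[i])
	}
	return out
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"}
	weights := normalWeights(len(items), 1.0/6, 30)
	fmt.Println(labelWeights(items, weights))
}
```

Note that with a small standard deviation or a small maximum weight, the outermost weights can round to 0, which would exclude those items from weighted selection entirely. If every item must remain selectable, clamping the result to a minimum of 1 is one option.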