November 5, 2016, 03:50:59 · go

Weighting a flat list to the normal distribution

Question

I have a list of string items of any length. I need to "normalize" this list so that each item is part of a normal distribution, appending the weight to the string.

What is a more effective and mathematically/statistically sound way to go about this than what I have below?

func normalizeAppend(in []string, shuffle bool) []string {
    var ret []string
    if shuffle {
        shuffleStrings(in)
    }
    l := len(in)
    switch {
    case remain(l, 3) == 0:
        l3 := (l / 3)
        var low, mid, high []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l3:
                low = append(low, v)
            case o > l3 && o <= l3*2:
                mid = append(mid, v)
            case o >= l3*2:
                high = append(high, v)
            }
        }
        q1 := 1600 / len(low)
        q2 := 6800 / len(mid)
        q3 := 1600 / len(high)
        for _, v := range low {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }
        for _, v := range mid {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }
        for _, v := range high {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }
    case remain(l, 2) == 0 && l >= 4:
        l4 := (l / 4)
        var first, second, third, fourth []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l4:
                first = append(first, v)
            case o > l4 && o <= l4*2:
                second = append(second, v)
            case o > l4*2 && o <= l4*3:
                third = append(third, v)
            case o > l4*3:
                fourth = append(fourth, v)
            }
        }
        q1 := 1600 / len(first)
        q2 := 3400 / len(second)
        q3 := 3400 / len(third)
        q4 := 1600 / len(fourth)
        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }
        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }
        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }
        for _, v := range fourth {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q4))
        }
    default:
        var first, second, third []string
        q1 := (1 + math.Floor(float64(l)*.16))
        q3 := (float64(l) - math.Floor(float64(l)*.16))
        var o float64
        for i, v := range in {
            o = float64(i + 1)
            switch {
            case o <= q1:
                first = append(first, v)
            case o > q1 && o < q3:
                second = append(second, v)
            case o >= q3:
                third = append(third, v)
            }
        }
        lq1 := 1600 / len(first)
        lq2 := 3400 / len(second)
        lq3 := 1600 / len(third)
        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq1))
        }
        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq2))
        }
        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq3))
        }
    }
    return ret
}

Some requested clarification:

I have a list of items that will be chosen from many times, one at a time, by weighted selection. To start with, I have a list with (implied) weights of 1:

[a_1, b_1, c_1, d_1, e_1, f_1, g_1, h_1, i_1, j_1, k_1]

I'm looking for a better way to turn that list into something that produces a more 'normal' distribution of weights for selection:

[a_1, b_2, c_3, d_5, e_14, f_30, g_14, h_5, i_3, j_2, k_1]

Or perhaps I need to change my methods to something more grounded statistically. The bottom line is that I want to control selection from a list of items in many ways, one of which is ensuring that items are returned in a way approximating a normal curve.

Answer 1

Score: 0

If you just want to calculate the weights for a given list, then you need the following things:

The mean of the normal distribution
The variance of the normal distribution
A discretizer for the values

The first one is quite simple. You want the mean to be in the center of the list. Therefore (assuming zero-based indexing):

mean = (list.size - 1) / 2

The second is kind of arbitrary and depends on how steeply you want your weights to fall off. Weights of the normal distribution are practically zero beyond a distance of 3 * standard_deviation from the mean. So a good standard deviation in most cases is probably somewhere between a fourth and a sixth of the list length:

standard_deviation = (1/4 .. 1/6) * list.size
variance = standard_deviation^2
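
As a rough worked example of how much that choice matters, take the 11-item list from the question (this list length is only an assumption for illustration; \mu and \sigma stand for mean and standard_deviation, and the values are rounded). The weight of an end item relative to the center item comes out as:

```latex
\begin{align*}
n = 11, \qquad \mu = \frac{n - 1}{2} = 5, \qquad \max_i |i - \mu| = 5 \\
\sigma = \tfrac{n}{4} = 2.75: \qquad
  \exp\!\left(-\frac{5^2}{2 \cdot 2.75^2}\right) \approx \exp(-1.653) \approx 0.19 \\
\sigma = \tfrac{n}{6} \approx 1.83: \qquad
  \exp\!\left(-\frac{5^2}{2 \cdot 1.83^2}\right) \approx \exp(-3.72) \approx 0.024
\end{align*}
```

So with a fourth of the list length the end items keep about a fifth of the center weight, while with a sixth they keep only about 2 percent of it.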

Assuming that you want integer weights, you need to discretize the weights from the normal distribution. The easiest way to do this is by specifying the maximum weight (of the element at the mean position).

That's it. The weight for an element at position i is then:

weight[i] = round(max_weight * exp(-(i - mean)^2 / (2 * variance)))
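
Since the question is in Go, the formula above could be sketched like this. The names gaussianWeights, appendWeights, and sigmaFraction are made up for illustration; sigmaFraction is assumed to be positive and expresses the standard deviation as a fraction of the list length (1/4 to 1/6 per the suggestion above).

```go
package main

import (
	"fmt"
	"math"
)

// gaussianWeights returns one integer weight per list position, computed as
// round(maxWeight * exp(-(i-mean)^2 / (2*variance))).
// Assumption: sigmaFraction > 0 (a zero value would divide by zero).
func gaussianWeights(n int, maxWeight, sigmaFraction float64) []int {
	weights := make([]int, n)
	if n == 0 {
		return weights
	}
	mean := float64(n-1) / 2 // center of the list, zero-based
	sigma := sigmaFraction * float64(n)
	variance := sigma * sigma
	for i := range weights {
		d := float64(i) - mean
		weights[i] = int(math.Round(maxWeight * math.Exp(-d*d/(2*variance))))
	}
	return weights
}

// appendWeights tags each item with its weight, using the "item_weight"
// format from the question.
func appendWeights(items []string, maxWeight, sigmaFraction float64) []string {
	weights := gaussianWeights(len(items), maxWeight, sigmaFraction)
	out := make([]string, len(items))
	for i, item := range items {
		out[i] = fmt.Sprintf("%s_%d", item, weights[i])
	}
	return out
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"}
	fmt.Println(appendWeights(items, 30, 0.25))
}
```

Note that with a narrow standard deviation and a small max_weight, the outermost weights can round down to 0, which would exclude those items from selection entirely; clamping to a minimum of 1 is one way around that if every item must stay selectable.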
