2023-02-24 08:45:26

How to Algorithmically Generate a List of Probabilities

Question

Please forgive my lack of statistical nomenclature.

I've been given an arbitrary list of values to sample, currently:
list_to_sample = [1, 2, 3, 4, 5].
At this point it doesn't matter what the list contains, only that its length is 5.

And, I've been given a list of almost arbitrary "pareto-like" probabilities, currently:
probability_list = [0.5, 0.3, 0.1, 0.05, 0.05]
(Pareto-like because it does not follow the 80/20 rule but rather 80/40: 80% of the probability falls on the first 40% of the list.)
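
As a quick check of the 80/40 description, here is a minimal sketch that accumulates the probabilities and pairs each running total with the fraction of the list covered so far. The variable names `cumulative` and `fraction_of_list` are illustrative only and are not part of the original post.

```python
import numpy as np

probability_list = [0.5, 0.3, 0.1, 0.05, 0.05]

# Running total of probability after each position in the list
cumulative = np.cumsum(probability_list)

# Fraction of the list covered after each position: 0.2, 0.4, ..., 1.0
fraction_of_list = np.arange(1, len(probability_list) + 1) / len(probability_list)

for frac, cum in zip(fraction_of_list, cumulative):
    print(f"first {frac:.0%} of the list -> {cum:.0%} of the probability")
```

The second row of that printout is the 80/40 point: the first two of five entries carry 0.5 + 0.3 of the probability.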

I am now trying to generalize this, so that if list_to_sample gets longer, like:
[1, 2, 3, 4, 5, 6, 7, 8]
I can extend the probability_list and maintain the same curve.

I am trying to use scipy.stats.pareto.pdf to produce a list of probabilities that is similar to:
[0.5, 0.3, 0.1, 0.05, 0.05]
and where the sum of the list (the sum of the probabilities) equals 1.

Specifically, I have tried this:

import numpy as np
from scipy.stats import pareto  # pareto.pdf is provided by SciPy, not NumPy

list_to_sample = [1, 2, 3, 4, 5]
output = np.array([pareto.pdf(x=list_to_sample, b=1, loc=0, scale=1)])

Output:

[[0.5        0.125      0.05555556 0.03125    0.02      ]]

I have tried changing parameters to no avail. I was hopeful that by changing parameters I could get pareto to produce the desired result. So far, no luck.

Perhaps there is a better function to produce (or extend) a list of probabilities.

Answer 1

Score: 0

Do you need to use the Pareto distribution? If so, I don't think this problem is well-defined, because the items in list_to_sample will matter and I don't see from your question how you can define all the parameters of the Pareto distribution.
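
To illustrate why the items matter, here is a rough sketch that evaluates the SciPy Pareto density at the list values and rescales the result to sum to 1. The helper name `normalized_pareto_weights` is hypothetical, and rescaling density values this way is only an assumption about what the asker might try, not a standard way to build a discrete distribution.

```python
import numpy as np
from scipy.stats import pareto

def normalized_pareto_weights(values, b):
    """Evaluate the Pareto pdf at `values` and rescale so the weights sum to 1."""
    density = pareto.pdf(np.asarray(values, dtype=float), b=b)
    return density / density.sum()

# Same list length, same shape parameter, different item values
print(normalized_pareto_weights([1, 2, 3, 4, 5], b=1))
print(normalized_pareto_weights([11, 12, 13, 14, 15], b=1))

# Same item values, different shape parameter
print(normalized_pareto_weights([1, 2, 3, 4, 5], b=2))
```

Shifting the values or changing `b` changes the resulting weights, so the list length alone does not pin down the curve.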

If you can use other techniques, I would go with a simple interpolation of the cumulative probabilities, for example a cubic spline. Since you said the values in the list don't matter, we can work with percentage positions instead.

import numpy as np
from scipy.interpolate import CubicSpline

list_to_sample = [1, 2, 3, 4, 5]
probability_list = [0.5, 0.3, 0.1, 0.05, 0.05]

# --- add a zero at the beginning to ensure that we map zero to zero

x = np.array([0] + list_to_sample) / len(list_to_sample)
y = np.array([0] + probability_list).cumsum()

print("x:", x)  # -- [0.0  0.2  0.40  0.60  0.80  1.0]
print("y:", y)  # -- [0.0  0.5  0.80  0.90  0.95  1.0]

# - spline through the cumulative probabilities

spline = CubicSpline(x, y)

new_values = np.arange(1, 11)
cprobs = spline(new_values / len(new_values))

print("New values:", new_values)
print("Cumulative probabilities:", cprobs)

# -- the top 40% still has an overall 80% probability,
# -- the output below is rounded

# -- [   1    2    3    4    5    6    7    8    9   10]
# -- [0.27 0.50 0.68 0.80 0.87 0.90 0.93 0.95 0.97 1.00]

# - to get the probability for each value we just diff cprobs

probs = np.diff([0] + list(cprobs))
print("Probabilities:", probs)

# -- [0.272 0.228 0.178 0.122 0.067 0.034 0.026 0.024 0.024 0.026]
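
The same idea can be wrapped in a reusable function. The sketch below is a variation on the answer, not the answer's own method: it swaps CubicSpline for SciPy's PchipInterpolator, which keeps a monotone cumulative curve monotone, so the differenced probabilities cannot come out negative. The function name `extend_probabilities` and the seed are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def extend_probabilities(probability_list, new_length):
    """Resample a probability list to `new_length` entries, keeping its cumulative shape."""
    n = len(probability_list)
    # Cumulative curve through (0, 0), (1/n, p1), ..., (1, 1)
    x = np.arange(n + 1) / n
    y = np.concatenate(([0.0], np.cumsum(probability_list)))
    # PCHIP preserves monotonicity of the data, so the differences stay non-negative
    curve = PchipInterpolator(x, y)
    cumulative = curve(np.arange(new_length + 1) / new_length)
    probs = np.diff(cumulative)
    return probs / probs.sum()  # guard against floating-point drift

probs = extend_probabilities([0.5, 0.3, 0.1, 0.05, 0.05], 8)
print(probs)

# Draw from the longer list using the extended probabilities
rng = np.random.default_rng(seed=0)
sample = rng.choice([1, 2, 3, 4, 5, 6, 7, 8], size=10, p=probs)
print(sample)
```

Because the curve passes through the original cumulative points, any new length where 40% of the list lands on an exact position (such as 10) still puts 80% of the probability on that first 40%.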
