2023-05-11 20:04:02

Python: Speed-up Code for mixed Distribution

Question

I have the following function that returns samples from a "mixed distribution" (a mixture of normal distributions):

# M = [float]*k: center value (mean) of each component
# S = [float]*k: standard deviation of each component
# P = [float]*k: probability of each component (sums to 1.0)
# rng = random number generator
# n = length of the returned array
# returns [float]*n

def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    # rng.choice (rather than np.random.choice) keeps all sampling on the passed-in generator
    idx = rng.choice(len(M), p=P, replace=True, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

which is called like:

rng = np.random.default_rng()
def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)

with n=10**6.

However, this implementation is too slow. It currently takes around 350 s on my machine, and I need to get it down to approximately 30 s.

I am considering changing

return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

from a for loop to a single NumPy call, but I cannot come up with a working solution.

Minimal working example

import numpy as np

rng = np.random.default_rng()

def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    # Pick one component index per sample from the passed-in generator
    idx = rng.choice(len(M), p=P, replace=True, size=n)
    # Needs to be optimized: one Python-level rng.normal call per sample
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)

# [float]*(10**6) expected
print(mixed_normal_3(rng, 10**6))

Answer 1

Score: 1

I fixed it by 'pre-slicing' the lists M and S, that is, indexing them with the whole index array at once:

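def mixed_normal(rng, n, M, S, P):
    # Draw one component index per sample from the passed-in generator
    idx = rng.choice(len(M), p=P, replace=True, size=n)
    # Indexing with idx expands M and S to length n; normal() is then called once
    return rng.normal(np.asarray(M)[idx], np.asarray(S)[idx], n)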

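The lists M and S are expanded to size n by taking elements according to the random indices in idx, which also has size n. This kind of indexing requires NumPy arrays rather than plain Python lists, for example:

M = np.array([0, 1, 2])
idx = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2])
M[idx]  # array([0, 0, 1, 1, 0, 0, 1, 1, 2])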

These 'expanded' lists are then passed to the RNG.
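To make the idea concrete, here is a minimal self-contained sketch of the indexing approach, together with a rough sanity check against the theoretical mean and variance of the mixture. The function name `mixed_normal_vectorized`, the seed, and the sample size are my own choices for illustration and not part of the original code.

```python
import numpy as np

def mixed_normal_vectorized(rng, n, M, S, P):
    """Sketch: draw n samples from a normal mixture with a single normal() call."""
    M = np.asarray(M, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    # One component index per sample, drawn with probabilities P
    idx = rng.choice(len(M), size=n, p=P)
    # M[idx] and S[idx] have length n, so normal() returns n samples
    return rng.normal(M[idx], S[idx])

# Seed chosen arbitrarily so the check is repeatable
rng = np.random.default_rng(12345)
M, S, P = (-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)
samples = mixed_normal_vectorized(rng, 200_000, M, S, P)

# Theoretical moments of a mixture:
#   mean = sum(P_i * M_i)
#   var  = sum(P_i * (S_i**2 + M_i**2)) - mean**2
M_arr, S_arr, P_arr = (np.asarray(a, dtype=np.float64) for a in (M, S, P))
expected_mean = float(np.sum(P_arr * M_arr))
expected_var = float(np.sum(P_arr * (S_arr**2 + M_arr**2)) - expected_mean**2)

print("sample mean:", samples.mean(), "expected:", expected_mean)
print("sample var: ", samples.var(), "expected:", expected_var)
```

The sample mean and variance should land close to the theoretical values, which is a quick way to confirm that the vectorized version draws from the same distribution as the loop.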

This reduced the execution time from 350 s to 40 s for my test cases.
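If you want to measure the difference on your own machine, a small timing harness along these lines should do. It is only a sketch: the helper `time_call`, the two function names, and the reduced sample size `n = 20_000` are assumptions made for the example, and the absolute numbers will differ from the ones quoted above.

```python
import time
import numpy as np

def mixed_normal_loop(rng, n, M, S, P):
    # One Python-level rng.normal call per sample (the slow variant)
    idx = rng.choice(len(M), p=P, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

def mixed_normal_indexed(rng, n, M, S, P):
    # Expand M and S with the index array, then call rng.normal once
    idx = rng.choice(len(M), p=P, size=n)
    return rng.normal(np.asarray(M)[idx], np.asarray(S)[idx])

def time_call(func, *args):
    # Hypothetical helper: returns the result and the elapsed wall-clock seconds
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

rng = np.random.default_rng(0)
data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
n = 20_000  # kept small so the loop variant finishes quickly

loop_out, loop_s = time_call(mixed_normal_loop, rng, n, *data)
vec_out, vec_s = time_call(mixed_normal_indexed, rng, n, *data)
print(f"loop: {loop_s:.4f} s, indexed: {vec_s:.4f} s")
```

A single `perf_counter` measurement is noisy, so for anything more than a rough comparison it is worth repeating the calls, for instance with the standard `timeit` module.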
