Python: Speed-up Code for mixed Distribution
Question
I have the following function to return a "mixed distribution":
# M = [float]*k = center values of the distributions
# S = [float]*k = standard deviations
# P = [float]*k = probability of each component (sums to 1.0)
# rng = random number generator
# n = length of the returned array
# returns [float]*n
def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)
which is called like:
rng = np.random.default_rng()
def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)
with n=10**6.
But the implementation is too slow! It currently takes around 350 s on my machine, and I need to get it down to approximately 30 s.
I am considering changing
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)
from a for-loop to a single NumPy call.
But I cannot come up with a working solution!
Minimal working example
import numpy as np
rng = np.random.default_rng()
def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    # Needs to be optimized
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)
def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)
# [float]*(10**6) expected
print(mixed_normal_3(rng, 10**6))
Answer 1
Score: 1
I fixed it by 'pre-slicing' the lists M and S:
def mixed_normal(rng, n, M, S, P):
    idx = np.array(np.random.choice(len(M), p=P, replace=True, size=n))
    # Index M and S with idx so both have length n, then draw all samples in one call
    return rng.normal(np.array(M)[idx.astype(int)], np.array(S)[idx.astype(int)], n)
The lists M and S are expanded to size n by taking elements according to the random indices generated in idx, where idx has size n:
M = np.array([0, 1, 2])
idx = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2])
M[idx]  # array([0, 0, 1, 1, 0, 0, 1, 1, 2])
These 'expanded' lists are then passed to the RNG.
This improved execution time from 350s down to 40s for my test-cases.
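For reference, here is a minimal self-contained sketch of the same idea. It is not the code measured above. The function name mixed_normal_vectorized is hypothetical, and drawing the indices with rng.choice instead of np.random.choice is an assumption made here so that a single seeded generator controls all randomness. No timing is claimed for this version.

```python
import numpy as np

def mixed_normal_vectorized(rng, n, M, S, P):
    """Draw n samples from a mixture of normals (hypothetical helper name)."""
    M = np.asarray(M, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    # Pick one component index per sample, using the generator that was passed in
    idx = rng.choice(len(M), size=n, p=P)
    # Fancy indexing expands M and S to length n; rng.normal then draws
    # one sample per (mean, std) pair in a single call
    return rng.normal(M[idx], S[idx])

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
samples = mixed_normal_vectorized(
    rng, 10**6, (-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)
)
print(samples.shape, samples.mean(), samples.std())
```

Because the loc and scale arrays already have length n, the explicit size argument can be left out; rng.normal returns an array with the broadcast shape of its inputs.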