Python: Speed up code for a mixed distribution

Question

I have the following function, which returns samples from a "mixed distribution":

# M = [float]*k: center value of each distribution
# S = [float]*k: standard deviations
# P = [float]*k: probability of each value (sums to 1.0)
# rng = random number generator
# n = length of the returned array
# returns [float]*n

def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

It is called like this:

rng = np.random.default_rng()
def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)

with n=10**6.

However, the implementation is too slow. It currently takes around 350s on my machine, and I need to get it down to approximately 30s.

I am considering changing

return np.fromiter((rng.normal(M[i], S[i]) for i in idx),dtype=np.float64)

from a "for-loop" to a single NumPy call.

However, I cannot come up with a working solution.

Minimal working example

import numpy as np

rng = np.random.default_rng()

def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    # Needs to be optimized
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)

# [float]*(10**6) expected
print(mixed_normal_3(rng, 10**6))
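To put a number on the slowdown before optimizing, the per-element loop can be timed on a smaller n. The following is only a rough sketch: `mixed_normal_loop` and `time_call` are hypothetical names introduced here, the sample size of 10**4 is an arbitrary choice, and the index draw uses `rng.choice` (an assumption; the question uses `np.random.choice`).

```python
import time
import numpy as np

def mixed_normal_loop(rng, n, M, S, P):
    # Per-element approach: one rng.normal call per sample.
    # Assumption: rng.choice replaces the question's np.random.choice.
    idx = rng.choice(len(M), p=P, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

def time_call(func, *args):
    # Hypothetical helper: run func once and return (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

rng = np.random.default_rng(0)
M, S, P = (-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)

# Small n keeps the measurement short; scale up to see how the cost grows.
samples, seconds = time_call(mixed_normal_loop, rng, 10**4, M, S, P)
print(f"{len(samples)} samples in {seconds:.3f}s")
```

Timing the same helper with a few values of n should show whether the cost grows roughly linearly, which is what a Python-level loop over samples would suggest.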

Answer 1

Score: 1

I fixed it by 'pre-slicing' the lists M and S:

def mixed_normal(rng, n, M, S, P):
    # Draw one component index per sample, then index M and S with it
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    return rng.normal(np.array(M)[idx], np.array(S)[idx], n)
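One detail worth noting: the function receives `rng` but draws the indices from the global `np.random` state, so seeding `rng` alone does not make the output reproducible. A possible variant, not part of the original answer, draws both the indices and the samples from the same generator. The name `mixed_normal_vectorized` and the seed 42 are assumptions made for illustration.

```python
import numpy as np

def mixed_normal_vectorized(rng, n, M, S, P):
    # Convert once so that integer-array indexing works on M and S.
    M = np.asarray(M, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    # Component index for every sample, drawn from the passed-in generator.
    idx = rng.choice(len(M), p=P, size=n)
    # A single vectorized call: element j uses mean M[idx[j]] and std S[idx[j]].
    return rng.normal(M[idx], S[idx])

data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]

# Two generators with the same seed should give identical samples.
a = mixed_normal_vectorized(np.random.default_rng(42), 10**6, *data)
b = mixed_normal_vectorized(np.random.default_rng(42), 10**6, *data)
print(a.shape, np.array_equal(a, b))
```

Because every random draw goes through `rng`, a fixed seed fully determines the result, which helps when comparing an optimized version against a reference run.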

The lists M and S are expanded to size n by picking elements according to the random indices generated in idx, where idx has size n:

M = np.array([0, 1, 2])
idx = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2])
M[idx]  # array([0, 0, 1, 1, 0, 0, 1, 1, 2])

These 'expanded' lists are then passed to the RNG.
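Passing the expanded arrays works because `rng.normal` accepts array-like `loc` and `scale` and draws one value per element. As a sketch of an equivalent formulation that is not in the original answer, the same samples can be built by shifting and scaling standard normal draws. The seed and the size of 8 below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([-5.0, 0.0, 5.0])
S = np.array([1.0, 1.0, 1.0])
idx = rng.choice(len(M), p=[1/3, 1/3, 1/3], size=8)

# Expanded per-sample parameters, as described above.
loc, scale = M[idx], S[idx]

# Variant 1: array-valued loc and scale, one draw per element.
direct = rng.normal(loc, scale)

# Variant 2: location-scale transform of standard normal draws.
scaled = loc + scale * rng.standard_normal(idx.size)

print(direct.shape, scaled.shape)
```

Both variants follow the same distribution, although they consume the generator differently, so their values will not match element by element.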

This reduced the execution time from 350s to 40s for my test cases.
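When swapping in a faster implementation, it can help to check that the samples still follow the intended mixture. A simple sanity check, sketched here under the assumption that comparing the first two moments is enough, uses the mixture mean Σ P_k·M_k and variance Σ P_k·(S_k² + M_k²) minus the squared mean. The helper name `mixture_moments` and the seed are hypothetical.

```python
import numpy as np

def mixture_moments(M, S, P):
    # Theoretical mean and variance of a mixture of normal distributions.
    M, S, P = (np.asarray(x, dtype=np.float64) for x in (M, S, P))
    mean = np.sum(P * M)
    var = np.sum(P * (S**2 + M**2)) - mean**2
    return mean, var

M, S, P = (-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)

rng = np.random.default_rng(123)
idx = rng.choice(len(M), p=P, size=10**6)
samples = rng.normal(np.asarray(M, dtype=np.float64)[idx],
                     np.asarray(S, dtype=np.float64)[idx])

expected_mean, expected_var = mixture_moments(M, S, P)
print("expected:", expected_mean, expected_var)
print("sampled: ", samples.mean(), samples.var())
```

For the three-component example, the expected mean is 0 and the expected variance is 1 + 50/3, so the sampled values should land close to those numbers.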

huangapple
  • Posted on May 11, 2023, 20:04:02
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76227461.html