numpy polyfit alternative for speed

Question

I am calling numpy's polyfit numerous times to compute the slope between two datasets. However, it does not perform these calculations fast enough for my needs.

Two things to note about the calculations:

  1. The x values in the call numpy.polyfit(x,y,n) will always be the same, and

  2. The value of n is 1, so each fit is a linear regression.

I know there are many alternatives, including numpy.polynomial.polynomial.polyfit(x,y,n), but they seem to be just as slow. I have had little luck getting np.linalg to work properly. Therefore, I am wondering what alternative might speed up these calculations?
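
For context, here is a minimal sketch of the pattern described above (the helper name slopes_with_polyfit and the ys layout are illustrative, not from the original post):

import numpy as np

def slopes_with_polyfit(x, ys):
    # ys is an iterable of y arrays, each the same length as the fixed x.
    # np.polyfit(x, y, 1) returns [slope, intercept]; keep only the slope.
    return [np.polyfit(x, y, 1)[0] for y in ys]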

Answer 1

Score: 3

As others have commented, this can be done using linear least squares.
Using numpy.linalg.lstsq, this could look like:

import numpy as np

def lstsq(x, y):
    # Design matrix with columns [x, 1], so the solution is [slope, intercept].
    a = np.stack((x, np.ones_like(x)), axis=1)
    # rcond=None uses machine precision as the cutoff and avoids the FutureWarning.
    return np.linalg.lstsq(a, y, rcond=None)[0]

This offers a slight speed improvement over polyfit. To obtain a significant speed increase (at the expense of numerical stability; for a summary of methods see Numerical methods for linear least squares) you can instead solve the normal equations, (a.T@a) @ coef = a.T@y:

def normal(x, y):
    a = np.stack((x, np.ones_like(x)), axis=1)
    aT = a.T
    # Solve the 2x2 normal equations (aT@a) @ coef = aT@y directly.
    return np.linalg.solve(aT@a, aT@y)

As you say that x is constant, you can precompute a.T and a.T@a, providing a further speed increase:

def normal2(aT, aTa, y):
    # aT and aTa are precomputed once outside the loop, since x never changes.
    return np.linalg.solve(aTa, aT@y)

Make up some test data and time:

rng = np.random.default_rng()
N = 1000

x = rng.random(N)
y = rng.random(N)

a = np.stack((x, np.ones_like(x)), axis=1)
aT = a.T
aTa = aT@a

assert np.allclose(lstsq(x, y), np.polyfit(x, y, 1))
assert np.allclose(normal(x, y), np.polyfit(x, y, 1))
assert np.allclose(normal2(aT, aTa, y), np.polyfit(x, y, 1))
%timeit np.polyfit(x, y, 1) 
%timeit lstsq(x, y)
%timeit normal(x, y)
%timeit normal2(aT, aTa, y)

Output:

256 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
220 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
20.2 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
6.54 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
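
A further note beyond the original answer: because x (and hence aT and aTa) never changes, you can also solve many regressions in a single call, since np.linalg.solve accepts a matrix of right-hand sides. A minimal sketch, assuming all the y datasets have the same length as x and are stacked as columns (the name normal2_batch is illustrative):

def normal2_batch(aT, aTa, Y):
    # Y has shape (len(x), m), one column per dataset,
    # e.g. Y = np.column_stack(list_of_ys).
    # Returns an array of shape (2, m): row 0 holds the slopes, row 1 the intercepts.
    return np.linalg.solve(aTa, aT@Y)

This replaces m separate solve calls with one, amortizing the per-call overhead across all datasets.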
