Compute moving average with non-uniform domain

Question

https://stackoverflow.com/q/14313510/850781 discusses the situation when the observations are equally spaced, i.e., the index is equivalent to an integer range.

In my case, the observations come at arbitrary times and the interval between them can be an arbitrary float. E.g.,

import pandas as pd
import numpy as np

df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()

I want to add a column yavg to df whose value at a given index value x0 is

sum(df.y[x]*f(x0-x) for x in df.index) / sum(f(x0-x) for x in df.index)

for a given function f, e.g.,

def f(x):
    return np.exp(-x*x)

How do I do this with minimal effort (preferably in pure numpy)?

Answer 1

Score: 1

I think you can do something like this:

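index_np_arr = df.index.values
# Entry [i, j] of each matrix is f(x_j - x_i): row i is an observation x, column j is a target x0.
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
# Same argument order as above, so the result is also correct when f is not symmetric.
entire_sum = np.sum(f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)

df['yavg'] = pd.Series(weighted_sum / entire_sum, index=df.index)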

Basically:

  • index_np_arr is the NumPy array of all index values, each of which serves as an x0;
  • entire_sum computes the denominator for every x0 at once: broadcasting subtracts the index vector from itself to form an (n, n) matrix of pairwise differences, where n is the number of index values; f is applied to every entry, and the result is summed over the observations;
  • weighted_sum does almost the same thing, except that each term is multiplied by the corresponding y value before summing.
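To make the broadcasting step concrete, here is a small sketch that compares the vectorized form against a direct, one-x0-at-a-time translation of the formula from the question. The three sample points and the names diff, weights, vectorized and looped are made up for illustration.

```python
import numpy as np

def f(x):
    return np.exp(-x * x)

# Three hypothetical observations at non-uniform positions.
x = np.array([0.0, 0.5, 2.0])
y = np.array([1.0, 2.0, 3.0])

# diff[i, j] = x[j] - x[i]: rows are observations, columns are targets x0.
diff = x - x[:, np.newaxis]
weights = f(diff)

# Sum over the observations (axis 0) for every target x0 at once.
vectorized = (y[:, np.newaxis] * weights).sum(axis=0) / weights.sum(axis=0)

# Direct translation of the formula, one x0 at a time.
looped = np.array([
    sum(yi * f(x0 - xi) for xi, yi in zip(x, y)) / sum(f(x0 - xi) for xi in x)
    for x0 in x
])

print(np.allclose(vectorized, looped))
```

Both forms evaluate the same sums; the vectorized one just builds all n*n weights in a single array before reducing.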

Complete code:

import pandas as pd
import numpy as np

def f(x):
    return np.exp(-x*x)

df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()

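index_np_arr = df.index.values
# Entry [i, j] of each matrix is f(x_j - x_i): row i is an observation x, column j is a target x0.
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
# Same argument order as above, so the result is also correct when f is not symmetric.
entire_sum = np.sum(f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)

df['yavg'] = pd.Series(weighted_sum / entire_sum, index=df.index)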

Note: This code uses a lot of memory, because it builds an array of shape (n, n) in order to compute the sums with vectorized operations, but it is probably faster than looping over every x0 value in Python.
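If the (n, n) array becomes too large, one possible workaround is to process the x0 values in blocks, so that only a (chunk_size, n) array exists at any time. This is only a sketch, not something from the original answer: the function name kernel_average and the chunk_size parameter are hypothetical, and I have not benchmarked it.

```python
import numpy as np

def f(x):
    return np.exp(-x * x)

def kernel_average(x, y, kernel, chunk_size=1024):
    """Kernel-weighted average of y at every position in x, computed in chunks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty_like(x)
    for start in range(0, len(x), chunk_size):
        x0 = x[start:start + chunk_size]
        # weights[i, j] = kernel(x0[i] - x[j]); shape (len(x0), n)
        weights = kernel(x0[:, np.newaxis] - x)
        # Numerator is a matrix-vector product, denominator a row sum.
        out[start:start + chunk_size] = weights @ y / weights.sum(axis=1)
    return out

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=100))
y = rng.uniform(size=100)
yavg = kernel_average(x, y, f, chunk_size=32)
```

With the DataFrame from the question this would be used as df['yavg'] = kernel_average(df.index.values, df['y'].values, f). Peak memory then scales with chunk_size * n instead of n * n, at the cost of a short Python loop over the blocks.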

huangapple
  • Posted on April 11, 2023, 06:43:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75981262.html