Compute moving average with non-uniform domain
Question
https://stackoverflow.com/q/14313510/850781 discusses the situation when the observations are equally spaced, i.e., the index is equivalent to an integer range.
In my case, the observations come at arbitrary times and the interval between them can be an arbitrary float. E.g.,
import pandas as pd
import numpy as np
df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()
I want to add a column yavg to df whose value at a given index value x0 is
sum(df.y[x]*f(x0-x) for x in df.index) / sum(f(x0-x) for x in df.index)
for a given function f, e.g.,
def f(x):
    return np.exp(-x*x)
How do I do this with minimal effort (preferably in pure numpy)?
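For reference, here is a direct per-x0 loop over the formula above. It is a baseline to check faster versions against, not a proposed solution, because the outer loop runs in Python. The helper name `yavg_loop` is hypothetical, and a seeded generator replaces `np.random.uniform` only to make runs repeatable.

```python
import numpy as np
import pandas as pd


def f(x):
    # Gaussian kernel from the question
    return np.exp(-x * x)


def yavg_loop(df, f):
    """Hypothetical helper: for each index value x0, compute
        sum(y[x] * f(x0 - x)) / sum(f(x0 - x))
    over all index values x, one x0 at a time."""
    xs = df.index.values
    ys = df["y"].values
    out = np.empty(len(xs))
    for j, x0 in enumerate(xs):
        w = f(x0 - xs)                      # weight of every observation for this x0
        out[j] = np.sum(ys * w) / np.sum(w)
    return pd.Series(out, index=df.index)


rng = np.random.default_rng(0)
df = pd.DataFrame({"y": rng.uniform(size=100)},
                  index=rng.uniform(size=100)).sort_index()
df["yavg"] = yavg_loop(df, f)
```

Since all weights are positive, every yavg value lies between the smallest and largest y, which makes a quick sanity check.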
Answer 1
Score: 1
I think you can do something like this:
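index_np_arr = df.index.values
# column j of the (n, n) matrix holds f(x0 - x) for x0 = index_np_arr[j] and every x in the index
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
# same difference matrix as above, so the denominator is sum f(x0 - x) even when f is not symmetric
entire_sum = np.sum(f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
df['yavg'] = pd.Series(weighted_sum/entire_sum, index=df.index)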
Basically:
- index_np_arr is the np.array of all possible x0 values;
- entire_sum gets the denominator for every x0 at once. Broadcasting index_np_arr against index_np_arr[:, np.newaxis] builds an (n, n) matrix of differences x0 - x, where n is the number of index values. f is then applied element-wise, and each column is summed;
- weighted_sum does the same, except that each row is multiplied by the corresponding y value before summing.
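To make the axis convention concrete, here is a small sketch on three points using a deliberately asymmetric kernel `g`. The kernel `g` is a hypothetical example, not taken from the question.

With `D = x - x[:, np.newaxis]`, entry `D[i, j]` is `x[j] - x[i]`, which is `x0 - x` for `x0 = x[j]`. Summing over axis 0 therefore gives one value per x0, and the denominator has to come from the same matrix. The matrix product `y @ W` is an equivalent way to write the weighted column sums.

```python
import numpy as np


def g(d):
    # Hypothetical asymmetric kernel: only observations at or before x0 (d >= 0) get weight
    return np.where(d >= 0, np.exp(-d), 0.0)


x = np.array([0.0, 0.5, 2.0])   # irregular sample times
y = np.array([1.0, 3.0, 5.0])

D = x - x[:, np.newaxis]        # D[i, j] = x[j] - x[i] = x0 - x for x0 = x[j]
W = g(D)                        # W[i, j] = weight of observation i at x0 = x[j]

weighted_sum = y @ W            # same as np.sum(y[:, np.newaxis] * W, axis=0)
entire_sum = W.sum(axis=0)      # denominator built from the same W
yavg = weighted_sum / entire_sum

# Cross-check against the per-x0 formula from the question
expected = np.array([np.sum(y * g(x0 - x)) / np.sum(g(x0 - x)) for x0 in x])
assert np.allclose(yavg, expected)
```

At x0 = 0.0 only the first observation has nonzero weight, so yavg[0] equals y[0].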
Complete code:
import pandas as pd
import numpy as np
def f(x):
    return np.exp(-x*x)
df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()
index_np_arr = df.index.values
# column j of the (n, n) matrix holds f(x0 - x) for x0 = index_np_arr[j] and every x in the index
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
# same difference matrix as above, so the denominator is sum f(x0 - x) even when f is not symmetric
entire_sum = np.sum(f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
df['yavg'] = pd.Series(weighted_sum/entire_sum, index=df.index)
Note: This code uses a lot of memory because it creates an array of shape (n, n) so that the sums can be computed with vectorized functions. It is probably still faster than iterating over all values of x.
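If the (n, n) intermediate is too large, one exact alternative is to process the x0 values in blocks. This is a sketch, not part of the original answer. Peak memory is then roughly n × chunk rather than n × n, at the cost of a short Python loop over blocks. The function name `yavg_chunked` and the `chunk` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def f(x):
    return np.exp(-x * x)


def yavg_chunked(x, y, f, chunk=1024):
    """Hypothetical helper: kernel-weighted average at every x0 in x.

    Each block builds an (n, len(block)) matrix W[i, k] = f(x0_k - x_i),
    so memory grows with n * chunk instead of n * n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty(len(x))
    for start in range(0, len(x), chunk):
        x0 = x[start:start + chunk]
        W = f(x0 - x[:, np.newaxis])                  # shape (n, len(x0))
        out[start:start + chunk] = (y @ W) / W.sum(axis=0)
    return out


rng = np.random.default_rng(1)
df = pd.DataFrame({"y": rng.uniform(size=5000)},
                  index=rng.uniform(size=5000)).sort_index()
df["yavg"] = yavg_chunked(df.index.values, df["y"].values, f, chunk=500)
```

The result is identical to the full (n, n) version up to floating-point rounding. Only the chunk size trades memory against the number of loop iterations.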