Python - NumPy or pandas (also possible) broadcasting
Question
I have a 2D NumPy array (a pandas DataFrame would also work) full of numbers, and I need to replace each value with the mean of the last n rows of its column. The array is large, with a shape of roughly (10000, 10000).
Example (small shape, for illustration only):
Numpy Array:
[[10, 30, 8, 1],
[11, 5, 19, 12],
[12, 18, 15, 6],
[13, 10, 21, 9],
[14, 67, 14, 2],
[15, 13, 12, 6]]
Averaging window: n = 3
So at each step the code should take the last 3 numbers in a column and compute their average.
Resulting NumPy array:
[[12.5, 23.5, 14.83333333, 5.833333333],
[12, 10.33333333, 18.33333333, 9],
[13, 31.66666667, 16.66666667, 5.666666667],
[14, 30, 15.66666667, 5.333333333]]
Explanation:
- 14 is average of numbers 15,14,13
- 18.33333333 is average of numbers 21, 15, 19
- 9 is average of numbers 9, 6, 12
The function should take the last n values along each column and compute their average.
I was able to do this with two for loops in plain Python, but it takes a long time.
Answer 1
Score: 1
You don't need to loop over your data. With pandas, you can compute a rolling mean:
import pandas as pd
import numpy as np
arr = np.array([[10, 30, 8, 1],
                [11, 5, 19, 12],
                [12, 18, 15, 6],
                [13, 10, 21, 9],
                [14, 67, 14, 2],
                [15, 13, 12, 6]])
n = 3
df = pd.DataFrame(arr)
out = df.rolling(n).mean().iloc[n-1:]
print(out)
# Output
      0          1          2         3
2  11.0  17.666667  14.000000  6.333333
3  12.0  11.000000  18.333333  9.000000
4  13.0  31.666667  16.666667  5.666667
5  14.0  30.000000  15.666667  5.666667
With NumPy alone, you can use a cumulative sum along the rows:
# Adapted from https://stackoverflow.com/q/14313510/15239951
out = np.cumsum(arr, axis=0)
out[n:] -= out[:-n]
out = out[n-1:] / n
print(out)
# Output
array([[11.        , 17.66666667, 14.        ,  6.33333333],
       [12.        , 11.        , 18.33333333,  9.        ],
       [13.        , 31.66666667, 16.66666667,  5.66666667],
       [14.        , 30.        , 15.66666667,  5.66666667]])
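As a further option that is not part of the original answer: NumPy 1.20 and later provide numpy.lib.stride_tricks.sliding_window_view, which exposes every window of n consecutive rows as a view, so the mean can be taken over the window axis directly. A minimal sketch, assuming NumPy >= 1.20; the helper name rolling_mean_rows is made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean_rows(arr, n):
    """Mean of every n consecutive rows, per column (hypothetical helper)."""
    # Resulting shape is (rows - n + 1, cols, n): the window axis comes last.
    windows = sliding_window_view(arr, n, axis=0)
    return windows.mean(axis=-1)


arr = np.array([[10, 30, 8, 1],
                [11, 5, 19, 12],
                [12, 18, 15, 6],
                [13, 10, 21, 9],
                [14, 67, 14, 2],
                [15, 13, 12, 6]])

out = rolling_mean_rows(arr, 3)
print(out.shape)  # (4, 4)
```

This version does roughly n times more arithmetic than the cumulative sum, so for large windows the cumulative-sum approach is likely to be faster. On the other hand, it does not build up large running totals, which may help when floating-point precision is a concern.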
Answer 2
Score: 0
I also solved this problem with a loop; you can compare it with yours.
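import numpy as np

def process(data, n):
    data = np.array(data, dtype=float)  # float copy: no integer truncation, input untouched
    rows = len(data) - n + 1            # number of complete windows of n rows
    for i in range(rows):
        data[i] = data[i:i + n].mean(axis=0)  # mean of rows i .. i+n-1, per column
    return data[:rows]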
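To compare a loop-based version with a vectorized one, it can help to check on random data that both return the same values before timing them. A sketch under assumptions: loop_mean and cumsum_mean are illustrative names, the first being a float-safe take on the loop idea and the second a restatement of the cumulative-sum approach from Answer 1; the test data is arbitrary.

```python
import numpy as np


def loop_mean(data, n):
    # Loop-based rolling mean over rows (illustrative name).
    data = np.asarray(data, dtype=float)
    rows = data.shape[0] - n + 1
    out = np.empty((rows, data.shape[1]))
    for i in range(rows):
        out[i] = data[i:i + n].mean(axis=0)
    return out


def cumsum_mean(data, n):
    # Cumulative-sum rolling mean over rows (illustrative name).
    csum = np.cumsum(data, axis=0, dtype=float)
    # Subtract the running total from n rows earlier to get each window sum.
    csum[n:] = csum[n:] - csum[:-n]
    return csum[n - 1:] / n


rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(200, 50))
same = np.allclose(loop_mean(data, 3), cumsum_mean(data, 3))
print(same)
```

Once the two agree, timing each function on a larger array (for example with timeit) should show how much the loop costs at the sizes mentioned in the question.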