Python - Numpy or Pandas(also possible) broadcasting

Question

I have a 2D NumPy array (a pandas DataFrame could also be used) full of numbers, and I need to replace each number with the mean of the last n rows in its column. The array is huge, with a shape like (10000, 10000).

Example (limited shape just for explanation):

Numpy Array:

[[10, 30, 8, 1],
 [11, 5, 19, 12],
 [12, 18, 15, 6],
 [13, 10, 21, 9],
 [14, 67, 14, 2],
 [15, 13, 12, 6]]

Average with n = 3

So the code should take the last 3 numbers in each column at every step and compute their average.

Numpy Array:

[[11, 17.66666667, 14, 6.333333333],
 [12, 11, 18.33333333, 9],
 [13, 31.66666667, 16.66666667, 5.666666667],
 [14, 30, 15.66666667, 5.666666667]]

Explanation:

  • 14 is average of numbers 15,14,13
  • 18.33333333 is average of numbers 21, 15, 19
  • 9 is average of numbers 9, 6, 12

The result should be a function that takes the last n values along each column and averages them.

I was able to do this with two for loops in plain Python, but it takes a long time.

Answer 1

Score: 1

You don't need to loop over your data. With pandas, you can compute a rolling mean using DataFrame.rolling:

import pandas as pd
import numpy as np

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])

n = 3
df = pd.DataFrame(arr)
out = df.rolling(n).mean().iloc[n-1:]
print(out)

# Output
      0          1          2         3
2  11.0  17.666667  14.000000  6.333333
3  12.0  11.000000  18.333333  9.000000
4  13.0  31.666667  16.666667  5.666667
5  14.0  30.000000  15.666667  5.666667
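
As a small illustration of why the `.iloc[n-1:]` slice is there: with default settings, `rolling(n).mean()` leaves the first n-1 rows as NaN, because those rows do not yet have n values behind them. The sketch below reuses the example array; the names `rolled`, `incomplete` and `result` are only illustrative, and converting back with `to_numpy()` is an assumption about what the asker wants as the final type.

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])
n = 3

# Rolling mean over windows of n consecutive rows, column by column
rolled = pd.DataFrame(arr).rolling(n).mean()

# The first n-1 rows have fewer than n values available, so they are NaN
incomplete = rolled.iloc[:n - 1]
print(incomplete.isna().all().all())  # True

# Drop the incomplete rows and convert back to a plain NumPy array
result = rolled.iloc[n - 1:].to_numpy()
print(result.shape)  # (4, 4)
```

If partial windows are acceptable, `rolling(n, min_periods=1)` would fill those first rows with the mean of whatever values are available instead of NaN.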

With numpy only, you can do:

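# Adapted from https://stackoverflow.com/q/14313510/15239951
out = np.cumsum(arr, axis=0)  # running sum down each column
out[n:] -= out[:-n]           # for rows i >= n, leave only the sum of rows i-n+1..i
out = out[n-1:] / n           # keep the complete windows and divide by the window size
print(out)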

# Output
array([[11.        , 17.66666667, 14.        ,  6.33333333],
       [12.        , 11.        , 18.33333333,  9.        ],
       [13.        , 31.66666667, 16.66666667,  5.66666667],
       [14.        , 30.        , 15.66666667,  5.66666667]])
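
Another NumPy-only option, not part of the original answer, is `sliding_window_view`, available in NumPy 1.20 and later. The sketch below is a cross-check under that version assumption: it builds a view with one length-n window per output cell, averages over the window axis, and compares the result with the cumulative-sum approach. The variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])
n = 3

# View of shape (rows - n + 1, cols, n); the window axis is appended last
windows = sliding_window_view(arr, n, axis=0)
window_mean = windows.mean(axis=-1)

# Cumulative-sum version, accumulated in float to get float means directly
csum = np.cumsum(arr, axis=0, dtype=float)
csum[n:] = csum[n:] - csum[:-n]
cumsum_mean = csum[n - 1:] / n

print(window_mean.shape)                       # (4, 4)
print(np.allclose(window_mean, cumsum_mean))   # True
```

The cumulative-sum version does a fixed amount of work per element regardless of n, while the windowed mean does work proportional to n per output cell, so for large arrays and large windows the cumulative sum is likely the faster of the two.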

Answer 2

Score: 0

I also solved this problem with a loop, so you can compare it with your own solution.

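import numpy as np

def process(data, n):
    rows = len(data) - n + 1               # number of complete n-row windows
    out = np.empty((rows, data.shape[1]))  # float output avoids integer truncation
    for i in range(rows):
        out[i] = data[i:i + n].mean(axis=0)  # column-wise mean of rows i..i+n-1
    return out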
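
To compare a loop-based solution with a vectorized one, a quick consistency check on random data can help. The sketch below is only illustrative: the helper names `rolling_mean_loop` and `rolling_mean_cumsum` are hypothetical, and the test data size and window length are arbitrary choices.

```python
import numpy as np

def rolling_mean_loop(data, n):
    """One explicit loop over windows (hypothetical helper name)."""
    rows = data.shape[0] - n + 1
    out = np.empty((rows, data.shape[1]))
    for i in range(rows):
        # Column-wise mean of rows i .. i+n-1
        out[i] = data[i:i + n].mean(axis=0)
    return out

def rolling_mean_cumsum(data, n):
    """Vectorized version based on cumulative sums (hypothetical helper name)."""
    csum = np.cumsum(data, axis=0, dtype=float)
    csum[n:] = csum[n:] - csum[:-n]
    return csum[n - 1:] / n

# Arbitrary random integer data for the comparison
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(200, 50))

agree = np.allclose(rolling_mean_loop(data, 5), rolling_mean_cumsum(data, 5))
print(agree)  # True
```

Once the two agree, the standard-library `timeit` module can be used to measure both on an array closer to the real size.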

huangapple
  • Posted on May 10, 2023 at 21:28:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76219037.html