2023-05-10 21:28:24

Python - NumPy or Pandas (also possible) broadcasting

Question

I have a 2D NumPy array (a pandas DataFrame can also be used) full of numbers, and I need to replace those numbers with the mean of the last n rows in each column. The array is huge, with a shape like (10000, 10000).

Example (limited shape just for explanation):

Input NumPy array:

[[10, 30, 8, 1],
 [11, 5, 19, 12],
 [12, 18, 15, 6],
 [13, 10, 21, 9],
 [14, 67, 14, 2],
 [15, 13, 12, 6]]

Window size: n = 3

For each position, the code should take the last 3 numbers in the column and compute their average.

Expected NumPy array:

[[12.5, 23.5, 14.83333333, 5.833333333],
 [12, 10.33333333, 18.33333333, 9],
 [13, 31.66666667, 16.66666667, 5.666666667],
 [14, 30, 15.66666667, 5.333333333]]

Explanation:

14 is the average of 15, 14, 13
18.33333333 is the average of 21, 15, 19
9 is the average of 9, 6, 12

The function should take the last n values along the column dimension and compute their average.

I was able to do it with two for loops in plain Python, but it takes a lot of time.

Answer 1

Score: 1

You don't need to loop over your data. With Pandas, you can compute a rolling mean:

import pandas as pd
import numpy as np

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])

n = 3
df = pd.DataFrame(arr)
out = df.rolling(n).mean().iloc[n-1:]
print(out)

# Output
      0          1          2         3
2  11.0  17.666667  14.000000  6.333333
3  12.0  11.000000  18.333333  9.000000
4  13.0  31.666667  16.666667  5.666667
5  14.0  30.000000  15.666667  5.666667
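
As a small sketch of why the `.iloc[n-1:]` step is there: with an integer window, `rolling(n)` needs n values by default, so the first n-1 rows come back as NaN. The variable names `rolled`, `incomplete` and `result` below are illustrative only, and `.to_numpy()` is used on the assumption that a plain array is wanted at the end.

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])
n = 3

rolled = pd.DataFrame(arr).rolling(n).mean()

# The first n-1 rows do not have a full window of n values, so they are NaN.
incomplete = rolled.iloc[:n - 1]
print(incomplete.isna().all().all())

# Drop the incomplete rows and convert back to a plain float array.
result = rolled.iloc[n - 1:].to_numpy()
print(result.shape)
```

For the 6x4 example this leaves 6 - 3 + 1 = 4 rows, one per complete window.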

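With NumPy only, you can use a cumulative sum:

# Adapted from https://stackoverflow.com/q/14313510/15239951
out = np.cumsum(arr, axis=0)  # running column sums
out[n:] -= out[:-n]           # sum of each window of n consecutive rows
out = out[n-1:] / n           # drop incomplete windows, turn sums into means
print(out)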

# Output
array([[11.        , 17.66666667, 14.        ,  6.33333333],
       [12.        , 11.        , 18.33333333,  9.        ],
       [13.        , 31.66666667, 16.66666667,  5.66666667],
       [14.        , 30.        , 15.66666667,  5.66666667]])
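
Another NumPy-only option, not part of the original answer, is `sliding_window_view`, which should be available in NumPy 1.20 and later. This is a minimal sketch under that assumption; the variable name `windows` is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([[10, 30,  8,  1],
                [11,  5, 19, 12],
                [12, 18, 15,  6],
                [13, 10, 21,  9],
                [14, 67, 14,  2],
                [15, 13, 12,  6]])
n = 3

# Read-only view of shape (rows - n + 1, cols, n): the window axis is appended last.
windows = sliding_window_view(arr, n, axis=0)

# Averaging over the window axis gives one mean per window and column.
out = windows.mean(axis=-1)
print(out.shape)
```

The work here grows with n, while the cumulative-sum version costs about the same for any window size, so for a (10000, 10000) array the cumulative sum is probably the faster choice. The windowed view avoids accumulating very large running sums, which may matter for large values or long columns.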

Answer 2

Score: 0

I also solved this problem with a loop; you can compare it with your solution.

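import numpy as np

def process(data, n):
    rows = len(data) - n + 1  # number of complete n-row windows
    out = np.empty((rows, data.shape[1]))  # float output, so means are not truncated
    for i in range(rows):
        out[i] = np.mean(data[i:i + n], axis=0)  # column-wise mean of rows i..i+n-1
    return out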
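
One way to make that comparison is to run a loop version and a vectorized version on the same data and check that they agree. The sketch below assumes random integer test data, and the function names `rolling_mean_loop` and `rolling_mean_cumsum` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def rolling_mean_loop(data, n):
    """Hypothetical loop baseline: one column-wise mean per window of n rows."""
    rows = data.shape[0] - n + 1
    out = np.empty((rows, data.shape[1]), dtype=float)
    for i in range(rows):
        out[i] = data[i:i + n].mean(axis=0)
    return out

def rolling_mean_cumsum(data, n):
    """Hypothetical vectorized version based on a cumulative sum."""
    csum = np.cumsum(data, axis=0, dtype=float)
    # Subtracting the sum shifted by n rows leaves the sum of each window.
    csum[n:] = csum[n:] - csum[:-n]
    return csum[n - 1:] / n

rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(200, 50))
n = 3

same = np.allclose(rolling_mean_loop(data, n), rolling_mean_cumsum(data, n))
print(same)
```

The loop runs once per output row in Python, while the cumulative-sum version does a fixed number of whole-array operations, which is where the speed difference on large arrays is expected to come from.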
