March 7, 2023, 22:42:07

Calculate median values for every n rows in a pandas DataFrame

Question

I have the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
})

I need to calculate the median of every n rows. Ideally, I would write a function that takes a pd.Series and n as arguments. So, if n=2, the function should return:

median
15
35
55
75
95

If n=3, it should return:

median
20
50
80
100

In this example with n=3, the last returned value is 100 because only one row is left over. My real dataset has thousands of rows, and I want to set n to 10 or 20, so the last value should be the median of the remaining len(df) % n rows.

For reference, I included a similar function below (from link). It calculates the mean in the way I described, but I need to adapt it to calculate the median.

def find_mean(col, rows):
    """
    col: pd.Series
    rows: number of rows per group
    """
    if isinstance(col, pd.Series):
        col = col.to_numpy()
    mod = col.shape[0] % rows
    if mod != 0:
        # Average the full groups, then append the mean of the leftover rows.
        exclude = col[-mod:]
        keep = col[: len(col) - mod]
        out = keep.reshape((keep.shape[0] // rows, rows)).mean(1)
        out = np.hstack((out, exclude.mean()))
    else:
        out = col.reshape((col.shape[0] // rows, rows)).mean(1)
    return out


Answer 1

Score: 3

You can use groupby, with the integer division of each row position by N as the grouping key:

N = 3
df.groupby(np.arange(len(df))//N)['value'].median()

Output:

0     20
1     50
2     80
3    100
Name: value, dtype: int64
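
Since the question asks for a function that takes a pd.Series and n, the groupby approach above could be wrapped roughly as follows. This is only a sketch: the name median_every_n, the ValueError check, and the "median" result name are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd


def median_every_n(col: pd.Series, n: int) -> pd.Series:
    """Median of each consecutive block of n rows.

    The last block is simply shorter when len(col) is not a multiple of n.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Row positions 0, 1, 2, ... integer-divided by n give one label per block:
    # for n=3 the labels are 0, 0, 0, 1, 1, 1, 2, ...
    block_ids = np.arange(len(col)) // n
    return col.groupby(block_ids).median().rename("median")


df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
print(median_every_n(df["value"], 2))
print(median_every_n(df["value"], 3))
```

For the sample data this should reproduce the values listed in the question (15, 35, 55, 75, 95 for n=2 and 20, 50, 80, 100 for n=3). Because the grouping key is built from row positions rather than the index, the Series does not need a default RangeIndex.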

