Pandas DataFrame.groupby().agg() 问题

huangapple go评论120阅读模式
英文:

Pandas DataFrame.groupby().agg() issue

问题

我尝试使用pandas的groupby()和agg()方法,但遇到了一些问题。

我需要对某些列进行求和(使用agg({'column': sum})),对其他列进行加权平均,其中权重在一列中,还需要对剩余的列进行平均值计算。

我想在列C上进行加权平均,权重是列B中的值。

然后,对于在agg()中未指定的列(F、G等等,我有很多这些列),我想应用.mean()方法,最终保留所有列。

您能帮助我吗?
谢谢

我尝试过以下代码:

df = df.groupby(['Date', 'Hour']).agg({'A': 'sum',
                                       'B': 'sum',
                                       'C': 加权平均?, 
                                       'D': 'sum',
                                       'E': 'mean'}).reset_index()

但我不知道如何正确编写它。

英文:

I'm trying to use pandas groupby().agg() but I have some issues.

Date	           Year Month Week Hour A	B 	    C	D	      E   F	 G ..
mercoledì 5 aprile 2023	4	14	5	6	6	144,79	0	868,74 	  6	  36
mercoledì 5 aprile 2023	4	14	6	214	214	144,79	0	30985,0   6	  214
mercoledì 5 aprile 2023	4	14	6	6	6	144,79	0	868,74	  6	  36
mercoledì 5 aprile 2023	4	14	7	220	220	180,26	0	39657,2	  220 48
mercoledì 5 aprile 2023	4	14	7	100	100	180,26	146	18026	  100 10
mercoledì 5 aprile 2023	4	14	8	220	220	225,2	0	49544	  220 48
mercoledì 5 aprile 2023	4	14	8	57	57	2,2 	146	129,38    6	  57	

I have to sum some columns (and that goes with agg({'column':sum}), do a weighted average on others, with the weights being in a column, and have to mean() the remaining columns.

df = df.groupby(['Date','Hour']).agg({'A':'sum', 
                                      'B':'sum',
                                      'C': weighted average?, 
                                      'D':'sum', 
                                      'E':'mean'}).reset_index()

I wanna do the weighted average on C, and the weights are the values in column B.

Then, for the non-indicated columns in .agg() (F,G, and so on, I have many of them) I wanna apply the method .mean(), keeping all the columns in the end.

Can you help me?
Thank you

Tried this:

df = df.groupby(['Date','Hour']).agg({'A':'sum', 
                                      'B':'sum',
                                      'C': weighted average?, 
                                      'D':'sum', 
                                      'E':'mean'}).reset_index()

But I don't know how to properly code it

答案1

得分: 1

以下是翻译好的部分:

import pandas as pd

data = {
    "Date": ["mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023"],
    "Year": [2023, 2023, 2023, 2023, 2023, 2023],
    "Month": [4, 4, 4, 4, 4, 4],
    "Week": [14, 14, 14, 14, 14, 14],
    "Hour": [5, 6, 6, 7, 7, 8],
    "A": [6, 214, 6, 220, 100, 220],
    "B": [6, 214, 6, 220, 100, 57],
    "C": [144.79, 144.79, 144.79, 180.26, 180.26, 2.2],
    "D": [0, 0, 0, 0, 146, 146],
    "E": [868.74, 30985.0, 868.74, 39657.2, 18026.0, 129.38],
    "F": [6, 214, 36, 48, 10, 57],
    "G": [36, 214, 36, 48, 10, 57],
}

df = pd.DataFrame(data)

print(df)



# Calculate sum of A, mean of B, and weighted mean of C using B as weights
result = df.groupby(['Date','Hour']).agg({
'A': 'sum', 
'B': 'sum',
'C': lambda x: (df['B'] * df['C']).sum() / df['B'].sum(),
'D': 'sum', 
'E': 'mean'
})

print(result.reset_index())

请注意,代码中的HTML实体(如"')没有被翻译,因为它们是代码的一部分,不需要翻译。

英文:

Something like this:

import pandas as pd

data = {
    "Date": ["mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023", "mercoledì 5 aprile 2023"],
    "Year": [2023, 2023, 2023, 2023, 2023, 2023],
    "Month": [4, 4, 4, 4, 4, 4],
    "Week": [14, 14, 14, 14, 14, 14],
    "Hour": [5, 6, 6, 7, 7, 8],
    "A": [6, 214, 6, 220, 100, 220],
    "B": [6, 214, 6, 220, 100, 57],
    "C": [144.79, 144.79, 144.79, 180.26, 180.26, 2.2],
    "D": [0, 0, 0, 0, 146, 146],
    "E": [868.74, 30985.0, 868.74, 39657.2, 18026.0, 129.38],
    "F": [6, 214, 36, 48, 10, 57],
    "G": [36, 214, 36, 48, 10, 57],
}

df = pd.DataFrame(data)

print(df)



# Calculate sum of A, mean of B, and weighted mean of C using B as weights
result = df.groupby(['Date','Hour']).agg({
'A': 'sum', 
'B': 'sum',
'C': lambda x: (df['B'] * df['C']).sum() / df['B'].sum(),
'D': 'sum', 
'E': 'mean'
})

print(result.reset_index())

Date  Hour    A    B           C    D         E
0  mercoledì 5 aprile 2023     5    6    6  150.134561    0    868.74
1  mercoledì 5 aprile 2023     6  220  220  150.134561    0  15926.87
2  mercoledì 5 aprile 2023     7  320  320  150.134561  146  28841.60
3  mercoledì 5 aprile 2023     8  220   57  150.134561  146    129.38

答案2

得分: 0

不能直接使用agg来计算加权平均值,因为这需要两列*。

一种方法是在计算前/后进行预处理。加权平均值等于sum(C*B)/sum(B)

out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'])
         .agg({'A': 'sum', 
               'B': 'sum',
               'C': 'sum', 
               'D': 'sum', 
               'E': 'mean'})
         .eval('C = C/B')
         .reset_index()
     )

*注意:如果您已经使用B/C计算不同的聚合值,您需要使用它们的副本。

要处理所有列,您可以使用一个字典:

d = {c: 'mean' for c in df.columns.difference(['Date', 'Hour'])}
for c in ['A', 'B', 'C', 'D']:
    d[c] = 'sum'

out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'], as_index=False)
         .agg(d)
         .eval('C = C/B')
     )

*您可以使用groupby.apply来计算加权平均值,但这应该作为单独的操作进行。

英文:

You cannot compute a weighted average with agg directly as this requires two columns*.

One way would be to pre-/post-process the computation. The weighted average is equal to sum(C*B)/sum(B):

out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'])
         .agg({'A': 'sum', 
               'B': 'sum',
               'C': 'sum', 
               'D': 'sum', 
               'E': 'mean'})
         .eval('C = C/B')
         .reset_index()
     )

NB. If you were already computing a different aggregation with B/C you would need to use copies of them.

To handle all columns you can use a dictionary:

d = {c: 'mean' for c in df.columns.difference(['Date', 'Hour'])}
for c in ['A', 'B', 'C', 'D']:
    d[c] = 'sum'

out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'], as_index=False)
         .agg(d)
         .eval('C = C/B')
     )

* you can however compute the weighted average with groupby.apply, but this should be done as a separate operation.

huangapple
  • 本文由 发表于 2023年8月10日 20:08:44
  • 转载请务必保留本文链接:https://go.coder-hub.com/76875612.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定