Tricky groupby on rows whose names share a prefix, summing by the categorical values in another column (Pandas)
Question
I am looking to group rows whose `name` values share a prefix (the part before the colon, e.g. `AA` in `AA:3400`) and, within each prefix, sum `size` for each categorical value of `type`.
Data
```
name     type   size
AA:3400         5
AA:3401  FALSE  1
AA:3402  FALSE  2
AA:3404  FALSE  0
AA:3409  FALSE  1
AA:3410  FALSE  8
AA:3412  FALSE  9
BB:3400  TRUE   4
BB:3401  FALSE  7
```
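Desired
```
name  type   size
AA    TRUE   0
AA    FALSE  21
AA           5
BB    TRUE   4
BB    FALSE  7
BB
```
Each prefix should list every `type` value (TRUE, FALSE and empty), even when no rows match; for example, `AA` has no TRUE rows, so its TRUE sum is 0.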
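What I'm doing:
```python
# Groups on the full name, so AA:3400 and AA:3401 end up in separate groups.
df.groupby(['name', 'type'], dropna=False, as_index=False)['size'].sum()
```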
However, that groups on the full `name` value. How can I group rows whose names share the same prefix? Any suggestion is appreciated.
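A minimal sketch of the prefix step on its own (the `DataFrame` below is rebuilt from the sample data, assuming `None` for the blank `type` of `AA:3400`): `groupby` accepts a Series as a key and aligns it with the frame by index, so the prefix can be computed on the fly rather than stored in a new column. On its own this does not emit rows for combinations that never occur, such as `AA` with `TRUE`.

```python
import pandas as pd

# Sample data from the question; None stands in for the blank type (assumption).
df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [None, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})

# Grouping key: the text before the first ":" (AA, BB, ...).
prefix = df["name"].str.split(":").str[0]

# A Series key is aligned with df on the index, so no helper column is needed.
# dropna=False keeps the group whose type is missing.
sums = df.groupby([prefix, "type"], dropna=False)["size"].sum()
print(sums)
```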
Answer 1
Score: 3
You can try:
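```python
import pandas as pd

out = (
    # Categories fixed to TRUE, FALSE (assumes `type` holds these strings).
    df.assign(type=df["type"].astype(
        pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)))
    # Group by the text before ":"; observed=False keeps empty pairs like AA/TRUE.
    .groupby([df["name"].str.split(":").str[0], "type"],
             dropna=False, observed=False)["size"].sum()
    .reset_index()
)
```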
Output of `print(out)`:
```
  name   type  size
0   AA   TRUE     0
1   AA  FALSE    21
2   AA    NaN     5
3   BB   TRUE     4
4   BB  FALSE     7
5   BB    NaN     0
```
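The zero-size `AA`/`TRUE` row comes from grouping on a categorical with `observed=False`. pandas 2.1 deprecated relying on the default value of `observed` (with `observed=True` planned as the future default), so passing it explicitly matters here. A small comparison sketch on the rebuilt sample data, assuming `None` for the blank `type` and keeping the default `dropna=True` so the missing-type row is simply left out of the comparison:

```python
import pandas as pd

# Sample data from the question; None stands in for the blank type (assumption).
df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [None, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})

cat = pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)
typed = df.assign(type=df["type"].astype(cat))
prefix = df["name"].str.split(":").str[0]

# observed=False: every prefix is paired with every category; empty pairs sum to 0.
all_combos = typed.groupby([prefix, "type"], observed=False)["size"].sum()
# observed=True: only pairs that actually occur in the data are returned.
seen_only = typed.groupby([prefix, "type"], observed=True)["size"].sum()
print(all_combos)
print(seen_only)
```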
Answer 2
Score: 3
Much like @Timeless's solution, I'd do it like this:
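```python
# Categories are inferred (sorted) from the data; NaN is not a category.
df['type'] = df['type'].astype('category')
# str[:2] works because every prefix here is two characters long.
df_out = df.groupby([df['name'].str[:2], 'type'],
                    dropna=False,
                    observed=False)['size'].sum().reset_index()
print(df_out)
```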
Output:
```
  name   type  size
0   AA  False    21
1   AA   True     0
2   AA    NaN     5
3   BB  False     7
4   BB   True     4
5   BB    NaN     0
```
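If you would rather not depend on categorical grouping at all, another approach (a swapped-in technique, not what either answer does) is to group on plain labels and then reindex against every prefix/type pair with `fill_value=0`. The sketch below assumes the `type` values are the strings `TRUE`/`FALSE` with `None` for blanks, and uses a hypothetical placeholder label `MISSING` so the blank type can take part in the reindex:

```python
import pandas as pd

# Sample data from the question; None stands in for the blank type (assumption).
df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [None, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})

MISSING = "MISSING"  # hypothetical placeholder label for a blank type

prefix = df["name"].str.split(":").str[0]    # AA, BB, ...
type_key = df["type"].fillna(MISSING)        # TRUE / FALSE / MISSING
sums = df.groupby([prefix, type_key])["size"].sum()

# Every prefix paired with every type, in the order of the desired output.
full = pd.MultiIndex.from_product(
    [prefix.unique(), ["TRUE", "FALSE", MISSING]], names=["name", "type"]
)
# Pairs that never occur in the data get size 0.
out = sums.reindex(full, fill_value=0).reset_index()
# Turn the placeholder back into a real missing value (NaN).
out["type"] = out["type"].where(out["type"] != MISSING)
print(out)
```

Splitting on `":"` rather than slicing with `str[:2]` also keeps this working if a prefix is longer than two characters.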