Pandas aggregating and applying a condition on column names

Question

I've got this DataFrame

     id language  is_bruiser  is_tank  is_support
0  4578       fr        True    False       False
1   121       de        True     True       False
2  1216       fr        True    False       False
3   542       de       False     True       False
3  1542       de        True    False       False

What I want is a dict with the language as the key, the names of the is_ columns as sub-keys, and the matching ids aggregated into lists:

{
  "de": {"is_bruiser": [121, 1542], "is_tank": [121, 542]},
  "fr": {"is_bruiser": [1216, 4578]}
}

I have tried a few solutions, but my bottleneck so far is applying the pre-aggregation condition on the column names.
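
For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame. The column dtypes (integer ids, real booleans) are an assumption read off the printed table, and the default 0..4 row labels are used instead of the repeated label 3 shown above.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: "id" holds integers and the is_* columns hold real booleans.
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

print(df)
```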

Answer 1

Score: 1

You can apply a function to each group: it builds the dictionary for that group, and the grouped result can then be turned into the final dictionary.

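from collections import defaultdict

def get_dict(group):
    # Map each boolean column name to the ids of the rows where it is True.
    dct = defaultdict(list)
    bool_cols = [col for col in group.columns if col.startswith("is_")]

    for column in bool_cols:
        for _, row in group.iterrows():
            if row[column]:
                dct[column].append(row["id"])
    # Columns with no True value never get a key, so they are left out.
    return dict(dct)

# One dict per language; the result is indexed by language.
aggregated_df = df.groupby("language").apply(get_dict)

# Turn the grouped result into a plain dict keyed by language.
dict(zip(aggregated_df.index, aggregated_df))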

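Output (the ids show up as strings here, presumably because the id column held strings in the data this was run on):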

{'de': {'is_bruiser': ['121', '1542'], 'is_tank': ['121', '542']},
 'fr': {'is_bruiser': ['4578', '1216']}}
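
The same per-group idea can also be written without iterrows, by using each boolean column directly as a row mask. This is only a sketch of a variation, not the answer's own code; it assumes the flag columns all start with "is_" and hold real booleans, and the names flag_cols and result are made up for the example.

```python
import pandas as pd

# Sample data from the question (assumed dtypes: int ids, bool flags).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# Pick the flag columns by name instead of hard-coding them.
flag_cols = [col for col in df.columns if col.startswith("is_")]

result = {
    lang: {
        # The boolean column itself selects the rows whose ids we keep.
        col: group.loc[group[col], "id"].tolist()
        for col in flag_cols
        if group[col].any()  # skip flags that are never True in this group
    }
    for lang, group in df.groupby("language")
}

print(result)
```

On larger frames this avoids building a Series for every row, which is what makes iterrows slow.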

Answer 2

Score: 1

Another solution:

d1 = dict()
for language in df['language'].unique():
    df_filtered = df[df['language'] == language]
    d2 = dict()
    for col in ['is_bruiser', 'is_tank', 'is_support']:
        values = list(df_filtered[df_filtered[col]]['id'])
        if values:
            d2[col] = values
    d1[language] = d2

Answer 3

Score: 0

You can iterate over your languages and your boolean columns:

result = {}
for lang in df["language"].unique():
    result[lang] = {}
    for col in ["is_bruiser", "is_tank", "is_support"]:
        ids = df.loc[(df["language"] == lang) & (df[col]), "id"].tolist()
        if ids:
            result[lang][col] = ids

Then you'll get what you want:

print(result)
> {'fr': {'is_bruiser': [4578, 1216]}, 'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}}
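
As a variation on the nested loops above, the frame can first be reshaped to one row per (id, flag) pair with melt, so the condition on the column names becomes an ordinary row filter. This is a sketch under the same assumptions about the sample data; the names long, flag, and is_set are made up for the example.

```python
import pandas as pd

# Sample data from the question (assumed dtypes: int ids, bool flags).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# Wide -> long: every is_* column name moves into the "flag" column.
long = df.melt(id_vars=["id", "language"], var_name="flag", value_name="is_set")

# Keep only the (id, flag) pairs where the flag is True.
long = long[long["is_set"]]

# Collect ids per (language, flag) into the nested dict.
result = {}
for (lang, flag), ids in long.groupby(["language", "flag"])["id"]:
    result.setdefault(lang, {})[flag] = ids.tolist()

print(result)
```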

huangapple
  • Published on July 12, 2023, 23:36:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76672285.html