Pandas aggregating and applying a condition on column names

Question

I have this DataFrame:

         id language  is_bruiser  is_tank  is_support
    0  4578       fr        True    False       False
    1   121       de        True     True       False
    2  1216       fr        True    False       False
    3   542       de       False     True       False
    4  1542       de        True    False       False

What I want is this dict, with the language as the key, the names of the is_ columns as sub-keys, and the ids aggregated under each sub-key:

    {
        "de": {"is_bruiser": [121, 1542], "is_tank": [121, 542]},
        "fr": {"is_bruiser": [1216, 4578]}
    }

I have tried a few solutions, but my bottleneck so far is applying the condition on the column names before aggregating.
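For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame from the table in the question. It assumes that `id` holds integers and that the `is_` columns are real booleans, which the question does not state explicitly.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: ids are integers and the is_ columns are booleans.
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# The is_ columns need a boolean dtype to be usable as masks.
print(df.dtypes)
```

If the flags were loaded as strings such as "True" and "False", they would need converting to booleans before any of the mask-based answers would work.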

Answer 1

Score: 1

You can apply a function to each group. The function builds the dictionary for that group, and the grouped result can then be converted into the final dictionary.

    from collections import defaultdict

    def get_dict(group):
        # Map each flag column to the ids of the rows where it is True
        dct = defaultdict(list)
        bool_cols = filter(lambda x: x.startswith("is"), group.columns)
        for column in bool_cols:
            for _, row in group.iterrows():
                if row[column]:
                    dct[column].append(row["id"])
        return dict(dct)

    # One dict per language, returned as a Series indexed by language
    aggregated_df = df \
        .groupby("language") \
        .apply(get_dict)

    # Convert the Series into the final nested dict
    dict(zip(aggregated_df.index, aggregated_df))

Output (the ids are quoted, so they were presumably stored as strings in the frame used here):

    {'de': {'is_bruiser': ['121', '1542'], 'is_tank': ['121', '542']},
     'fr': {'is_bruiser': ['4578', '1216']}}
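The same per-group idea can be sketched with boolean masks instead of `iterrows`. This is a variant of the function above, not the answerer's code. It assumes integer ids, boolean `is_` columns, and a sample frame rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed dtypes: int ids, bool flags).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)


def get_dict(group):
    """Map each flag column to the ids of the rows where the flag is True."""
    flag_cols = [c for c in group.columns if c.startswith("is_")]
    return {
        col: group.loc[group[col], "id"].tolist()
        for col in flag_cols
        if group[col].any()  # drop flags that are never True in this group
    }


# Iterating over the groupby yields (language, sub-frame) pairs.
result = {lang: get_dict(group) for lang, group in df.groupby("language")}
print(result)
```

The ids stay in their original row order within each language, so "fr" gives [4578, 1216] rather than the sorted list shown in the question.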

Answer 2

Score: 1

Another solution:

    d1 = dict()
    for language in df['language'].unique():
        # Rows for the current language only
        df_filtered = df[df['language'] == language]
        d2 = dict()
        for col in ['is_bruiser', 'is_tank', 'is_support']:
            # ids of the rows where this flag is True
            values = list(df_filtered[df_filtered[col]]['id'])
            if values:  # skip flags with no matching ids
                d2[col] = values
        d1[language] = d2
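If hardcoding the three column names is undesirable, the flag columns could be detected instead. The sketch below shows two ways to do that, by dtype and by name prefix, and then reuses the loop above. It assumes the flags are stored with a boolean dtype and that the frame looks like the sample in the question.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed dtypes: int ids, bool flags).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# Option 1: pick the flag columns by dtype.
flag_cols = df.select_dtypes(include="bool").columns.tolist()

# Option 2: pick them by name prefix.
flag_cols_by_name = [c for c in df.columns if c.startswith("is_")]

d1 = {}
for language in df["language"].unique():
    df_filtered = df[df["language"] == language]
    # ids per flag for this language
    d2 = {col: df_filtered.loc[df_filtered[col], "id"].tolist() for col in flag_cols}
    # keep only the flags that matched at least one id
    d1[language] = {col: ids for col, ids in d2.items() if ids}

print(d1)
```

Selecting by dtype would also pick up any unrelated boolean column, so the prefix check may be the safer choice when the frame has other boolean fields.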

Answer 3

Score: 0

You can iterate over the languages and the boolean columns:

    result = {}
    for lang in df["language"].unique():
        result[lang] = {}
        for col in ["is_bruiser", "is_tank", "is_support"]:
            # ids where the language matches and the flag is True
            ids = df.loc[(df["language"] == lang) & (df[col]), "id"].tolist()
            if ids:
                result[lang][col] = ids

Then you get the desired result:

    print(result)
    # {'fr': {'is_bruiser': [4578, 1216]}, 'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}}
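A different route, not taken by any of the answers, is to reshape the frame so that the column names become values and the condition can be applied before aggregating. The sketch below uses `melt` followed by a two-level `groupby`. The names `flag`, `is_set`, and `long_df` are made up for illustration, and the sample frame is rebuilt from the question with assumed dtypes.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed dtypes: int ids, bool flags).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# Wide to long: one row per (id, language, flag name, flag value).
long_df = df.melt(id_vars=["id", "language"], var_name="flag", value_name="is_set")

# Apply the condition before aggregating: keep only the flags that are True.
long_df = long_df[long_df["is_set"]]

# Collect the ids for every (language, flag) pair.
ids = long_df.groupby(["language", "flag"])["id"].agg(list)

# Unpack the MultiIndex into the nested dict.
result = {}
for (lang, flag), id_list in ids.items():
    result.setdefault(lang, {})[flag] = id_list

print(result)
```

Because the False rows are filtered out before grouping, flags that never apply to a language, such as is_support here, do not appear in the result at all.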

Posted by huangapple on July 12, 2023 at 23:36:24
Source: https://go.coder-hub.com/76672285.html