Pandas aggregating and applying a condition on column names
Question
I've got this DataFrame:
| | id | language | is_bruiser | is_tank | is_support |
|---|---|---|---|---|---|
| 0 | 4578 | fr | True | False | False |
| 1 | 121 | de | True | True | False |
| 2 | 1216 | fr | True | False | False |
| 3 | 542 | de | False | True | False |
| 3 | 1542 | de | True | False | False |
What I want is a dict with the language as the key and the names of the is_* columns as sub-keys, each mapping to the aggregated ids:
    {
        "de": {"is_bruiser": [121, 1542], "is_tank": [121, 542]},
        "fr": {"is_bruiser": [1216, 4578]}
    }
I have tried a few solutions, but my bottleneck so far is applying the pre-aggregation condition on the column names.
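
To make the answers easy to try, the table above can be rebuilt as a DataFrame. This is a minimal sketch: it assumes the ids are integers and uses the default 0..4 index rather than the index shown in the table. The name `flag_cols` is introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: ids are integers and the default RangeIndex is acceptable.
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# The "condition on column names": keep only columns starting with "is_".
flag_cols = [col for col in df.columns if col.startswith("is_")]
print(flag_cols)  # ['is_bruiser', 'is_tank', 'is_support']
```

Selecting the flag columns by prefix avoids hardcoding their names if more is_* columns are added later.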
Answer 1
Score: 1
You can apply a function to each group. The function builds the inner dictionary for one language, and the resulting grouped Series can then be turned into the final dictionary.
    from collections import defaultdict
    def get_dict(group):
        # Map each flag column to the ids of the rows where it is True.
        dct = defaultdict(list)
        bool_cols = filter(lambda x: x.startswith("is"), group.columns)
        for column in bool_cols:
            for _, row in group.iterrows():
                if row[column]:
                    dct[column].append(row["id"])
        return dict(dct)
    # One dict per language, stored in a Series indexed by language.
    aggregated_df = df \
        .groupby("language") \
        .apply(get_dict)
    dict(zip(aggregated_df.index, aggregated_df))
Output:
    {'de': {'is_bruiser': ['121', '1542'], 'is_tank': ['121', '542']},
     'fr': {'is_bruiser': ['4578', '1216']}}
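
The same per-group idea can be sketched without `iterrows` by using each flag column directly as a boolean mask. This is a variation rather than the answerer's code: the helper name `ids_by_flag` is hypothetical, the sample data is rebuilt from the question with integer ids (an assumption), and the groups are iterated directly instead of going through `apply`.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: integer ids).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)


def ids_by_flag(group):
    """Return {flag column: [ids]} for the flags set at least once in group."""
    result = {}
    for col in group.columns:
        if not col.startswith("is_"):
            continue
        # The boolean column itself selects the rows where the flag is True.
        ids = group.loc[group[col], "id"].tolist()
        if ids:
            result[col] = ids
    return result


# Iterating over the groupby yields (language, sub-DataFrame) pairs.
result = {lang: ids_by_flag(group) for lang, group in df.groupby("language")}
print(result)
# {'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}, 'fr': {'is_bruiser': [4578, 1216]}}
```

The ids come out in row order within each language, so `fr` lists 4578 before 1216.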
Answer 2
Score: 1
Another solution:
    d1 = dict()
    for language in df['language'].unique():
        # Rows for the current language only.
        df_filtered = df[df['language'] == language]
        d2 = dict()
        for col in ['is_bruiser', 'is_tank', 'is_support']:
            # Ids of the rows where this flag is True.
            values = list(df_filtered[df_filtered[col]]['id'])
            if values:
                d2[col] = values
        d1[language] = d2
Answer 3
Score: 0
You can iterate over the languages and the boolean columns:
    result = {}
    for lang in df["language"].unique():
        result[lang] = {}
        for col in ["is_bruiser", "is_tank", "is_support"]:
            # Ids of the rows matching both the language and the flag.
            ids = df.loc[(df["language"] == lang) & (df[col]), "id"].tolist()
            if ids:
                result[lang][col] = ids
Then you get the result you want:
    print(result)
    # {'fr': {'is_bruiser': [4578, 1216]}, 'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}}
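
For comparison, the nested loops can also be replaced by reshaping the frame to long format with `melt` and grouping once on language and flag name. This is a different technique from the answers above, offered only as a sketch: the names `long_df`, `flag`, `is_set` and `nested` are made up for illustration, the sample data assumes integer ids, and sorting the ids is an assumption made so the result matches the ordering shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: integer ids).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# One row per (id, language, flag column), with the flag value in "is_set".
long_df = df.melt(id_vars=["id", "language"], var_name="flag", value_name="is_set")

# Keep only the rows where the flag is actually set.
long_df = long_df[long_df["is_set"].astype(bool)]

# Group once on both keys and fill the nested dict.
nested = {}
for (language, flag), ids in long_df.groupby(["language", "flag"])["id"]:
    # Assumption: sorted ids, to match the ordering in the question.
    nested.setdefault(language, {})[flag] = sorted(ids.tolist())

print(nested)
# {'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}, 'fr': {'is_bruiser': [1216, 4578]}}
```

Flags that are never set for a language, such as is_support here, produce no group and therefore never appear in the result.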