July 12, 2023, 23:36:24

Pandas aggregating and applying a condition on column names

Question

I've got this DataFrame

	id	language	is_bruiser	is_tank	is_support
0	4578	fr	True	False	False
1	121	de	True	True	False
2	1216	fr	True	False	False
3	542	de	False	True	False
3	1542	de	True	False	False

What I want is a dict with the language as the key, the names of the is_ columns as sub-keys, and the matching ids aggregated into lists:

{
  "de": {"is_bruiser": [121, 1542], "is_tank": [121, 542]},
  "fr": {"is_bruiser": [1216, 4578]}
}

I have tried a few solutions, but my bottleneck so far is applying the pre-aggregation condition on the column names.
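To make the answers reproducible, the sample data can be rebuilt as follows. This is a minimal sketch that assumes the ids are integers and the flag columns are real booleans; the name `flag_cols` is introduced here for illustration only.

```python
import pandas as pd

# Rebuild the sample data from the question (ids assumed to be integers).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

# Select the flag columns by prefix instead of listing them by hand.
flag_cols = [c for c in df.columns if c.startswith("is_")]
print(flag_cols)  # ['is_bruiser', 'is_tank', 'is_support']
```

Selecting the columns by prefix is one way to express the "condition on column names" before any aggregation happens.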

Answer 1

Score: 1

You can apply a function to each group: compute a result dictionary per group, then convert the grouped result into the final dictionary.

from collections import defaultdict

def get_dict(group):
    # Map each boolean column name to the ids of the rows where it is True.
    dct = defaultdict(list)
    bool_cols = filter(lambda x: x.startswith("is"), group.columns)

    for column in bool_cols:
        for _, row in group.iterrows():
            if row[column]:
                dct[column].append(row["id"])
    return dict(dct)

# One dictionary per language, stored in a Series indexed by language.
aggregated_df = df \
    .groupby("language") \
    .apply(get_dict)
dict(zip(aggregated_df.index, aggregated_df))

Output:

{'de': {'is_bruiser': ['121', '1542'], 'is_tank': ['121', '542']},
 'fr': {'is_bruiser': ['4578', '1216']}}
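The same per-group idea can be sketched with boolean masks instead of `iterrows`. This is only an illustrative variant, not the answerer's code: the helper name `ids_by_flag` is hypothetical, and the data is assumed to match the question with integer ids, so the lists hold integers rather than the strings shown in the output above.

```python
import pandas as pd

# Sample data from the question (ids assumed to be integers).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

def ids_by_flag(group):
    """Return {flag column: [ids]} for one language group, skipping empty flags."""
    flag_cols = [c for c in group.columns if c.startswith("is_")]
    out = {}
    for col in flag_cols:
        # The boolean column itself serves as the row mask.
        ids = group.loc[group[col], "id"].tolist()
        if ids:
            out[col] = ids
    return out

# Iterate over the groups directly instead of calling .apply.
result = {lang: ids_by_flag(group) for lang, group in df.groupby("language")}
print(result)
```

Iterating over the groups builds the final dictionary in one step, so no intermediate Series has to be zipped back into a dict.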

Answer 2

Score: 1

Another solution:

d1 = dict()
for language in df['language'].unique():
    df_filtered = df[df['language'] == language]
    d2 = dict()
    for col in ['is_bruiser', 'is_tank', 'is_support']:
        values = list(df_filtered[df_filtered[col]]['id'])
        if values:
            d2[col] = values
    d1[language] = d2

Answer 3

Score: 0

You can iterate over the languages and the boolean columns:

result = {}
for lang in df["language"].unique():
    result[lang] = {}
    for col in ["is_bruiser", "is_tank", "is_support"]:
        ids = df.loc[(df["language"] == lang) & (df[col]), "id"].tolist()
        if ids:
            result[lang][col] = ids

Then you get the desired result:

print(result)
> {'fr': {'is_bruiser': [4578, 1216]}, 'de': {'is_bruiser': [121, 1542], 'is_tank': [121, 542]}}
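For comparison, the nested loops can be replaced by reshaping to long format with `DataFrame.melt` and grouping once. This is a sketch under the same assumptions as the question's data (integer ids, boolean flag columns); the names `long_df`, `flag`, and `is_set` are chosen here for illustration and do not come from the answers.

```python
import pandas as pd

# Sample data from the question (ids assumed to be integers).
df = pd.DataFrame(
    {
        "id": [4578, 121, 1216, 542, 1542],
        "language": ["fr", "de", "fr", "de", "de"],
        "is_bruiser": [True, True, True, False, True],
        "is_tank": [False, True, False, True, False],
        "is_support": [False, False, False, False, False],
    }
)

flag_cols = [c for c in df.columns if c.startswith("is_")]

# Long format: one row per (id, language, flag) pair; keep only the True flags.
long_df = df.melt(
    id_vars=["id", "language"],
    value_vars=flag_cols,
    var_name="flag",
    value_name="is_set",
)
long_df = long_df[long_df["is_set"]]

# Collect the ids per (language, flag), then nest the result by language.
grouped = long_df.groupby(["language", "flag"])["id"].agg(list)
result = {}
for (lang, flag), ids in grouped.items():
    result.setdefault(lang, {})[flag] = [int(i) for i in ids]
print(result)
```

Filtering before grouping means flags that are never set for a language, such as is_support here, simply never appear in the result.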
