July 14, 2023, 02:00:44

After using groupby in Pandas, how do I find what percentage of a column is negative?

Question

I have a DataFrame like this, with two columns: name and score. Example:

name     score
John       0.5
George     0.2
John      -0.94
Paul      -0.2
Mary       0.44
Paul      -0.6
Mary      -0.89

I want to group by name and get the percentage of each person's scores that are negative, so the result should look like this:

name      negative_score_percent
John      50%
George    0%
Paul      100%
Mary      50%

I tried:

df.groupby('name')(['score'] < 0).sum()

after which I would just divide it by the total number of scores like this:

no_of_scores = df.groupby('name')['score'].count()

and then merge the two and divide, but I get an error:

TypeError: '<' not supported between instances of 'list' and 'int'

I'm not sure what is happening.

Answer 1

Score: 1

You can do this with a lambda:

result = df.groupby('name')['score'].apply(lambda x: (x < 0).mean() * 100).reset_index(name='negative_score_percent')

print(result)

The use of mean() in the example above might not be obvious. Applied to the boolean Series (x < 0), it is just shorthand for (x < 0).sum() / len(x): True counts as 1 and False as 0, so it gives the number of values below 0 divided by the total number of values.
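To make the mean() shorthand concrete, here is a minimal sketch on John's two scores from the question. The variable names (john, is_negative, share_via_mean, share_via_sum) are made up for this illustration and are not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.5, 0.2, -0.94, -0.2, 0.44, -0.6, -0.89],
})

# John's scores are 0.5 and -0.94, so the boolean Series is [False, True].
john = df.loc[df["name"] == "John", "score"]
is_negative = john < 0

# True counts as 1 and False as 0, so both expressions give the same fraction.
share_via_mean = is_negative.mean()
share_via_sum = is_negative.sum() / len(is_negative)

print(share_via_mean, share_via_sum)  # 0.5 0.5
```

Multiplying that fraction by 100 gives the 50 reported for John.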

Answer 2

Score: 1

First identify the negative values, then groupby:

out = (df['score'].lt(0)
       .groupby(df['name'], sort=False)
       .agg(negative_score_percent='mean')
       .mul(100).reset_index()
      )

Output:

     name  negative_score_percent
0    John                    50.0
1  George                     0.0
2    Paul                   100.0
3    Mary                    50.0
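The output above is numeric, while the question shows values such as 50%. If the strings are wanted for display, one possible sketch is to format the column afterwards. This formatting step is an addition for illustration, not part of the answer, and it uses Series.reset_index(name=...) in place of the named aggregation to keep the example short.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.5, 0.2, -0.94, -0.2, 0.44, -0.6, -0.89],
})

# Same idea as the answer: flag negatives first, then average per name.
out = (df["score"].lt(0)
       .groupby(df["name"], sort=False)
       .mean()
       .mul(100)
       .reset_index(name="negative_score_percent"))

# Format for display only; the column holds strings afterwards, not numbers.
out["negative_score_percent"] = out["negative_score_percent"].map("{:.0f}%".format)

print(out)
```

Keeping the numeric column and formatting only when printing is usually the safer choice if the percentages are needed for further calculations.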

Answer 3

Score: 0

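It looks like a syntax mistake in this line:

> df.groupby('name')(['score'] < 0).sum()

Inside the second pair of parentheses you are comparing the list ['score'] with the integer 0. Python evaluates that comparison before it gets anywhere near the groupby call, which is why the error mentions 'list' and 'int'.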
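The error can be reproduced without any grouping at all, as this small sketch shows. The second half selects the column first and then compares, which is one way to get the per-name count of negatives the question was after; the names message and negative_counts are made up for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.5, 0.2, -0.94, -0.2, 0.44, -0.6, -0.89],
})

# A plain list compared with an int fails on its own, with no pandas involved.
try:
    result = ["score"] < 0
except TypeError as exc:
    message = str(exc)
    print(message)  # '<' not supported between instances of 'list' and 'int'

# Selecting the column gives a Series, and Series < 0 compares element by element.
negative_counts = (df["score"] < 0).groupby(df["name"]).sum()
print(negative_counts)
```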

The solution you're looking for can be written something like this:

import pandas as pd

# Count of negative scores per name (names with no negatives are absent here).
df_neg = (df
            [df.score < 0]
            .groupby('name')
            .count()
            .rename(columns={'score': 'neg_score'}))
# Count of all scores per name.
df_all = (df
            .groupby('name')
            .count()
            .rename(columns={'score': 'all_score'}))

# The right join keeps every name; missing negative counts become 0.
(pd
    .merge(
        df_neg,
        df_all,
        on='name',
        how='right'
        )
    .fillna(0)
    .astype(int)
    .assign(
        negative_score_percent=lambda df_: (df_.neg_score / df_.all_score) * 100
        )
    .drop(
        ['neg_score', 'all_score'],
        axis=1
        )
    )
