After using groupby in Pandas, how to find what percentage of a column is negative

Question

I have a DataFrame like this, with two columns, name and score. For example:

name     score
John       0.5
George     0.2
John      -0.94
Paul      -0.2
Mary       0.44
Paul      -0.6
Mary      -0.89

I want to group by name and get the percentage of each person's scores that are negative, so the result should look like:

name      negative_score_percent
John      50%
George    0%
Paul      100%
Mary      50%

I tried:

df.groupby('name')(['score'] < 0).sum()

after which I would just divide it by the total number of scores like this:

no_of_scores = df.groupby('name')['score'].count()

and then merge the two and divide, but I get an error:

TypeError: '<' not supported between instances of 'list' and 'int'

I'm not sure what is happening.

Answer 1

Score: 1

You can do this with a lambda:

result = df.groupby('name')['score'].apply(lambda x: (x < 0).mean() * 100).reset_index(name='negative_score_percent')

print(result)

The role of mean() may not be obvious here: x < 0 gives a boolean Series, and its mean is just sum(x < 0) / len(x), which is the number of values below 0 divided by the total number of values.
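
To make the mean() shortcut concrete, here is a minimal sketch that rebuilds the sample data from the question and checks that the mean of the boolean mask matches the explicit count-divided-by-total calculation. The variable names mask, by_mean and by_ratio are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.5, 0.2, -0.94, -0.2, 0.44, -0.6, -0.89],
})

# Boolean mask: True where the score is negative.
mask = df["score"] < 0

# Mean of the mask per group is the share of negative scores.
by_mean = mask.groupby(df["name"]).mean() * 100

# The same value spelled out: number of negatives / number of rows.
grouped = mask.groupby(df["name"])
by_ratio = grouped.sum() / grouped.count() * 100

print(by_mean)
print(by_ratio)
```

Both Series should hold the same percentages, which is why the lambda only needs (x < 0).mean().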

Answer 2

Score: 1

First identify the negative values, then groupby:

out = (df['score'].lt(0)
 .groupby(df['name'], sort=False)
 .agg(negative_score_percent='mean')
 .mul(100).reset_index()
 )

Output:

     name  negative_score_percent
0    John                    50.0
1  George                     0.0
2    Paul                   100.0
3    Mary                    50.0
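
The expected output in the question shows strings such as 50% rather than floats. If that display format is wanted, one possible follow-up is to round the values and append a percent sign. This is a sketch under the assumption that whole-number percentages are acceptable; it is not part of the original answer, and it turns the column into strings, so it is best done as a final presentation step.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.5, 0.2, -0.94, -0.2, 0.44, -0.6, -0.89],
})

# Share of negative scores per name, in order of first appearance.
out = (df["score"].lt(0)
       .groupby(df["name"], sort=False)
       .mean()
       .mul(100)
       .rename("negative_score_percent")
       .reset_index())

# Display-only formatting: 50.0 becomes "50%".
out["negative_score_percent"] = (
    out["negative_score_percent"].round().astype(int).astype(str) + "%"
)

print(out)
```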

Answer 3

Score: 0

It looks like there is a syntax mistake in this line:

> df.groupby('name')(['score'] < 0).sum()

In the second pair of parentheses you are simply comparing the list ['score'] with the int 0.
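
The error can be reproduced without pandas at all, because Python evaluates the expression inside the parentheses before anything is passed to the groupby object. A minimal sketch (the variable name message is illustrative only):

```python
# The expression inside the parentheses is evaluated first,
# and it compares a plain Python list with an int.
try:
    ["score"] < 0
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
# prints: '<' not supported between instances of 'list' and 'int'
```

The comparison has to be applied to the column itself, for example df['score'] < 0, so that it is evaluated element by element.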

The solution you're looking for can be written something like this:

import pandas as pd

# Count the negative scores per name
df_neg = (df
            [df.score < 0]
            .groupby('name')
            .count()
            .rename(columns={'score': 'neg_score'}))
# Count all scores per name
df_all = (df
            .groupby('name')
            .count()
            .rename(columns={'score': 'all_score'}))

(pd
    .merge(
        df_neg,
        df_all,
        on='name',
        how='right'  # keep names that have no negative scores
        )
    .fillna(0)
    .astype(int)
    .assign(
        negative_score_percent=lambda df_: (df_.neg_score / df_.all_score) * 100
        )
    .drop(
        ['neg_score', 'all_score'],
        axis=1
        )
    )

huangapple
  • Posted on July 14, 2023, 02:00:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76682114.html