After using groupby in pandas, how do I find what percentage of a column is negative?
Question
I have a DataFrame with two columns, name and score. For example:
name    score
John     0.50
George   0.20
John    -0.94
Paul    -0.20
Mary     0.44
Paul    -0.60
Mary    -0.89
I want to group by name and get the percentage of each person's scores that are negative, so the result should look like this:
name    negative_score_percent
John    50%
George  0%
Paul    100%
Mary    50%
I tried:
df.groupby('name')(['score'] < 0).sum()
after which I would divide by the total number of scores, computed like this:
no_of_scores = df.groupby('name')['score'].count()
and then merge the two and divide, but I get an error:
TypeError: '<' not supported between instances of 'list' and 'int'
I'm not sure what is happening.
Answer 1
Score: 1
You can do this with a lambda:
result = df.groupby('name')['score'].apply(lambda x: (x < 0).mean() * 100).reset_index(name='negative_score_percent')
print(result)
The use of mean() above may not be obvious. On the boolean Series produced by x < 0 it is equivalent to sum(x) / len(x), which in this case is the number of values below 0 divided by the total number of values in the group.
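To make that equivalence concrete, here is a small sketch that rebuilds the sample data from the question and checks that the mean of the boolean mask matches the count of negatives divided by the group size. The way df is constructed and the names via_mean and via_ratio are assumptions for illustration only.

```python
import pandas as pd

# Sample data copied from the question; how df was originally built is assumed.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.50, 0.20, -0.94, -0.20, 0.44, -0.60, -0.89],
})

grouped = df.groupby("name")["score"]

# Percentage of negatives per group via the mean of a boolean mask.
via_mean = grouped.apply(lambda x: (x < 0).mean() * 100)

# The same quantity spelled out: negatives counted, divided by the group size.
via_ratio = grouped.apply(lambda x: (x < 0).sum() / len(x) * 100)

print(via_mean)
print(via_ratio)
```

Both Series should hold the same values, indexed by name in sorted order because groupby sorts its keys by default.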
Answer 2
Score: 1
First identify the negative values, then group by name:
out = (df['score'].lt(0)
       .groupby(df['name'], sort=False)
       .agg(negative_score_percent='mean')
       .mul(100).reset_index()
      )
Output:
name negative_score_percent
0 John 50.0
1 George 0.0
2 Paul 100.0
3 Mary 50.0
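The question asked for values such as 50% rather than 50.0. One possible way to get there, sketched on top of the same mask-then-group idea, is to round the float column and append a percent sign. The sample DataFrame is rebuilt from the question, and rendering the column as strings is an assumption about what the asker wants.

```python
import pandas as pd

# Sample data copied from the question; how df was originally built is assumed.
df = pd.DataFrame({
    "name": ["John", "George", "John", "Paul", "Mary", "Paul", "Mary"],
    "score": [0.50, 0.20, -0.94, -0.20, 0.44, -0.60, -0.89],
})

# Mask first, then group by name, keeping the order of first appearance.
out = (df["score"].lt(0)
       .groupby(df["name"], sort=False)
       .mean()
       .mul(100)
       .rename("negative_score_percent")
       .reset_index())

# Turn 50.0 into "50%". The column becomes strings, so this is for display only.
out["negative_score_percent"] = (
    out["negative_score_percent"].round().astype(int).astype(str) + "%"
)

print(out)
```

Keeping the numeric column for any further calculation and formatting only at the end avoids having to parse the strings back into numbers later.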
Answer 3
Score: 0
It looks like there is a syntax mistake in this line:
> df.groupby('name')(['score'] < 0).sum()
Inside the second pair of parentheses you are comparing the list ['score'] with the int 0, which is what raises the TypeError.
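A quick way to see this is to evaluate the comparison on its own, with no pandas involved. Python computes the argument ['score'] < 0 before it tries to call the groupby object, so the failure comes from a plain list-versus-int comparison. The variable name message is only for illustration.

```python
# The expression from inside the second pair of parentheses, evaluated in isolation.
try:
    ["score"] < 0
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
# prints: '<' not supported between instances of 'list' and 'int'
```

Selecting the column first and comparing the resulting Series, as in df['score'] < 0, is presumably what the comparison was meant to be.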
One way to write the solution you're looking for:
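import pandas as pd

# Count of negative scores per name (names with no negatives are absent here).
df_neg = (df
          [df.score < 0]
          .groupby('name')
          .count()
          .rename(columns={'score': 'neg_score'}))
# Count of all scores per name.
df_all = (df
          .groupby('name')
          .count()
          .rename(columns={'score': 'all_score'}))
# Right merge keeps every name; missing negative counts become 0.
(pd
 .merge(
     df_neg,
     df_all,
     on='name',
     how='right'
 )
 .fillna(0)
 .astype(int)
 .assign(
     negative_score_percent=lambda df_: (df_.neg_score / df_.all_score) * 100
 )
 .drop(
     ['neg_score', 'all_score'],
     axis=1
 )
)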