How to rank only duplicated rows, excluding NaN?
Question
I have a table with data:
Col1
0 1.0
1 1.0
2 1.0
3 2.0
4 3.0
5 4.0
6 NaN
How can I rank only the duplicated values, leaving unique values and NaN unranked?
My current output unfortunately ranks the unique values as well:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 1.0
4 3.0 1.0
5 4.0 1.0
6 NaN NaN
The output I need is:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 NaN
4 3.0 NaN
5 4.0 NaN
6 NaN NaN
Example of the code:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1],
[1],
[1],
[2],
[3],
[4],
[np.nan]], columns=['Col1'])  # np.nan: the np.NaN alias was removed in NumPy 2.0
print(df)
# Adding row_number for each pair:
df['Rn'] = df[df['Col1'].notnull()].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
# I managed to select only necessary rows for mask, but how can I apply it along with groupby?:
m = df.dropna().loc[df['Col1'].duplicated(keep=False)]
print(m)
Thank you!
Answer 1
Score: 2
Try:
m = df['Col1'].duplicated(keep=False)
df['Rn'] = df[m].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
Prints:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 NaN
4 3.0 NaN
5 4.0 NaN
6 NaN NaN
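The masking approach above can be sketched on a slightly larger frame with two duplicated groups and more than one NaN. The helper name rank_duplicates and the sample data are hypothetical, not part of the original answer. The extra notna() guard is also an addition: as far as I know, duplicated(keep=False) flags repeated NaN values as duplicates of each other, so the guard keeps them out of the mask explicitly instead of relying on groupby dropping NaN keys.

```python
import numpy as np
import pandas as pd


def rank_duplicates(df: pd.DataFrame, col: str) -> pd.Series:
    """Rank rows within each group of `col`, but only for values that
    occur more than once; unique values and NaN are left as NaN.

    Hypothetical helper for illustration only.
    """
    # True for every row whose value appears at least twice.
    # The notna() guard (an assumption-driven addition) excludes NaN rows.
    mask = df[col].notna() & df[col].duplicated(keep=False)
    # Rank only the masked rows, group by group, in order of appearance.
    ranks = df[mask].groupby(col)[col].rank(method="first", ascending=True)
    # Reindex to the full frame so unmasked rows become NaN.
    return ranks.reindex(df.index)


demo = pd.DataFrame({"Col1": [1, 1, 2, 3, 3, 3, np.nan, np.nan]})
demo["Rn"] = rank_duplicates(demo, "Col1")
print(demo)
```

Rows that are not in the mask get NaN because the ranked Series is reindexed to the full index, which is the same alignment that df['Rn'] = ... performs in the answer.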
Answer 2
Score: 1
You can identify the duplicated values and compute the rank only for those:
# identify duplicated rows
m = df['Col1'].duplicated(keep=False)
# compute the rank only for those
df['Rn'] = df.loc[m, 'Col1'].rank(method='first', ascending=True)
Note that if you want to increment a count of the duplicates, you can use groupby.cumcount:
m = df['Col1'].duplicated(keep=False)
df['Rn'] = df.loc[m, ['Col1']].groupby('Col1').cumcount().add(1)
Output:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 NaN
4 3.0 NaN
5 4.0 NaN
6 NaN NaN
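The two snippets in this answer give the same result on the question's data only because it contains a single duplicated value. A minimal sketch with two duplicated groups shows how they differ; the sample frame and the column names global_rank and per_group are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the values 1 and 3 are each duplicated, 2 is unique.
df = pd.DataFrame({"Col1": [1, 1, 2, 3, 3, np.nan]})

# True for every row whose value occurs more than once.
m = df["Col1"].duplicated(keep=False)

# Plain rank over all masked rows: one ranking across both groups.
df["global_rank"] = df.loc[m, "Col1"].rank(method="first", ascending=True)

# Per-group running count: restarts at 1 for each duplicated value.
df["per_group"] = df.loc[m, ["Col1"]].groupby("Col1").cumcount().add(1)

print(df)
```

Here global_rank continues counting across groups (1, 2, 3, 4 on the masked rows), while per_group restarts for each value (1, 2, 1, 2). If the numbering should restart per duplicated value, as in the question's desired output, the groupby variant is the one to use.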