June 15, 2023 02:15:45

How to rank only duplicated rows, ignoring NaN?

Question

I have a table with data (built in the code example below).

How can I rank only the duplicated values, ignoring NaN as well?
My current output unfortunately ranks the unique values too:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  1.0
4   3.0  1.0
5   4.0  1.0
6   NaN  NaN

The output I need is:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN

Example of the code:

import numpy as np
import pandas as pd
df = pd.DataFrame([[1],
                   [1],
                   [1],
                   [2],
                   [3],
                   [4],
                   [np.nan]], columns=['Col1'])
print(df)
# Adding row_number for each pair:
df['Rn'] = df[df['Col1'].notnull()].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
# I managed to select only the necessary rows for a mask, but how can I apply it along with groupby?
m = df.dropna().loc[df['Col1'].duplicated(keep=False)]
print(m)

Thank you!

Answer 1

Score: 2

Try:

m = df['Col1'].duplicated(keep=False)
df['Rn'] = df[m].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)

Prints:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
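
As a quick check on data with more than one duplicated group and more than one NaN, the same idea can be sketched as below. The sample values and the extra `notna()` condition are assumptions added for illustration: `duplicated(keep=False)` treats repeated NaN values as duplicates of each other, so the explicit condition keeps them out of the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two duplicated groups (1 and 2), one unique value (3), two NaNs.
df = pd.DataFrame({'Col1': [1, 1, 2, 3, 2, np.nan, np.nan]})

# Rows whose value occurs more than once, with NaN excluded explicitly (assumption).
m = df['Col1'].duplicated(keep=False) & df['Col1'].notna()

# Number the rows within each duplicated group in order of appearance.
df['Rn'] = df[m].groupby('Col1')['Col1'].rank(method='first', ascending=True)
print(df)
```

Rows outside the mask are not part of the right-hand side, so index alignment on assignment leaves their `Rn` as NaN.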

Answer 2

Score: 1

You can identify the duplicated values and only compute the rank for those:

# identify duplicated rows
m = df['Col1'].duplicated(keep=False)
# compute the rank only for those
df['Rn'] = df.loc[m, 'Col1'].rank(method='first', ascending=True)

Note that if you want to increment a count of the duplicates, you can use groupby.cumcount:

m = df['Col1'].duplicated(keep=False)
df['Rn'] = df.loc[m, ['Col1']].groupby('Col1').cumcount().add(1)

Output:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
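
The two variants agree on the sample data because only one value is duplicated there. With several duplicated groups they can differ: a plain `rank` orders all masked rows together, while `groupby.cumcount` restarts for each value. A minimal sketch, assuming hypothetical sample values, an added `notna()` guard, and the column names `rank_all` and `count_in_group` chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values 1 and 2 are each duplicated, 3 is unique.
df = pd.DataFrame({'Col1': [2, 1, 2, 1, 3, np.nan]})
m = df['Col1'].duplicated(keep=False) & df['Col1'].notna()

# Plain rank over the masked rows: a single ranking across all duplicated values.
df['rank_all'] = df.loc[m, 'Col1'].rank(method='first', ascending=True)

# cumcount per group: a running counter that restarts at 1 for each value.
df['count_in_group'] = df.loc[m, ['Col1']].groupby('Col1').cumcount().add(1)
print(df)
```

If the goal is a per-value row number, the grouped form is probably the one to use; the plain rank fits when a single ordering over all duplicated rows is wanted.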
