How to remove duplicate rows in R based on condition?
Question
I have the following data:
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))
id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1
The nature of the data is such that some ids may have only x = 0 rows. When a given id does have an x = 1 row, it occurs only once, in the last row for that id. I want to remove duplicate rows for each id, but if an id has an x = 1 row, I want to keep only that row.
The desired output:
id x
001 0
002 1
003 1
A tidyverse solution is preferable. Thanks!
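Given the property described above (an x = 1 row, when present, is always the last row for its id), one minimal tidyverse sketch simply keeps the last row of each id. This is an editorial addition, not part of the original question; it assumes dplyr >= 1.0.0 (for slice_tail()) and that rows are already ordered within each id as shown.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Relies on the stated property: an x = 1 row, if any, is the last row of its id.
# Keep only the final row within each id group, then drop the grouping.
out <- df %>%
  group_by(id) %>%
  slice_tail(n = 1) %>%
  ungroup()

out
```

If the row order within an id cannot be trusted, a max()-based approach like the ones in the answers is safer, since it does not depend on row position.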
Answer 1
Score: 5
In base R, you could use the aggregate function:
aggregate(x ~ id, df, max)
id x
1 001 0
2 002 1
3 003 1
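This works because x only takes the values 0 and 1: the maximum within an id is 1 exactly when that id has an x = 1 row, and 0 otherwise. As a sketch, the same idea can be written in tidyverse style with summarise(); this assumes dplyr >= 1.1.0 for the .by argument.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# max() of a 0/1 vector is 1 if any row has x = 1, otherwise 0,
# so one summary row per id reproduces the desired output.
res <- df %>%
  summarise(x = max(x), .by = id)

res
```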
Answer 2
Score: 4
Probably slice_max:
library(dplyr)

df %>%
  slice_max(x, by = id) %>%
  distinct()
Or (from a comment by @r2evans):
df %>%
  slice_max(x, by = id, with_ties = FALSE)
Either version gives:
id x
1 001 0
2 002 1
3 003 1
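The distinct() call in the first variant is needed because slice_max() keeps ties by default (with_ties = TRUE): for id 001 every row has x = 0, so all three rows tie for the maximum and are returned. The small sketch below shows that intermediate step; it assumes dplyr >= 1.1.0, where slice_max() accepts the by argument.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# With the default with_ties = TRUE, all three tied rows of id 001 survive.
with_ties <- df %>% slice_max(x, by = id)
nrow(with_ties)  # 5

# with_ties = FALSE keeps exactly one row per id, so distinct() is unnecessary.
no_ties <- df %>% slice_max(x, by = id, with_ties = FALSE)
nrow(no_ties)  # 3
```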