How to remove duplicate rows in R based on condition?

Question

I have the following data:

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

 id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1

The nature of the data is such that some ids have only x = 0 rows. When x = 1 does occur for a given id, it occurs only once, and always in the last row for that id. I want to remove duplicate rows for each id, but if an id has an x = 1 row, I want to keep only that row.

The desired output:

 id x
001 0
002 1
003 1

A tidyverse solution is preferable. Thanks!

Answer 1

Score: 5

In base R, you could use the aggregate function:

aggregate(x ~ id, df, max)
   id x
1 001 0
2 002 1
3 003 1
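
Taking the maximum works here because x is only ever 0 or 1, so the per-id maximum is 1 exactly when that id has an x = 1 row. The aggregate call returns only the columns named in the formula, though. As a rough sketch (not part of the original answer), the same idea can be written in base R so that whole rows are kept, by sorting and then dropping repeated ids. The names ordered and result are only illustrative.

```r
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Sort by id, and within each id put the largest x first
ordered <- df[order(df$id, -df$x), ]

# duplicated() flags every repeat of an id after its first appearance,
# so negating it keeps the first (largest-x) row per id
result <- ordered[!duplicated(ordered$id), ]
rownames(result) <- NULL
result
```

With the example data this should give one row per id, with x = 1 for the ids 002 and 003.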

Answer 2

Score: 4

Probably slice_max from dplyr (the by argument needs dplyr 1.1.0 or later):

df %>%
    slice_max(x, by = id) %>%
    distinct()

or (as suggested in a comment by @r2evans)

df %>%
    slice_max(x, by = id, with_ties = FALSE)

which gives

   id x
1 001 0
2 002 1
3 003 1
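
Since the question states that x = 1, when present, is always in the last row for its id, another option might be to simply keep the last row of each id. This is only a sketch and not from the original answers; it assumes dplyr 1.1.0 or later (for the by argument) and that the rows are already in the order described in the question.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Keep only the last row within each id; this relies on x = 1,
# if it occurs at all, being the final row for that id
result <- df %>%
    slice_tail(n = 1, by = id)
result
```

Unlike slice_max, this depends on row order rather than on the values of x, so it would give a different result if the data were shuffled first.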

huangapple
  • Posted on August 4, 2023 at 23:04:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76837137.html