Filtering a table based on the latest row condition

Question

I have a table like the following:

  date user  X1 X2 X3
  1/1     1   0  3 34
  2/1     1   0  7 65
  3/1     1   0  0  0
  4/1     1  25  4 65
  1/1     2 285  0  0
  2/1     2   0  0  0
  3/1     2   0 54  0
  4/1     2   0  0  0

How can I use dplyr to keep only the users whose X columns hold non-zero data on the last available date (4/1)? In this case, user 2 should be removed. Thanks.
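The answers refer to a data frame named df that the question never constructs. A minimal sketch of how it could be built from the table above follows. It assumes the dates are plain character strings and that the rows are already in chronological order within each user, since the answers pick the "last" row by position rather than by parsing the date.

```r
# Sample data transcribed from the question's table.
# Assumption: rows are already sorted by date within each user.
df <- data.frame(
  date = rep(c("1/1", "2/1", "3/1", "4/1"), times = 2),
  user = rep(1:2, each = 4),
  X1   = c(0, 0, 0, 25, 285, 0, 0, 0),
  X2   = c(3, 7, 0, 4, 0, 0, 54, 0),
  X3   = c(34, 65, 0, 65, 0, 0, 0, 0)
)

print(df)
```

If the rows were not sorted, the date column would need to be parsed and the data ordered by user and date before any of the answers are applied.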

Answer 1

Score: 4

Use if_any to keep a group if any of the selected columns in that group's last row has a value different from 0:

  library(dplyr)  # 1.1.0+ is needed for the .by argument

  df %>%
    # .x[n()] is the last value of each column within the current user group
    filter(if_any(X1:X3, ~ .x[n()] != 0), .by = user)
  #   date user X1 X2 X3
  # 1  1/1    1  0  3 34
  # 2  2/1    1  0  7 65
  # 3  3/1    1  0  0  0
  # 4  4/1    1 25  4 65
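The question says "for all Xs", which could also be read as requiring every X column to be non-zero in the last row. On the original data both readings give the same result, so the sketch below uses a small hypothetical data frame (df3, with an invented user 3 that is not in the question) to show where if_any and if_all would differ.

```r
library(dplyr)  # 1.1.0+ for the .by argument

# Hypothetical data: user 3 is invented for illustration only.
df3 <- data.frame(
  date = rep(c("3/1", "4/1"), times = 3),
  user = rep(1:3, each = 2),
  X1   = c(0, 25, 0, 0, 7, 0),
  X2   = c(0, 4, 54, 0, 1, 5),
  X3   = c(0, 65, 0, 0, 2, 0)
)

# Keep users with at least one non-zero X value in their last row
any_nonzero <- df3 %>%
  filter(if_any(X1:X3, ~ last(.x) != 0), .by = user)

# Keep users whose X values are all non-zero in their last row
all_nonzero <- df3 %>%
  filter(if_all(X1:X3, ~ last(.x) != 0), .by = user)

unique(any_nonzero$user)  # users 1 and 3
unique(all_nonzero$user)  # user 1 only
```

Which of the two is right depends on what the asker meant; the accepted-style answers in this thread all use the "any" reading.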

Answer 2

Score: 2

With dplyr, we can compute the rowSums of each user's last record and keep the users for which that sum is non-zero:

  library(dplyr)  # 1.1.0+ for the .by argument

  # abs() keeps positive and negative values from cancelling out;
  # across(X1:X3, last) is enough if all values are non-negative
  df %>%
    filter(rowSums(across(X1:X3, ~ last(abs(.x)))) != 0, .by = user)
  #   date user X1 X2 X3
  # 1  1/1    1  0  3 34
  # 2  2/1    1  0  7 65
  # 3  3/1    1  0  0  0
  # 4  4/1    1 25  4 65
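The role of abs() in this answer is easiest to see on data with negative values, which the original table does not contain. The sketch below uses a hypothetical data frame (df_neg, invented for illustration) in which user 1's last row sums to zero even though it holds non-zero values.

```r
library(dplyr)  # 1.1.0+ for the .by argument

# Hypothetical data: user 1's last row is (-5, 5, 0), which sums to 0.
df_neg <- data.frame(
  date = rep(c("3/1", "4/1"), times = 2),
  user = rep(1:2, each = 2),
  X1   = c(1, -5, 2, 0),
  X2   = c(1, 5, 0, 0),
  X3   = c(1, 0, 0, 0)
)

# With abs(): the last-row sum for user 1 is 10, so user 1 is kept
with_abs <- df_neg %>%
  filter(rowSums(across(X1:X3, ~ last(abs(.x)))) != 0, .by = user)

# Without abs(): -5 + 5 + 0 == 0, so user 1 is dropped by mistake
without_abs <- df_neg %>%
  filter(rowSums(across(X1:X3, last)) != 0, .by = user)

nrow(with_abs)     # rows kept when abs() is used
nrow(without_abs)  # rows kept when abs() is omitted
```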

Answer 3

Score: 1

Another option is to use any with c_across to check whether any value is non-zero in the last row of each group (row_number() == n()):

  library(dplyr)

  df %>%
    group_by(user) %>%
    # TRUE for a group if any X value in its last row is non-zero
    filter(any(c_across(starts_with("X")) != 0 & row_number() == n()))
  #> # A tibble: 4 × 5
  #> # Groups:   user [1]
  #>   date   user    X1    X2    X3
  #>   <chr> <int> <int> <int> <int>
  #> 1 1/1       1     0     3    34
  #> 2 2/1       1     0     7    65
  #> 3 3/1       1     0     0     0
  #> 4 4/1       1    25     4    65

Created on 2023-03-15 with reprex v2.0.2
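The same selection can also be written as two explicit steps, which some readers may find easier to follow: first isolate each user's last row, then keep the users whose last row has a non-zero X value. This is a different technique from the answer above (slice_tail plus semi_join rather than c_across), sketched on the question's data with the same assumption that rows are sorted by date within each user.

```r
library(dplyr)

# Sample data transcribed from the question's table
df <- data.frame(
  date = rep(c("1/1", "2/1", "3/1", "4/1"), times = 2),
  user = rep(1:2, each = 4),
  X1   = c(0, 0, 0, 25, 285, 0, 0, 0),
  X2   = c(3, 7, 0, 4, 0, 0, 54, 0),
  X3   = c(34, 65, 0, 65, 0, 0, 0, 0)
)

# Step 1: one row per user, taken from the end of each group
last_rows <- df %>%
  group_by(user) %>%
  slice_tail(n = 1) %>%
  ungroup()

# Step 2: users whose last row has at least one non-zero X value
keepers <- last_rows %>%
  filter(if_any(starts_with("X"), ~ .x != 0))

# Step 3: keep every row belonging to those users
result <- df %>%
  semi_join(keepers, by = "user")

print(result)
```

The intermediate last_rows table is also a convenient place to inspect which date was actually treated as the last one for each user.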

Answer 4

Score: 0

Although the OP clearly prefers dplyr, here is a data.table solution for completeness:

  library(data.table)

  setDT(df)  # converts df to a data.table by reference
  # .SD[.N, X1:X3] is the last row of the X columns within each user group
  df[, .SD[any(.SD[.N, X1:X3] != 0)], user]
  #    user date X1 X2 X3
  # 1:    1  1/1  0  3 34
  # 2:    1  2/1  0  7 65
  # 3:    1  3/1  0  0  0
  # 4:    1  4/1 25  4 65

huangapple
  • Posted on 2023-03-15 19:33:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75744134.html