Filtering table based on the latest row condition
Question
I have a table like the following:
date user X1 X2 X3
1/1 1 0 3 34
2/1 1 0 7 65
3/1 1 0 0 0
4/1 1 25 4 65
1/1 2 285 0 0
2/1 2 0 0 0
3/1 2 0 54 0
4/1 2 0 0 0
How can I use dplyr to keep only the users that have non-zero data in the X columns at the last available date (4/1)? In this case user 2 should be removed. Thanks.
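To run the answers, the table first has to exist as a data frame. A minimal sketch that rebuilds it under the name df (the name the answers use); the column types, character dates and integer values, are assumed from the printed output further down:

```r
# Rebuild the example table from the question.
# Column types (character date, integer values) are an assumption.
df <- data.frame(
  date = rep(c("1/1", "2/1", "3/1", "4/1"), times = 2),
  user = rep(1:2, each = 4),
  X1 = c(0L, 0L, 0L, 25L, 285L, 0L, 0L, 0L),
  X2 = c(3L, 7L, 0L, 4L, 0L, 0L, 54L, 0L),
  X3 = c(34L, 65L, 0L, 65L, 0L, 0L, 0L, 0L)
)

# Last row of user 2: every X column is zero, so this user should be dropped
df[df$user == 2 & df$date == "4/1", ]
```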
Answer 1
Score: 4
Use if_any to keep a group if any of the selected columns has a non-zero value in that group's last row:
library(dplyr) # dplyr 1.1.0+ is needed for the .by argument
df %>%
  filter(if_any(X1:X3, ~ .x[n()] != 0), .by = user)
# date user X1 X2 X3
# 1 1/1 1 0 3 34
# 2 2/1 1 0 7 65
# 3 3/1 1 0 0 0
# 4 4/1 1 25 4 65
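The question's phrase "for all Xs" can be read two ways. if_any keeps a user when at least one X column is non-zero in the last row; if every X column must be non-zero, if_all is the counterpart. A minimal sketch of the difference, with the sample data rebuilt by hand and a hypothetical user 3 (not in the original question) added for illustration:

```r
library(dplyr) # 1.1.0+ for .by

# Sample data from the question, plus a hypothetical user 3 (an assumption for
# illustration) whose last row is non-zero in X2 only
df <- data.frame(
  date = c(rep(c("1/1", "2/1", "3/1", "4/1"), times = 2), "4/1"),
  user = c(rep(1:2, each = 4), 3L),
  X1 = c(0, 0, 0, 25, 285, 0, 0, 0, 0),
  X2 = c(3, 7, 0, 4, 0, 0, 54, 0, 5),
  X3 = c(34, 65, 0, 65, 0, 0, 0, 0, 0)
)

# At least one X column is non-zero in the group's last row
loose <- df %>% filter(if_any(X1:X3, ~ .x[n()] != 0), .by = user)

# Every X column is non-zero in the group's last row
strict <- df %>% filter(if_all(X1:X3, ~ .x[n()] != 0), .by = user)

unique(loose$user)
unique(strict$user)
```

With this data, loose should keep users 1 and 3, while strict should keep only user 1. On the original two-user table both versions give the same result.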
Answer 2
Score: 2
With dplyr, we can calculate the rowSums of the last record.
library(dplyr)
# or across(X1:X3, last) if you only have positive values
df %>% filter(rowSums(across(X1:X3, ~last(abs(.x)))) != 0, .by = user)
date user X1 X2 X3
1 1/1 1 0 3 34
2 2/1 1 0 7 65
3 3/1 1 0 0 0
4 4/1 1 25 4 65
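The abs() call matters when values can be negative: a plain sum of the last row can cancel to zero even though individual entries are non-zero. A small base R illustration with made-up values (hypothetical, not from the question):

```r
# Hypothetical last-row values where positive and negative entries cancel
last_row <- c(X1 = -5, X2 = 5, X3 = 0)

plain_check <- sum(last_row) != 0      # the raw sum is 0, so the row looks empty
abs_check <- sum(abs(last_row)) != 0   # the absolute sum is 10, so the row is kept

plain_check
abs_check
```

Here plain_check is FALSE and abs_check is TRUE, which is why the shorter across(X1:X3, last) form is only safe for non-negative data.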
Answer 3
Score: 1
Another option uses any with c_across to check whether any value is non-zero in the row where row_number equals n(), i.e. the last row of each group, like this:
library(dplyr)
df %>%
group_by(user) %>%
filter(any(c_across(starts_with("X")) != 0 & row_number() == n()))
#> # A tibble: 4 × 5
#> # Groups: user [1]
#> date user X1 X2 X3
#> <chr> <int> <int> <int> <int>
#> 1 1/1 1 0 3 34
#> 2 2/1 1 0 7 65
#> 3 3/1 1 0 0 0
#> 4 4/1 1 25 4 65
Created on 2023-03-15 with reprex v2.0.2
Answer 4
Score: 0
Although the OP clearly prefers dplyr, here is a data.table solution for completeness:
library(data.table)
setDT(df)
df[, .SD[any(.SD[.N, X1:X3] != 0)], user]
user date X1 X2 X3
1: 1 1/1 0 3 34
2: 1 2/1 0 7 65
3: 1 3/1 0 0 0
4: 1 4/1 25 4 65