2023-03-15 19:33:31

Filtering table based on the latest row condition

Question

I have a table like the following:

date  user  X1 X2 X3
1/1     1    0  3 34 
2/1     1    0  7 65
3/1     1    0  0  0
4/1     1   25  4 65
1/1     2  285  0  0
2/1     2    0  0  0
3/1     2    0 54  0
4/1     2    0  0  0

How can I use dplyr to keep only the users whose row for the last available date (4/1) has non-zero data for all X columns? In this case, user 2 should be removed. Thanks.
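
For anyone who wants to run the answers, the table can be rebuilt as a data frame. This is only a minimal sketch: `df` is the name the answers use, and storing `date` as character strings and the other columns as integers is an assumption based on the printed output in one of the answers. The last line just shows which rows the condition is evaluated on.

```r
# Rebuild the example table; `df` is the name the answers assume.
df <- data.frame(
  date = rep(c("1/1", "2/1", "3/1", "4/1"), times = 2),
  user = rep(1:2, each = 4),
  X1 = c(0L, 0L, 0L, 25L, 285L, 0L, 0L, 0L),
  X2 = c(3L, 7L, 0L, 4L, 0L, 0L, 54L, 0L),
  X3 = c(34L, 65L, 0L, 65L, 0L, 0L, 0L, 0L)
)

# Last row per user (rows are already in date order within each user).
last_rows <- df[!duplicated(df$user, fromLast = TRUE), ]
print(last_rows)
```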

Answer 1

Score: 4

Use if_any to keep a group if any of the selected columns in the group's last row has a value different from 0:

library(dplyr) #1.1.0+
df %>%
  filter(if_any(X1:X3, ~ .x[n()] != 0), .by = user)

#   date user X1 X2 X3
# 1  1/1    1  0  3 34
# 2  2/1    1  0  7 65
# 3  3/1    1  0  0  0
# 4  4/1    1 25  4 65
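
The question asks for non-zero data "for all Xs", while `if_any` keeps a user as soon as one column is non-zero in the last row. On the example data both readings give the same result. If the stricter reading is intended, `if_all` looks like the matching helper. A small sketch, with a hypothetical user 3 added only to make the difference visible:

```r
library(dplyr)  # .by needs dplyr 1.1.0+

# Shortened example data plus a hypothetical user 3 whose last row is partly zero.
df <- data.frame(
  date = rep(c("3/1", "4/1"), times = 3),
  user = rep(1:3, each = 2),
  X1 = c(0, 25, 0, 0, 1, 0),
  X2 = c(0, 4, 54, 0, 1, 5),
  X3 = c(0, 65, 0, 0, 1, 0)
)

# Keep a user if ANY X column is non-zero in its last row.
any_nonzero <- df %>%
  filter(if_any(X1:X3, ~ .x[n()] != 0), .by = user)

# Keep a user only if ALL X columns are non-zero in its last row.
all_nonzero <- df %>%
  filter(if_all(X1:X3, ~ .x[n()] != 0), .by = user)

unique(any_nonzero$user)  # users 1 and 3
unique(all_nonzero$user)  # user 1 only
```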

Answer 2

Score: 2

With dplyr, we can compute the rowSums of each user's last record and keep the user if that sum is non-zero.

library(dplyr)

# or across(X1:X3, last) if you only have positive values
df %>% filter(rowSums(across(X1:X3, ~last(abs(.x)))) != 0, .by = user)

  date user X1 X2 X3
1  1/1    1  0  3 34
2  2/1    1  0  7 65
3  3/1    1  0  0  0
4  4/1    1 25  4 65
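
The `abs()` call matters when values can be negative: without it, a last row such as -5, 5, 0 sums to zero and the user would be dropped even though it has non-zero data. A minimal sketch on hypothetical data (the values are made up for illustration):

```r
library(dplyr)  # .by needs dplyr 1.1.0+

# Hypothetical single user whose last row cancels out when summed.
df <- data.frame(
  user = c(1, 1),
  X1 = c(0, -5),
  X2 = c(0, 5),
  X3 = c(0, 0)
)

# Plain sum of the last row: -5 + 5 + 0 == 0, so the user is dropped.
without_abs <- df %>%
  filter(rowSums(across(X1:X3, last)) != 0, .by = user)

# Sum of absolute values: 5 + 5 + 0 == 10, so the user is kept.
with_abs <- df %>%
  filter(rowSums(across(X1:X3, ~ last(abs(.x)))) != 0, .by = user)

nrow(without_abs)  # 0
nrow(with_abs)     # 2
```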

Answer 3

Score: 1

Another option uses any with c_across to check whether the values differ from 0, restricted to the last row via row_number() == n():

library(dplyr)
df %>%
  group_by(user) %>%
  filter(any(c_across(starts_with("X")) != 0 & row_number() == n()))
#> # A tibble: 4 × 5
#> # Groups:   user [1]
#>   date   user    X1    X2    X3
#>   <chr> <int> <int> <int> <int>
#> 1 1/1       1     0     3    34
#> 2 2/1       1     0     7    65
#> 3 3/1       1     0     0     0
#> 4 4/1       1    25     4    65

Created on 2023-03-15 with reprex v2.0.2
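
Why this works may not be obvious, since `c_across()` is normally paired with `rowwise()`. The base R sketch below mimics what appears to happen inside one group, under the assumption that `c_across()` concatenates the selected columns of the group into a single vector. The variable names are illustrative only.

```r
# Values of user 2 from the question, one vector per column.
x1 <- c(285, 0, 0, 0)
x2 <- c(0, 0, 54, 0)
x3 <- c(0, 0, 0, 0)

# Assumed behaviour of c_across() within a group: columns laid end to end.
values <- c(x1, x2, x3)

# Equivalent of row_number() == n(): TRUE only for the last row.
is_last <- seq_along(x1) == length(x1)

# `is_last` (length 4) is recycled over `values` (length 12), so only
# the last element of each column can pass the test.
keep_user <- any(values != 0 & is_last)
keep_user  # FALSE: user 2 is dropped
```

Because the check leans on recycling, it depends on every selected column having the same length within the group, which holds for columns of one data frame.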

Answer 4

Score: 0

Although the OP clearly prefers dplyr, here is a data.table solution for completeness.

library(data.table)

setDT(df)

df[, .SD[any(.SD[.N, X1:X3] != 0)], user]

   user date X1 X2 X3
1:    1  1/1  0  3 34
2:    1  2/1  0  7 65
3:    1  3/1  0  0  0
4:    1  4/1 25  4 65
