July 13, 2023, 16:15:35

Filter rows based on conditions in R

Question

I need to filter rows from a data frame based on multiple conditions.
After grouping by ID, I want to keep the rows where No is not NA or 0, and then keep the row with the latest Date.

Here is my data:

df <- data.frame(ID = rep(c(479,147,643,228,501), c(3,3,1,2,2)),
           No = c("A009145","A009146","0","A146754",NA,"A156764",NA,NA,NA,NA,NA),
           Date = c("2012-06-28","2020-01-10","2020-01-10","2014-11-28",NA,NA,"2019-04-13","2017-10-27","2017-12-01",NA,NA),
           point = c(25.2,31.1,31.1,21.5,18.6,18.6,27.77,18.52,18.52,26.3,26.3))

    ID        No        Date   point
1  479   A009145  2012-06-28    25.2
2  479   A009146  2020-01-10    31.1
3  479         0  2020-01-10    31.1
4  147   A146754  2014-11-28    21.5
5  147        NA          NA    18.6
6  147   A156764          NA    18.6
7  643        NA  2019-04-13    27.77
8  228        NA  2017-10-27    18.52
9  228        NA  2017-12-01    18.52
10 501        NA          NA    26.3
11 501        NA          NA    26.3

I want to get a result like this:

    ID        No        Date   point
2  479   A009146  2020-01-10    31.1
4  147   A146754  2014-11-28    21.5
7  643        NA  2019-04-13    27.77
9  228        NA  2017-12-01    18.52
10 501        NA          NA    26.3
11 501        NA          NA    26.3

For example, in ID 479:

    ID        No        Date   point
1  479   A009145  2012-06-28    25.2 (remove because Date is not the latest)
2  479   A009146  2020-01-10    31.1
3  479         0  2020-01-10    31.1 (remove because No is 0)

For example, in ID 228, where both No values are NA, keep the row with the latest Date:

    ID        No        Date   point
8  228        NA  2017-10-27    18.52 (remove because Date is not the latest)
9  228        NA  2017-12-01    18.52

For example, in ID 501, both No and Date are NA in every row, so keep both rows.

Here is the code I have tried:

df %>%
  group_by(ID) %>%
  filter(!(No == "0" | is.na(No))) %>%
  slice(which.max(as.Date(Date, "%Y-%m-%d")))

But I got this:

# A tibble: 2 × 4
# Groups:   ID [2]
     ID No      Date       point
  <dbl> <chr>   <chr>      <dbl>
1   147 A146754 2014-11-28  21.5
2   479 A009146 2020-01-10  31.1

Any help will be appreciated.

Answer 1

Score: 2

With dplyr, you can use slice_max() ordered by the following two variables, in this order:

1. !(is.na(No) | No == '0')
2. Date

Note that to order by multiple variables, you need to wrap them in a data frame or tibble.

library(dplyr)
# the by argument of slice_max() needs dplyr >= 1.1.0
df %>%
  slice_max(order_by = tibble(!(is.na(No) | No == '0'), Date),
                  by = ID)
#     ID      No       Date point
# 2  479 A009146 2020-01-10 31.10
# 4  147 A146754 2014-11-28 21.50
# 7  643    <NA> 2019-04-13 27.77
# 9  228    <NA> 2017-12-01 18.52
# 10 501    <NA>       <NA> 26.30
# 11 501    <NA>       <NA> 26.30
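
If the two-level ordering feels opaque, the same rule can be spelled out as two explicit filter steps. This is only a sketch of the logic described in the question, not a replacement for the slice_max() call above; `is_latest()` and the `valid_no` column are hypothetical names introduced for illustration.

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(
  ID = rep(c(479, 147, 643, 228, 501), c(3, 3, 1, 2, 2)),
  No = c("A009145", "A009146", "0", "A146754", NA, "A156764", NA, NA, NA, NA, NA),
  Date = c("2012-06-28", "2020-01-10", "2020-01-10", "2014-11-28", NA, NA,
           "2019-04-13", "2017-10-27", "2017-12-01", NA, NA),
  point = c(25.2, 31.1, 31.1, 21.5, 18.6, 18.6, 27.77, 18.52, 18.52, 26.3, 26.3)
)

# Hypothetical helper (not part of dplyr): TRUE for the rows holding the
# latest date, or for every row when all dates in the group are missing.
is_latest <- function(d) {
  if (all(is.na(d))) {
    return(rep(TRUE, length(d)))
  }
  !is.na(d) & d == max(d, na.rm = TRUE)
}

result <- df %>%
  mutate(valid_no = !(is.na(No) | No == "0")) %>%
  group_by(ID) %>%
  # Step 1: if the group has any valid No, keep only those rows;
  # otherwise any(valid_no) is FALSE and every row is kept.
  filter(valid_no == any(valid_no)) %>%
  # Step 2: among the remaining rows, keep the latest Date.
  filter(is_latest(as.Date(Date))) %>%
  ungroup() %>%
  select(-valid_no)

print(result)
```

On the question's data this should leave the six rows requested (original rows 2, 4, 7, 9, 10 and 11). The second filter() only sees the rows that survived the first one, which is why the two conditions are kept as separate steps.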

Answer 2

Score: -1

Using the dplyr package from the tidyverse:

library(dplyr)
# filter by condition (df is the data frame from the question)
df %>%
    dplyr::filter(!is.na(No)) %>%
    dplyr::filter(No != "0")

You can also put multiple conditions in a single filter() call, separated by commas. They are combined with AND.

More info in the [dplyr documentation][1]. This is useful for working with spreadsheets in R, which is where I learnt it: [2].

  [1]: https://dplyr.tidyverse.org
  [2]: https://nacnudus.github.io/spreadsheet-munging-strategies/
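
To illustrate the point about commas, here is a small sketch comparing chained filter() calls with a single comma-separated call. It uses a four-row subset of the question's data, chosen here only to keep the example short.

```r
library(dplyr)

# A subset of the question's data, enough to show the behavior.
df <- data.frame(
  ID = c(479, 479, 479, 643),
  No = c("A009145", "A009146", "0", NA),
  Date = c("2012-06-28", "2020-01-10", "2020-01-10", "2019-04-13")
)

# Two chained filter() calls ...
chained <- df %>%
  filter(!is.na(No)) %>%
  filter(No != "0")

# ... and one call with comma-separated conditions (combined with AND).
combined <- df %>%
  filter(!is.na(No), No != "0")

print(combined)
print(identical(chained$No, combined$No))
```

Both forms keep the same rows. Note that ID 643 disappears entirely because its only No is NA, so this filter on its own does not keep groups without a valid No, which the question asks for.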
