Filter rows based on conditions in R
Question
I need to filter rows from a data frame based on multiple conditions.
After group_by on ID, I want to keep the rows where No is not NA or 0, and then take the row with the latest Date.
Here is my data:
df <- data.frame(ID = rep(c(479,147,643,228,501), c(3,3,1,2,2)),
                 No = c("A009145","A009146","0","A146754",NA,"A156764",NA,NA,NA,NA,NA),
                 Date = c("2012-06-28","2020-01-10","2020-01-10","2014-11-28",NA,NA,"2019-04-13","2017-10-27","2017-12-01",NA,NA),
                 point = c(25.2,31.1,31.1,21.5,18.6,18.6,27.77,18.52,18.52,26.3,26.3))
ID No Date point
1 479 A009145 2012-06-28 25.2
2 479 A009146 2020-01-10 31.1
3 479 0 2020-01-10 31.1
4 147 A146754 2014-11-28 21.5
5 147 NA NA 18.6
6 147 A156764 NA 18.6
7 643 NA 2019-04-13 27.77
8 228 NA 2017-10-27 18.52
9 228 NA 2017-12-01 18.52
10 501 NA NA 26.3
11 501 NA NA 26.3
I want to get a result like this:
ID No Date point
2 479 A009146 2020-01-10 31.1
4 147 A146754 2014-11-28 21.5
7 643 NA 2019-04-13 27.77
9 228 NA 2017-12-01 18.52
10 501 NA NA 26.3
11 501 NA NA 26.3
For example, in ID 479:
ID No Date point
1 479 A009145 2012-06-28 25.2 (remove because Date is not the latest)
2 479 A009146 2020-01-10 31.1
3 479 0 2020-01-10 31.1 (remove because No is 0)
For example, in ID 228, where both No values are NA, keep the row with the latest Date:
ID No Date point
8 228 NA 2017-10-27 18.52 (remove because Date is not the latest)
9 228 NA 2017-12-01 18.52
For example, in ID 501, No and Date are NA in both rows, so keep them both.
Here is the code I have tried:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(!(No == "0" | is.na(No))) %>%
  slice(which.max(as.Date(Date, "%Y-%m-%d")))
But I got:
# A tibble: 2 × 4
# Groups: ID [2]
ID No Date point
<dbl> <chr> <chr> <dbl>
1 147 A146754 2014-11-28 21.5
2 479 A009146 2020-01-10 31.1
Any help will be appreciated.
Answer 1
Score: 2
With dplyr, you can use slice_max() ordered by two variables, in this order: !(is.na(No) | No == '0'), then Date.
Note that to order by multiple variables, you need to wrap them in a data frame or tibble.
library(dplyr)
# The by argument of slice_max() requires dplyr >= 1.1.0
df %>%
  slice_max(order_by = tibble(!(is.na(No) | No == '0'), Date),
            by = ID)
# ID No Date point
# 2 479 A009146 2020-01-10 31.10
# 4 147 A146754 2014-11-28 21.50
# 7 643 <NA> 2019-04-13 27.77
# 9 228 <NA> 2017-12-01 18.52
# 10 501 <NA> <NA> 26.30
# 11 501 <NA> <NA> 26.30
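The same selection logic can also be spelled out step by step, which may make it easier to see why the attempt in the question loses groups: its filter() drops every row with an NA or "0" in No before slicing, so IDs whose No values are all NA disappear. The following is only a sketch of an alternative, not the method used above; the helper column valid is a name introduced here for illustration, and the comparison with max() assumes Date holds ISO-formatted (YYYY-MM-DD) strings, so that string order matches date order.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  ID = rep(c(479, 147, 643, 228, 501), c(3, 3, 1, 2, 2)),
  No = c("A009145", "A009146", "0", "A146754", NA, "A156764", NA, NA, NA, NA, NA),
  Date = c("2012-06-28", "2020-01-10", "2020-01-10", "2014-11-28", NA, NA,
           "2019-04-13", "2017-10-27", "2017-12-01", NA, NA),
  point = c(25.2, 31.1, 31.1, 21.5, 18.6, 18.6, 27.77, 18.52, 18.52, 26.3, 26.3)
)

result <- df %>%
  # valid is a helper column introduced for this sketch
  mutate(valid = !(is.na(No) | No == "0")) %>%
  group_by(ID) %>%
  # Step 1: prefer rows with a usable No; if a group has none, keep all its rows
  filter(valid == any(valid)) %>%
  # Step 2: keep the latest Date; if every remaining Date is NA, keep all rows
  filter(if (all(is.na(Date))) TRUE else !is.na(Date) & Date == max(Date, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-valid)

result
```

On the question's data this should keep original rows 2, 4, 7, 9, 10 and 11. The slice_max() version is shorter; this form mainly trades brevity for explicit fallback rules.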
Answer 2
Score: -1
<details>
<summary>English:</summary>
Using the dplyr library from the tidyverse, filter by condition:
# df is the data frame from the question
df %>%
  dplyr::filter(!is.na(No)) %>%
  dplyr::filter(No != "0")
You can also put multiple conditions in a single filter() call, separated by commas. The conditions are combined with AND.
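As a small illustration of the comma form, the sketch below compares chained filter() calls with a single call on a four-row sample modelled on the question's data. The names sample_df, chained and combined are made up for this example.

```r
library(dplyr)

# Small sample modelled on the question's data (sample_df is a hypothetical name)
sample_df <- data.frame(
  ID = c(479, 479, 479, 147),
  No = c("A009145", "A009146", "0", NA)
)

# Two chained filter() calls
chained <- sample_df %>%
  filter(!is.na(No)) %>%
  filter(No != "0")

# One filter() call; comma-separated conditions are combined with AND
combined <- sample_df %>%
  filter(!is.na(No), No != "0")

# Both forms keep the same rows
identical(chained$No, combined$No)
```

Note that either form removes every row whose No is NA or "0". On the question's full data that drops IDs 643, 228 and 501 entirely, which the expected output keeps.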
More info in the [dplyr documentation][1].
This is useful for working with spreadsheets in R, and I learnt it from [Spreadsheet Munging Strategies][2].
[1]: https://dplyr.tidyverse.org
[2]: https://nacnudus.github.io/spreadsheet-munging-strategies/
</details>