英文:
Filter a column in R based on another column with 2 criteria
问题
我有一个名为“vouchers”的R数据框中的列 - “issue_slip”,其中包含以下值/行:
发行单:
IS/001,
IS/001,
IS/001,
IS/002,
IS/002,
IS/002
还有另一列“rec_status”,其值为0或1。
每个issue_slip行都可以具有rec_status为0或1。
我想只保留那些具有所有rec_status为0或0和1的issue_slips --> 删除具有所有rec_status为1的issue_slip行。
例如,
- IS/001 - 1,
- IS/001 - 0,
- IS/001 - 1
应该显示,并且不会被过滤掉,因为至少有一行具有rec_status = 1。
我尝试使用filter和subset函数,但无法弄清楚如何在同一列中进行筛选。
英文:
I have a column - "issue_slip" in R dataframe - "vouchers" with values/rows such as
Issue slip:
IS/001,
IS/001,
IS/001,
IS/002,
IS/002,
IS/002
and another column "rec_status" with values 0 or 1.
Each issue_slip row can have rec_status 0 or 1.
I would like to keep only those issue_slips that have all rec_status as 0 OR 0 or 1 --> remove issue_slip rows that have all rec_status as 1.
For example,
- IS/001 - 1,
- IS/001 - 0 ,
- IS/001 - 1
should show up and not get filtered out because at least one row has rec_status = 1
I tried using the filter and subset functions but could not figure out how to go about filtering this in the same column
答案1
得分: 1
以下是翻译好的部分:
样本数据
```r
quux <- data.frame(issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"), rec_status = c(0, 0, 1, 1, 1, 1))
quux
# issue_slip rec_status
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
# 4 IS/002 1
# 5 IS/002 1
# 6 IS/002 1
基本 R
ind <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) > 0
ind
# [1] TRUE TRUE TRUE FALSE FALSE FALSE
quux[ind,]
# issue_slip rec_status
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
dplyr
library(dplyr)
quux %>%
group_by(issue_slip) %>%
filter(any(rec_status %in% 0)) %>%
ungroup()
# # A tibble: 3 × 2
# issue_slip rec_status
# <chr> <dbl>
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
data.table
library(data.table)
as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
# issue_slip rec_status
# <char> <num>
# 1: IS/001 0
# 2: IS/001 0
# 3: IS/001 1
注意,我使用 rec_status %in% 0
而不是 rec_status == 0
有一个原因:因为我们没有示例数据(通常即使有数据,也不能保证没有任何 NA
),我不能确定数据中是否存在 NA
。请注意,NA == 0
将返回 NA
本身,通常会导致非防御性的代码失败,但 NA %in% 0
返回假,这通常是我们需要的(我推测这是我们想要的情况)。
英文:
Sample data
quux <- data.frame(issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"), rec_status = c(0, 0, 1, 1, 1, 1))
quux
# issue_slip rec_status
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
# 4 IS/002 1
# 5 IS/002 1
# 6 IS/002 1
base R
ind <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) > 0
ind
# [1] TRUE TRUE TRUE FALSE FALSE FALSE
quux[ind,]
# issue_slip rec_status
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
dplyr
library(dplyr)
quux %>%
group_by(issue_slip) %>%
filter(any(rec_status %in% 0)) %>%
ungroup()
# # A tibble: 3 × 2
# issue_slip rec_status
# <chr> <dbl>
# 1 IS/001 0
# 2 IS/001 0
# 3 IS/001 1
data.table
library(data.table)
as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
# issue_slip rec_status
# <char> <num>
# 1: IS/001 0
# 2: IS/001 0
# 3: IS/001 1
Note, I'm using rec_status %in% 0
instead of rec_status == 0
for a reason: since we have no sample data (and often even when we do), I have no assurance that there are not any NA
s in the data; note that NA == 0
will return NA
itself and therefore often fail non-defensive code, but NA %in% 0
returns false, which is often what we need (and I'm inferring it's what we want here).
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论