Filter rows based on column name
Question
I have a dataset in which some of the columns start with "Ref". I want to select all the columns that start with "Ref" and check a condition in those columns only. Unfortunately, as the data is generated in another step, I don't know how many columns will start with "Ref", only that there will be some. All the "Ref" columns contain numbers. The condition I want to check is whether any of the numbers in those columns are not equal to 3 or -1.
Ref_1 <- c(3,3,3,4,3,9)
Ref_2 <- c(-1,-1,-1,-1,8,3)
Ref_3 <- c(3,-1,3,3,3,3)
v1 <- c(2,4,3,1,-1,2)
b1 <- c(2,4,3,1,-1,2)
c1 <- c(2,4,3,1,-1,2)
df <- data.frame(Ref_1, Ref_2, Ref_3, v1, b1, c1)
Using the dataset above, rows 4 to 6 should be picked up by the condition, because each of them contains a value other than 3 or -1 in a "Ref" column. The data in v1, b1 and c1 should be ignored because those columns don't start with "Ref".
I can use an if-else statement, but that would require me to hard-code the "Ref" columns into the statement. How do I make it general enough to pick out the rows that meet the condition without knowing all the "Ref" columns in advance?
Answer 1
Score: 4
## with dplyr
library(dplyr)
df %>%
select(starts_with("Ref")) %>%
filter(if_any(everything(), \(x) !x %in% c(3, -1)))
# Ref_1 Ref_2 Ref_3
# 1 4 -1 3
# 2 3 8 3
# 3 9 3 3
## with base R
result = df[startsWith(names(df), "Ref")]
result[rowSums(sapply(result, \(x) !x %in% c(3, -1))) > 0, ]
# Ref_1 Ref_2 Ref_3
# 4 4 -1 3
# 5 3 8 3
# 6 9 3 3
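Both snippets above return only the "Ref" columns. If the other columns should stay in the result, a minimal base R sketch along the same lines is shown here. It assumes the question's data with `b1` included as the text describes; the names `allowed`, `ref_cols`, `bad` and `flagged` are illustrative and not part of the original answer.

```r
# Rebuild the question's data, with b1 included as the question text describes
df <- data.frame(
  Ref_1 = c(3, 3, 3, 4, 3, 9),
  Ref_2 = c(-1, -1, -1, -1, 8, 3),
  Ref_3 = c(3, -1, 3, 3, 3, 3),
  v1 = c(2, 4, 3, 1, -1, 2),
  b1 = c(2, 4, 3, 1, -1, 2),
  c1 = c(2, 4, 3, 1, -1, 2)
)

allowed <- c(3, -1)                       # values that pass the check
ref_cols <- startsWith(names(df), "Ref")  # logical mask of the "Ref" columns

# Logical matrix: TRUE where a cell in a "Ref" column holds a disallowed value
bad <- !sapply(df[ref_cols], `%in%`, allowed)

# Keep every column, but only rows with at least one disallowed "Ref" value
flagged <- df[rowSums(bad) > 0, ]
flagged
```

Because the row filter is computed from the "Ref" columns alone, values such as the -1 in `v1` never influence which rows are selected.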
Answer 2
Score: 0
Using base R:
cols <- grep("^Ref", colnames(df), value = TRUE)
# Keep rows where not every "Ref" value is 3 or -1
df[ which(rowSums(df[, cols] == 3 |
df[, cols] == -1) < length(cols)), cols ]
# Ref_1 Ref_2 Ref_3
# 4 4 -1 3
# 5 3 8 3
# 6 9 3 3