Filter rows based on column name

Question

I have a dataset in which some of the columns start with "Ref". I want to select all the columns that start with "Ref" and check a condition in those columns only. Unfortunately, as the data is generated in another step, I don't know how many columns will start with "Ref", just that some will. All the "Ref" columns contain numbers. The condition I want to check is whether any of the numbers in those columns is not equal to 3 or -1.

Ref_1 <- c(3,3,3,4,3,9)
Ref_2 <- c(-1,-1,-1,-1,8,3)
Ref_3 <- c(3,-1,3,3,3,3)
v1 <- c(2,4,3,1,-1,2)
b1 <- c(2,4,3,1,-1,2)
c1 <- c(2,4,3,1,-1,2)
df <- data.frame(Ref_1, Ref_2, Ref_3, v1, b1, c1)

Using the dataset above, rows 4 to 6 should be picked up by the condition, because each of them has at least one "Ref" value that is neither 3 nor -1. The data in v1, b1 and c1 should be ignored because those column names do not start with "Ref".

I can use an if-else statement, but that would require me to hard-code the "Ref" columns into the statement. How do I make it general enough that it picks out the rows that meet the condition without knowing all the "Ref" columns in the dataset in advance?

Answer 1

Score: 4

## with dplyr
library(dplyr)
df %>%
  select(starts_with("Ref")) %>%
  filter(if_any(everything(), \(x) !x %in% c(3, -1)))
#   Ref_1 Ref_2 Ref_3
# 1     4    -1     3
# 2     3     8     3
# 3     9     3     3

## with base R
result = df[startsWith(names(df), "Ref")]
result[rowSums(sapply(result, \(x) !x %in% c(3, -1))) > 0, ]
#   Ref_1 Ref_2 Ref_3
# 4     4    -1     3
# 5     3     8     3
# 6     9     3     3
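
Both snippets above return only the "Ref" columns. If the other columns of the flagged rows are needed as well, a minimal base R sketch could build the row mask from the "Ref" columns and then subset the full data frame. The names `ref_cols`, `bad`, `flagged_rows` and `result` are illustrative and not part of the original answer, and `b1` is assumed to be the intended fifth column of the question's data.

```r
# Example data from the question (assuming b1 was the intended fifth column)
df <- data.frame(
  Ref_1 = c(3, 3, 3, 4, 3, 9),
  Ref_2 = c(-1, -1, -1, -1, 8, 3),
  Ref_3 = c(3, -1, 3, 3, 3, 3),
  v1 = c(2, 4, 3, 1, -1, 2),
  b1 = c(2, 4, 3, 1, -1, 2),
  c1 = c(2, 4, 3, 1, -1, 2)
)

# Locate the "Ref" columns by name, so nothing is hard-coded
ref_cols <- startsWith(names(df), "Ref")

# Logical matrix: TRUE where a "Ref" value is neither 3 nor -1
bad <- vapply(df[ref_cols], function(x) !x %in% c(3, -1), logical(nrow(df)))

# A row is flagged when at least one of its "Ref" values is bad
flagged_rows <- rowSums(bad) > 0

# Subset the rows only, keeping every column of the original data frame
result <- df[flagged_rows, ]
print(result)
```

With dplyr, the same effect should be obtainable by dropping the select() step and passing starts_with("Ref") directly to if_any() inside filter().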

Answer 2

Score: 0

Using base:

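cols <- grep("^Ref", colnames(df), value = TRUE)
# Keep rows where fewer than all "Ref" values are 3 or -1.
# Compare against the literal 3: length(cols) only equals 3 here by coincidence.
df[which(rowSums(df[, cols] == 3 |
                 df[, cols] == -1) < length(cols)), cols]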
#   Ref_1 Ref_2 Ref_3
# 4     4    -1     3
# 5     3     8     3
# 6     9     3     3
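
One caveat with this approach: when only a single column matches, df[, cols] is simplified to a plain vector and rowSums() fails. A hedged sketch of a reusable helper that avoids this is shown below; `flag_rows` and its `prefix` and `allowed` parameters are hypothetical names introduced for illustration, not part of either answer.

```r
# Hypothetical helper: returns TRUE for rows in which any column whose name
# starts with `prefix` holds a value outside `allowed`.
flag_rows <- function(data, prefix = "Ref", allowed = c(3, -1)) {
  cols <- startsWith(names(data), prefix)
  if (!any(cols)) stop("no column names start with '", prefix, "'")
  # data[cols] stays a data frame even for one column, so as.matrix()
  # always yields a two-dimensional object
  values <- as.matrix(data[cols])
  # %in% drops the dimensions, so restore them before summing per row
  ok <- matrix(values %in% allowed, nrow = nrow(values))
  rowSums(!ok) > 0
}

# Example data from the question (assuming b1 was the intended fifth column)
df <- data.frame(
  Ref_1 = c(3, 3, 3, 4, 3, 9),
  Ref_2 = c(-1, -1, -1, -1, 8, 3),
  Ref_3 = c(3, -1, 3, 3, 3, 3),
  v1 = c(2, 4, 3, 1, -1, 2),
  b1 = c(2, 4, 3, 1, -1, 2),
  c1 = c(2, 4, 3, 1, -1, 2)
)
flagged <- df[flag_rows(df), ]

# Single "Ref" column: still works because no dimension is dropped
one_ref <- data.frame(Ref_1 = c(3, 7, -1), other = c(9, 9, 9))
single <- flag_rows(one_ref)
```

Note that under this sketch a missing value (NA) counts as "not 3 or -1" and therefore flags its row; whether that is the desired behaviour depends on the data.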

huangapple
  • Posted on April 7, 2023 at 03:52:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75953275.html