Get variable value in subset function – R


# Question

I ran into a problem while trying to use a variable's value inside the `subset` function. When I run the code below, I get the message "Warning: Error in -: invalid argument to unary operator", apparently because `val` in `-c(val)` inside `subset` is not recognized as the variable defined above.

```R
cname <- c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10",
           "A11","A12","A13","A14","A15","A16","A17","A18","A19","A20",
           "A21","A22","A23","A24","A25","A26","A27","A28","A29","A30","A31")

for (i in 15:length(cname)) {
  val <- cname[i]
  ifelse(sum(!is.na(df2$val))==0,
         df2 <- subset(df2, select = -c(val)),
         df2)
}
```

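The resulting `df2` is [this data](https://i.stack.imgur.com/xOpmt.png).

My expected result is to remove the unnecessary columns that contain only NA values, as you can see [here](https://i.stack.imgur.com/8pCkf.png).

How can I get the value of `val` so that I can remove the columns that contain only NA values?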
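For reference, here is a minimal sketch of how the loop itself could be repaired, assuming the sample `df2` from the answer's data section stands in for the asker's real data frame. Two things go wrong in the original: `df2$val` looks for a column literally named `val` (so it returns `NULL` and the condition is always `TRUE`), and inside `subset()` the name `val` is not a column, so it evaluates to a character string such as `"A15"`, and negating a string with `-` raises "invalid argument to unary operator". The sketch uses `df2[[val]]`, which looks up the column whose name is stored in `val`, and a plain `if`, since `ifelse()` is a vectorized function rather than control flow.

```R
# Sample data: the df2 from the answer's "data" section, used here as a
# stand-in for the asker's real data frame (assumption).
df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                  A4 = c(NA, NA, NA, 3, 2), A5 = 10)

# With the asker's data this would be cname[15:length(cname)];
# the sample only has five columns, so every column is checked here.
cname <- names(df2)

for (val in cname) {
  # df2[[val]] uses the column name stored in val;
  # df2$val would look for a column literally called "val".
  if (sum(!is.na(df2[[val]])) == 0) {
    # Assigning NULL drops the all-NA column from the data frame.
    df2[[val]] <- NULL
  }
}

print(names(df2))
```

With the sample data, only the all-NA column `A3` is dropped, leaving `A1`, `A2`, `A4` and `A5`.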

# Answer 1
**Score**: 0
We can use `subset` without a loop. Apply the vectorized `colSums` to the logical matrix `is.na(df2)` to count the NAs in each column, compare that count (`!=`) with the number of rows (`nrow(df2)`) to get a logical vector, use it to subset the column names, and pass the result to the `select` argument of `subset`:
```R
subset(df2, select = names(df2)[colSums(is.na(df2)) != nrow(df2)])
```

-output:

```
  A1 A2 A4 A5
1  1  1 NA 10
2  2  2 NA 10
3  3  3 NA 10
4  4 NA  3 10
5  5  5  2 10
```
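To make the one-liner above easier to follow, here is the same `colSums` approach split into named intermediate steps. It is a sketch on the answer's sample `df2`; the intermediate variable names (`na_mat`, `na_cnt`, `keep`, `kept_names`) are introduced only for illustration.

```R
# Sample data from the answer's "data" section
df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                  A4 = c(NA, NA, NA, 3, 2), A5 = 10)

na_mat     <- is.na(df2)              # logical matrix, TRUE where a cell is NA
na_cnt     <- colSums(na_mat)         # number of NAs in each column
keep       <- na_cnt != nrow(df2)     # FALSE only for columns that are entirely NA
kept_names <- names(df2)[keep]        # names of the columns to keep

res <- subset(df2, select = kept_names)
print(na_cnt)
print(res)
```

Here `na_cnt` is 0, 1, 5, 3, 0 for `A1` to `A5`, so only `A3` (5 NAs in 5 rows) fails the `!=` test.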

Or, with the tidyverse, use `select` and keep each column that has at least one non-NA element:

```R
library(dplyr)
# keep only the columns that contain at least one non-NA value
df2 %>%
  select(where(~ any(!is.na(.x))))
```

-output:

```
  A1 A2 A4 A5
1  1  1 NA 10
2  2  2 NA 10
3  3  3 NA 10
4  4 NA  3 10
5  5  5  2 10
```
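If dplyr is not available, the same per-column predicate used in `where(~ any(!is.na(.x)))` can be applied in base R. This is a sketch of a base R analogue (not part of the original answer) using `vapply()` and, equivalently, `Filter()`, on the answer's sample `df2`.

```R
# Sample data from the answer's "data" section
df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                  A4 = c(NA, NA, NA, 3, 2), A5 = 10)

# For each column: TRUE if it holds at least one non-NA value
has_data <- vapply(df2, function(x) any(!is.na(x)), logical(1))

# Single-bracket indexing with one argument selects columns
# and keeps the data.frame class
res <- df2[has_data]

# Filter() applies the same predicate to each column of the data frame
res2 <- Filter(function(x) any(!is.na(x)), df2)

print(res)
```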

data

```R
df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                  A4 = c(NA, NA, NA, 3, 2), A5 = 10)
```

huangapple
  • Posted on 2023-04-04 09:08:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75924775.html