How do I test whether the values of a variable are NA while using the paste() function to designate the variable?

Question

I'd like to create a loop that replaces the values of several dummy variables with NA wherever the corresponding value of the original variable is also NA (in order to recode an MCQ survey).

I have 19 questions, labeled Q1 through Q19, with dummy variables labeled Q1\_\[answer1\], Q1\_\[answer2\], etc. I made the dummy variables with values 1 and 0. Instead of nesting another ifelse() call that looks at the value of Q1, Q2, etc., I'd like to create a loop that picks up the dummy variables automatically (by using grep("Q", \[n\], "\_"), where n increases as the loop progresses).

Here is essentially what my data frame looks like:

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"))

# this is done for the purposes of the loop, which I'm not including here
n <- 1
```

To check whether the values of Q1 are missing, I'd like to use the following code (or an equivalent):

```R
is.na(paste0("df1$Q", n))
```

```
[1] FALSE
```

which would allow me to cycle through the different questions. However, this tests whether the string "df1$Q1" is NA rather than looking at Q1 as a variable. I would like it to behave as if I had typed df1$Q1 directly, which returns the result of the is.na() test for every value of the variable:

```R
is.na(df1$Q1)
```

```
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
```

Is there a function like is.na() or paste0() that would allow me to do this easily?


# Answer 1
**Score**: 1

How you tackle this may depend on what you intend to do with the result. The simplest approach, as given in the comments, is to pass the column vector to `is.na()`.
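The call in the question returns a single `FALSE` because `paste0()` only builds a character string, and a non-missing string is never `NA`. A minimal sketch of the difference, assuming the data frame is called `df1` and `n` is the loop counter as in the question:

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"))
n <- 1

# paste0() returns the string "df1$Q1"; R does not evaluate it as code
as_string <- is.na(paste0("df1$Q", n))

# [[ ]] looks the column up by its name, so is.na() sees the actual values
as_column <- is.na(df1[[paste0("Q", n)]])

print(as_string)  # a single FALSE
print(as_column)  # one TRUE/FALSE per row
```

Indexing with `[[` takes the column name as an ordinary string, so there is no need to turn the pasted text back into code.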

If you have a large data.frame, doing this column by column can be laborious. Instead, a loop or `sapply()` may give you what you need.

Extending your data.frame to demonstrate:

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"),
                  Q2 = c("a", "b,c", "a.b", "a,b", "b", "b", "c"),
                  Q3 = c("a", "b,c", NA, "a,b", "b", "c", NA))

# Look up each column by its name and test its values for NA
for (i in seq_along(df1)) {
  print(is.na(df1[[paste0("Q", i)]]))
}
```

Gives:

```
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
```
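For the original goal of blanking out the dummy variables, the same `[[` lookup can be combined with `grep()` on the column names. The sketch below rests on assumptions about the layout: the data frame `df2`, the dummy columns `Q1_a`, `Q1_b`, `Q2_a`, and their 0/1 values are all made up for illustration, following the `Q<n>_<answer>` naming described in the question.

```R
# Hypothetical survey data: two questions plus their 0/1 dummy columns
df2 <- data.frame(
  Q1   = c("a", "b", NA, "a,b"),
  Q1_a = c(1, 0, 0, 1),
  Q1_b = c(0, 1, 0, 1),
  Q2   = c("a", NA, "b", "a"),
  Q2_a = c(1, 0, 0, 1)
)

for (n in 1:2) {
  question <- paste0("Q", n)
  # Columns whose names start with "Q<n>_" are the dummies for question n
  dummies <- grep(paste0("^", question, "_"), names(df2), value = TRUE)
  # Rows where the original answer is missing
  missing_rows <- is.na(df2[[question]])
  # Overwrite the dummy values with NA on those rows only
  df2[missing_rows, dummies] <- NA
}

print(df2)
```

Anchoring the pattern with `^` and the trailing `_` keeps `Q1` from also matching columns such as `Q10_a` once the loop runs over all 19 questions.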

Alternatively, `sapply(df1, is.na)` tests every column at once and returns a logical matrix:

```
        Q1    Q2    Q3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,]  TRUE FALSE  TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
[6,]  TRUE FALSE FALSE
[7,] FALSE FALSE  TRUE
```

One trick is to use `sapply(df1, is.na) * 1`, which converts the logical matrix to 0s and 1s so that the values are easy to sum.
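As one possible use of that summation, the missing answers can be counted per question. This is a sketch using the same three-column `df1` as above; the names `na_matrix` and `na_counts` are only illustrative.

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"),
                  Q2 = c("a", "b,c", "a.b", "a,b", "b", "b", "c"),
                  Q3 = c("a", "b,c", NA, "a,b", "b", "c", NA))

# Convert the logical matrix to 0/1, then add up each column
na_matrix <- sapply(df1, is.na) * 1
na_counts <- colSums(na_matrix)

print(na_counts)
```

`colSums()` also accepts a logical matrix directly, so `colSums(is.na(df1))` should give the same counts without the `* 1` step.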

huangapple
  • This article was published on April 19, 2023 at 17:41:15
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76053005.html