How do I test if the values of a variable are NAs while using the paste() function to designate the variable?
Question
I'd like to create a loop that replaces the values of several dummy variables with NA if the corresponding value of the original variable is also NA (in order to recode an MCQ survey).

I have 19 questions, labeled Q1 through Q19, with dummy variables labeled `Q1_[answer1]`, `Q1_[answer2]`, etc. I made the dummy variables with values 1 and 0. Instead of nesting another ifelse function that looks at the value of Q1, Q2, etc., I'd like to create a loop that picks up the dummy variables automatically (by using `grep("Q", [n], "_")`, where n increases as the loop progresses).

Here is essentially what my data frame looks like:

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"))
# this is done for the purposes of the loop, which I'm not including here
n <- 1
```

To check whether the values of Q1 are missing, I'd like to use the following code (or an equivalent):

```R
is.na(paste0("df1$Q", n))
# [1] FALSE
```

which would allow me to cycle through the different questions. However, this tests whether the string "df1$Q1" is NA rather than treating Q1 as a variable. I would like it to behave as if I had typed `df1$Q1` directly, which returns the result of the is.na() test for every value of the variable:

```R
is.na(df1$Q1)
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
```

Is there a function like is.na or paste0 that would allow me to do this easily?
# Answer 1
**Score**: 1
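How you tackle this may depend on what you intend to do with the result. The simplest approach, as given in the comments, is to pass the vector itself to `is.na()`.

With a large data.frame, doing this column by column is laborious. Instead, a loop or `sapply()` may give you what you need. The key point is that a column name built with `paste0()` is only a string, so it has to be used to extract the column with `[[ ]]` before `is.na()` sees it.

Extending your data.frame to demonstrate:

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"),
                  Q2 = c("a", "b,c", "a.b", "a,b", "b", "b", "c"),
                  Q3 = c("a", "b,c", NA, "a,b", "b", "c", NA))

# Build each column name as a string, then extract that column with [[ ]]
for (i in seq_along(df1)) {
  print(is.na(df1[[paste0("Q", i)]]))
}
```

Gives:

```
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
```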
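The same string-based indexing can also be used for assignment, which is what the recode described in the question needs. The following is only a minimal sketch under assumptions: the dummy column names (`Q1_a`, `Q1_b`, ...) and the small example data are made up for illustration, and `n_questions` is a hypothetical stand-in for the 19 questions in the real survey.

```R
# Hypothetical example data: each question has 0/1 dummy columns named Q<n>_<answer>
df1 <- data.frame(
  Q1   = c("a", "b,c", NA, "a,b"),
  Q1_a = c(1, 0, 0, 1),
  Q1_b = c(0, 1, 0, 1),
  Q1_c = c(0, 1, 0, 0),
  Q2   = c(NA, "a", "b", "a"),
  Q2_a = c(0, 1, 0, 1),
  Q2_b = c(0, 0, 1, 0)
)

n_questions <- 2  # assumed here; 19 in the survey described in the question

for (n in seq_len(n_questions)) {
  question <- paste0("Q", n)
  # Names of the dummy columns for this question; the trailing underscore
  # keeps "Q1_" from also matching columns such as "Q11_a"
  dummies <- grep(paste0("^", question, "_"), names(df1), value = TRUE)
  # Rows where the original answer is missing
  missing <- is.na(df1[[question]])
  # Overwrite the dummy values in those rows
  df1[missing, dummies] <- NA
}

print(df1)
```

After the loop, row 3 of the `Q1_*` columns and row 1 of the `Q2_*` columns hold NA, while all other dummy values are unchanged.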
Alternatively, `sapply(df1, is.na)` tests every column at once:

```
        Q1    Q2    Q3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,]  TRUE FALSE  TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
[6,]  TRUE FALSE FALSE
[7,] FALSE FALSE  TRUE
```

One trick is to use `sapply(df1, is.na) * 1`, which turns TRUE and FALSE into 1 and 0 and makes summation easy.
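As a short illustration of that summation, the sketch below counts missing answers per question and per respondent using the demonstration data from this answer. The variable names `na_flags`, `na_per_question` and `na_per_row` are chosen here for illustration only.

```R
df1 <- data.frame(Q1 = c("a", "b,c", NA, "a,b", "b", NA, "c"),
                  Q2 = c("a", "b,c", "a.b", "a,b", "b", "b", "c"),
                  Q3 = c("a", "b,c", NA, "a,b", "b", "c", NA))

# 7 x 3 numeric matrix: 1 where the answer is missing, 0 otherwise
na_flags <- sapply(df1, is.na) * 1

# Number of missing answers per question (column)
na_per_question <- colSums(na_flags)
print(na_per_question)
# Q1 Q2 Q3
#  2  0  2

# Number of missing answers per respondent (row)
na_per_row <- rowSums(na_flags)
print(na_per_row)
# [1] 0 0 2 0 0 1 1
```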