Combining columns in R: same data, separate columns, same file, different conditions per row
Question
I have been stuck on this for a while. I am working on a complex survey with 26 questions, which respondents answered in a randomised order. This is not about shifting NA data; it is about merging two columns without combining the information in them. I need to create one column containing the values from both columns, plus one more column showing the condition assigned to each id/row.
This is an image of how my data looks right now and how it should look. Var 1, Var 2, and Var 3 are identical variables, but they were assigned to different conditions.
Is there any way to do this in R? Otherwise I will do it manually (after 7 hours of researching how to do this in R, I think it would be easier to just open Excel). Thank you!
LE (later edit): I managed to do this with coalesce():
```r
library(dplyr)

# create sample data frame
df <- data.frame(id = c(1, 2, 3),
                 var1 = c("apple", NA, "banana"),
                 var2 = c("", "orange", "yellow"),
                 stringsAsFactors = FALSE)

# fill missing (NA) values in var1 with the value from var2
df$var1 <- coalesce(df$var1, df$var2)

# remove the now-redundant var2 column
df <- select(df, -var2)

# print the final data frame
print(df)
```
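The coalesce() approach fills in the values but does not record which condition each row came from, which the question also asks for. It also treats an empty string "" as a real value rather than as missing. Below is a minimal base R sketch that handles both for any number of condition columns. The data frame and the column names var_c1, var_c2, var_c3 are hypothetical stand-ins for Var 1, Var 2 and Var 3, and it assumes each respondent answered in exactly one condition (if several columns are filled, the first one wins).

```r
# Hypothetical data: the same question stored once per condition (Var 1/2/3);
# each respondent has a value in only one of these columns.
df <- data.frame(
  id     = 1:4,
  var_c1 = c("apple", NA, "", "kiwi"),
  var_c2 = c(NA, "orange", NA, NA),
  var_c3 = c(NA, NA, "banana", NA),
  stringsAsFactors = FALSE
)

cond_cols <- c("var_c1", "var_c2", "var_c3")

# Character matrix of the per-condition values; treat "" as missing
vals <- as.matrix(df[cond_cols])
vals[which(vals == "")] <- NA

# For each row, index of the first non-missing condition column (NA if none)
first <- apply(!is.na(vals), 1,
               function(r) if (any(r)) which(r)[1] else NA_integer_)

# Merged value, plus the name of the condition column it came from
df$var       <- vals[cbind(seq_len(nrow(vals)), first)]
df$condition <- cond_cols[first]

print(df[c("id", "var", "condition")])
```

apply() is used instead of max.col() because max.col() would report column 1 for a row where no condition column is filled; here such a row gets NA in both new columns.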
Answer 1
Score: 2
You could do that in Excel, but if for any reason you need to redo it, it's the same work all over again. Writing a script might take longer the first time, but it is much quicker every time after that.
Anyway, one solution in base R would simply be to run something like:
df$var1_new <- ifelse(df$randomiser == 1, df$var1_1, df$var1_2)
for each variable, assuming randomiser is the column that records which condition each respondent was assigned.
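A self-contained sketch of the same ifelse() idea, applied in a loop to every question instead of writing one line per variable. The data frame and the column names (randomiser, var1_1, var1_2, var2_1, var2_2) are hypothetical: it assumes each question is stored as <question>_1 and <question>_2, one column per condition, and that randomiser holds 1 or 2. The randomiser column itself then serves as the condition column.

```r
# Hypothetical survey data: each question stored once per condition,
# and randomiser records which condition each respondent was assigned.
df <- data.frame(
  id         = 1:3,
  randomiser = c(1, 2, 1),
  var1_1     = c("apple", NA, "banana"),
  var1_2     = c(NA, "orange", NA),
  var2_1     = c(5, NA, 3),
  var2_2     = c(NA, 4, NA),
  stringsAsFactors = FALSE
)

questions <- c("var1", "var2")

for (q in questions) {
  col_1 <- paste0(q, "_1")
  col_2 <- paste0(q, "_2")
  # Take the condition-1 column when randomiser is 1, otherwise condition 2
  df[[paste0(q, "_new")]] <- ifelse(df$randomiser == 1, df[[col_1]], df[[col_2]])
}

# Keep the id, the condition and the merged columns
result <- df[c("id", "randomiser", paste0(questions, "_new"))]
print(result)
```

Note that ifelse() drops attributes such as factor levels or the Date class, so columns of those types may need converting (e.g. to character) before merging.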
Please note that your variables are probably named differently than in your example table. There normally cannot be two variables named Var 1 in the same data frame (R's import functions make duplicate names unique by default), and spaces are not permitted in syntactic R names.
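To see what R does with duplicated or space-containing headers, here is a quick check with read.csv(), whose default check.names = TRUE passes the header through make.names(unique = TRUE). The CSV text is a made-up example of a spreadsheet export with the same label twice.

```r
# A CSV whose header contains the same label twice, with a space in it
csv_text <- "id,Var 1,Var 1\n1,apple,\n2,,orange\n"

# check.names = TRUE (the default) replaces the space with a dot
# and appends a suffix to the duplicated name
df <- read.csv(text = csv_text)
print(names(df))  # "id" "Var.1" "Var.1.1"
```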