March 7, 2023, 16:18:07

Combining columns in R: same data, separate columns, same file, different conditions per row

Question

I have been stuck on this for a while. I am working on a complex survey with 26 questions that were filled in in randomised order. This is not about shifting NA data; it is about merging 2 columns without adding together the information in them. I need to create 1 column containing all the rows from both columns, plus one more column showing the condition assigned to each id/row.

This is an image of how my data looks right now and how it should look. Var 1, Var 2, and Var 3 are identical variables, but they were assigned to different conditions.

Is there any chance of doing this in R? Otherwise I will do it manually (after 7 hours of researching how to do this in R, I think it would be easier to just open Excel). Thank you!

Later edit: I managed to do this with coalesce:

library(dplyr)
# create sample data frame
df <- data.frame(id = c(1, 2, 3),
                 var1 = c("apple", NA, "banana"),
                 var2 = c("", "orange", "yellow"))
# add var2 to var1 if var1 is NA
df$var1 <- coalesce(df$var1, df$var2)
# remove var2 column
df <- select(df, -var2)
# print the final data frame
print(df)
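
The coalesce approach above merges the values but does not record which condition each row came from. A minimal sketch of one way to add that column, assuming each row has a value in exactly one of the two columns and the other is NA (the column names `var1_cond1` and `var1_cond2` and the sample values are made up for illustration):

```r
library(dplyr)

# Hypothetical sample data: each respondent answered under exactly one
# condition, so exactly one of the two columns is filled per row.
df <- data.frame(
  id = c(1, 2, 3),
  var1_cond1 = c("apple", NA, "banana"),
  var1_cond2 = c(NA, "orange", NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # record which column held the answer before merging
    condition = if_else(!is.na(var1_cond1), 1L, 2L),
    # take the first non-missing value across the two columns
    var1 = coalesce(var1_cond1, var1_cond2)
  ) %>%
  select(id, condition, var1)

print(result)
```

Note that coalesce only treats NA as missing. If empty cells were read in as "" rather than NA, they would probably need converting first, for example with `na_if(x, "")`.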

Answer 1

Score: 2

You could do that with Excel, but if for any reason you need to redo it, it's the same work all over again. Writing a script might take longer the first time, but it will be much quicker every time after that.

Anyway, one solution in base R would simply be to run something like:

var1_new <- ifelse(randomiser == 1, var1_1, var1_2)

for each variable.

Please note that the variables are probably named differently than in your example table. With R's default name checking, two variables in the same dataset cannot both be named Var 1, and spaces are not permitted in names either.
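
To avoid repeating that line for every question, the same ifelse call could be wrapped in a loop. This is only a sketch under assumptions: the data frame, the `randomiser` column coded 1 or 2, and the `<question>_<condition>` naming scheme are all hypothetical and would need adapting to the real survey export.

```r
# Hypothetical data: `randomiser` records the condition (1 or 2) per respondent;
# var1_1/var1_2 and var2_1/var2_2 hold the same questions under each condition.
df <- data.frame(
  id = 1:4,
  randomiser = c(1, 2, 2, 1),
  var1_1 = c(5, NA, NA, 3),
  var1_2 = c(NA, 4, 2, NA),
  var2_1 = c(1, NA, NA, 2),
  var2_2 = c(NA, 3, 5, NA)
)

# Questions to merge (assumed naming scheme: <question>_<condition>)
questions <- c("var1", "var2")

for (q in questions) {
  # pick the condition-1 column where randomiser == 1, otherwise condition 2
  df[[paste0(q, "_new")]] <- ifelse(
    df$randomiser == 1,
    df[[paste0(q, "_1")]],
    df[[paste0(q, "_2")]]
  )
}

# keep the id, the condition, and the merged columns only
merged <- df[, c("id", "randomiser", paste0(questions, "_new"))]
print(merged)
```

With 26 questions, `questions` would list all 26 base names. A third condition would need a nested ifelse or a different lookup.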
