Combining columns in R: same data, separate columns, same file, different conditions per row

Question

I have been stuck on this for a while. I am working on a complex survey with 26 questions that were filled in in randomised order. This is not about shifting NA data; it is about merging two columns without adding together the information in them. I need to create one column containing all the rows from both columns, plus one more column showing the condition assigned to each id/row.

This is an image of how my data looks right now and how it should look. Var 1, Var 2, and Var 3 are identical variables, but they were assigned to different conditions.

[Image: the data as it looks now and as it should look]

Is there any chance of doing this with R? Otherwise I will do it manually (after 7 hours of researching how to do this in R, I think it would be easier just to open Excel). Thank you!

Later edit: I managed to do this with coalesce:

library(dplyr)

# create sample data frame
df <- data.frame(id = c(1, 2, 3),
                 var1 = c("apple", NA, "banana"),
                 var2 = c("", "orange", "yellow"))

# add var2 to var1 if var1 is NA
df$var1 <- coalesce(df$var1, df$var2)

# remove var2 column
df <- select(df, -var2)

# print the final data frame
print(df)
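The coalesce approach merges the values, but it does not record which condition each row came from, and it only fills in NA, not empty strings. A minimal sketch of one way to handle both, assuming each respondent filled in exactly one of the two columns and that blanks were exported as empty strings. The sample data, the `condition` labels, and the output column name `var` are hypothetical, not taken from the real survey:

```r
library(dplyr)

# Hypothetical data: each respondent filled in exactly one of the two columns;
# blanks are assumed to have been exported as "" rather than NA.
df <- data.frame(
  id   = c(1, 2, 3),
  var1 = c("apple", "", "banana"),
  var2 = c("", "orange", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # turn empty strings into NA so that coalesce() treats them as missing
  mutate(var1 = na_if(var1, ""),
         var2 = na_if(var2, "")) %>%
  mutate(
    # record which column held the answer (labels are placeholders)
    condition = case_when(
      !is.na(var1) ~ "condition 1",
      !is.na(var2) ~ "condition 2"
    ),
    # take var1 where present, otherwise var2
    var = coalesce(var1, var2)
  ) %>%
  select(id, condition, var)

print(result)
```

If a row could have answers in both columns, `coalesce()` would silently keep the first one, so it may be worth checking for such rows before merging.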

Answer 1

Score: 2

You could do that with Excel, but if you need to redo it for any reason, it is the same work all over again. Writing a script might take longer the first time, but it is much faster every time after that.

Anyway, one way to do this in base R would simply be to run something like:

var1_new <- ifelse(randomiser == 1, var1_1, var1_2)

for each variable.

Please note that the variables are probably named differently than in your example table. By default, there cannot be two variables named Var 1 in the same dataset (and spaces are not permitted in names either).
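As a rough illustration of repeating the `ifelse()` line for several variables, here is a base R sketch. It assumes the columns follow a `<name>_1` / `<name>_2` pattern and that a `randomiser` column holds 1 or 2; the data frame `survey`, its values, and the `condition` labels are all made up for the example:

```r
# Hypothetical survey data: each respondent answered either the *_1 or the
# *_2 version of every question, depending on `randomiser`.
survey <- data.frame(
  id         = 1:4,
  randomiser = c(1, 2, 1, 2),
  var1_1     = c("a", NA, "c", NA),
  var1_2     = c(NA, "b", NA, "d"),
  var2_1     = c(5, NA, 3, NA),
  var2_2     = c(NA, 4, NA, 1),
  stringsAsFactors = FALSE
)

# Base names of the variables to merge (assumed naming scheme)
vars <- c("var1", "var2")

for (v in vars) {
  # pick the *_1 column for condition 1, otherwise the *_2 column
  survey[[paste0(v, "_new")]] <- ifelse(
    survey$randomiser == 1,
    survey[[paste0(v, "_1")]],
    survey[[paste0(v, "_2")]]
  )
}

# One extra column showing the condition assigned to each row
survey$condition <- ifelse(survey$randomiser == 1, "condition 1", "condition 2")

# Keep only the id, the condition, and the merged columns
merged <- survey[, c("id", "condition", paste0(vars, "_new"))]
print(merged)
```

With 26 questions, only the `vars` vector would need to change, provided the real column names follow a regular pattern.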

huangapple
  • Posted on March 7, 2023, 16:18:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75659462.html