2023-06-16 10:05:44

Elegant way to merge many duplicated columns in R?

Question

I have two dataframes that I need to merge in R. I can't really modify them before the merge. They share many columns, but the values in the duplicated columns never overlap: if a row has a value in one column, the same row in the duplicated column is NA. After the merge, I have something that looks like this:

df1 <- data.frame(id = c(1,2,3,4),
                  Var1 = c(1,2,NA,NA),
                  Var2 = c("a","b",NA,NA))
df2 <- data.frame(id = c(1,2,3,4),
                  Var1 = c(NA,NA,3,4),
                  Var2 = c(NA,NA,"c","d"))
mergedData <- merge(df1, df2, by = "id")
mergedData <- mergedData[, c("Var1.x", "Var1.y", "Var2.x", "Var2.y")]
print(mergedData)
  Var1.x Var1.y Var2.x Var2.y
1      1     NA      a   <NA>
2      2     NA      b   <NA>
3     NA      3   <NA>      c
4     NA      4   <NA>      d

I wrote a function to use in other scripts that merges two columns and then removes one of them:

mergeColumns <- function(df, colToRemove, colToKeep) {
  df[[colToKeep]][is.na(df[[colToKeep]])] <- df[[colToRemove]][is.na(df[[colToKeep]])]
  df[[colToRemove]] <- NULL
  names(df)[names(df) %in% colToKeep] <- gsub(".x|.y", "", names(df)[names(df) %in% colToKeep])
  return(df)
}

This works fine (although I'm sure there is a cleaner approach) when working with single columns. However, in my merged dataframe I have many columns that need to be merged. I am not sure of the best way to do this. I thought about creating a dataframe of the columns that need to be merged and tried using the apply family of functions with no success (I sort of knew the approach below was unlikely to work but thought it might get me going in the right direction):

duplicatedColumns <- data.frame(colstoKeep = c("Var1.x", "Var2.x"),
                                coltoRemove = c("Var1.y", "Var2.y"))
lapply(data, mergeColumns(duplicatedColumns$sampleIntakeCols, duplicatedColumns$rnaMetadataCols))
# Error in match.fun(FUN) : argument "FUN" is missing, with no default

Any insight into how I might get this to work, or into a better approach, would be much appreciated. Base R would be preferable. Thanks in advance.

Answer 1

Score: 1

Given that you need to modify the dataframe repeatedly, with each step working on the result of the previous one, I'd just use a good old-fashioned loop. You can also use `grep()` to identify the columns.

colstoKeep <- grep("\\.x$", names(mergedData), value = TRUE)
colstoRemove <- grep("\\.y$", names(mergedData), value = TRUE)

for (i in seq_along(colstoKeep)) {
  # Arguments are named because mergeColumns() takes colToRemove before colToKeep
  mergedData <- mergeColumns(mergedData,
                             colToRemove = colstoRemove[[i]],
                             colToKeep = colstoKeep[[i]])
}
mergedData

  Var1 Var2
1    1    a
2    2    b
3    3    c
4    4    d

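
The same loop can be written without the separate helper. The following is only a minimal base R sketch: the function name `coalesce_suffixed` is hypothetical (it does not appear in the question), and it assumes the duplicated columns carry the default `.x`/`.y` suffixes that `merge()` adds and that their values never overlap, as the question describes.

```r
# Hypothetical helper: collapse every <stem>.x / <stem>.y pair into <stem>.
coalesce_suffixed <- function(df) {
  # Stems of all columns that carry the ".x" suffix added by merge()
  stems <- sub("\\.x$", "", grep("\\.x$", names(df), value = TRUE))
  for (stem in stems) {
    x <- paste0(stem, ".x")
    y <- paste0(stem, ".y")
    if (!y %in% names(df)) next        # no partner column: leave it alone
    missing_x <- is.na(df[[x]])
    df[[x]][missing_x] <- df[[y]][missing_x]  # fill gaps from the partner
    df[[y]] <- NULL                    # drop the partner column
    names(df)[names(df) == x] <- stem  # strip the suffix from the kept column
  }
  df
}

df1 <- data.frame(id = c(1, 2, 3, 4),
                  Var1 = c(1, 2, NA, NA),
                  Var2 = c("a", "b", NA, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1, 2, 3, 4),
                  Var1 = c(NA, NA, 3, 4),
                  Var2 = c(NA, NA, "c", "d"),
                  stringsAsFactors = FALSE)

result <- coalesce_suffixed(merge(df1, df2, by = "id"))
print(result)
```

The regular expressions are anchored and escape the dot (`"\\.x$"`), so a column whose name merely contains an `x` or `y` somewhere is not renamed by accident. `stringsAsFactors = FALSE` is set explicitly because filling a factor column from another factor with different levels would not behave this way.
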
# Answer 2
**Score**: 1
You can use a loop or the `apply` family of functions. Try this:
```R
mergedData <- data.frame(Var1.x = c(1,2,NA,NA),
                         Var1.y = c(NA,NA,1,2),
                         Var2.x = c("a", "b", NA, NA),
                         Var2.y = c(NA,NA,"c","d"))
duplicatedColumns <- data.frame(colstoKeep = c("Var1.x", "Var2.x"),
                                coltoRemove = c("Var1.y", "Var2.y"))
for (i in seq_len(nrow(duplicatedColumns))) {
  colToKeep <- duplicatedColumns$colstoKeep[i]
  colToRemove <- duplicatedColumns$coltoRemove[i]
  # Fill the NAs in the kept column from its duplicate, then drop the duplicate
  mergedData[[colToKeep]][is.na(mergedData[[colToKeep]])] <- mergedData[[colToRemove]][is.na(mergedData[[colToKeep]])]
  mergedData[[colToRemove]] <- NULL
  # Strip only a trailing ".x"/".y"; an unescaped "." would match any character
  names(mergedData)[names(mergedData) == colToKeep] <- sub("\\.(x|y)$", "", colToKeep)
}
```
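
For the apply-family route, note that `lapply()` expects a function as its second argument, whereas the attempt in the question passes a call to `mergeColumns()` instead. A hedged sketch of one way to do it in base R uses `Map()` to walk the two name vectors in parallel. The variable names `keep`, `remove`, `merged_cols` and `coalesced` are illustrative assumptions, and the sketch assumes plain numeric or character columns, since `ifelse()` drops attributes such as factor levels and date classes.

```r
mergedData <- data.frame(Var1.x = c(1, 2, NA, NA),
                         Var1.y = c(NA, NA, 1, 2),
                         Var2.x = c("a", "b", NA, NA),
                         Var2.y = c(NA, NA, "c", "d"),
                         stringsAsFactors = FALSE)

# Pair each ".x" column with the ".y" column that shares its stem
keep   <- grep("\\.x$", names(mergedData), value = TRUE)
remove <- sub("\\.x$", ".y", keep)

# Map() calls the function once per (keep, remove) pair and returns a list
# holding one merged vector per pair
merged_cols <- Map(function(k, r) {
  ifelse(is.na(mergedData[[k]]), mergedData[[r]], mergedData[[k]])
}, keep, remove)

# Name the merged columns after the stem, without the suffix
names(merged_cols) <- sub("\\.x$", "", keep)
coalesced <- as.data.frame(merged_cols, stringsAsFactors = FALSE)
print(coalesced)
```

Unlike the loop, this builds a new data frame that holds only the merged columns, so any columns without a suffix (such as `id`) would need to be bound back on, for example with `cbind()`.
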
