Elegant way to merge many duplicated columns in R?

I have two dataframes that I need to merge in R. I can't really modify them before the merge, and they share many columns. There are no overlapping values in the duplicated columns: if a row has a value in one column, the same row in its duplicate is NA. After the merge, I have something that looks like this:

df1 <- data.frame(id = c(1,2,3,4),
                  Var1 = c(1,2,NA,NA),
                  Var2 = c("a","b",NA,NA))

df2 <- data.frame(id = c(1,2,3,4),
                  Var1 = c(NA,NA,3,4),
                  Var2 = c(NA,NA,"c","d"))

mergedData <- merge(df1, df2, by = "id")

mergedData <- mergedData[, c("Var1.x", "Var1.y", "Var2.x", "Var2.y")]

  Var1.x Var1.y Var2.x Var2.y
1      1     NA      a   <NA>
2      2     NA      b   <NA>
3     NA      3   <NA>      c
4     NA      4   <NA>      d

I wrote a function to use in other scripts that merges two columns and then removes one of them:

mergeColumns <- function(df, colToRemove, colToKeep) {
  # Fill NAs in the kept column from the column that will be dropped
  df[[colToKeep]][is.na(df[[colToKeep]])] <- df[[colToRemove]][is.na(df[[colToKeep]])]
  df[[colToRemove]] <- NULL
  # Strip the ".x"/".y" suffix that merge() appended
  names(df)[names(df) %in% colToKeep] <- gsub("\\.(x|y)$", "", names(df)[names(df) %in% colToKeep])
  df
}

This works fine (although I'm sure there is a cleaner approach) when working with single columns. However, in my merged dataframe I have many columns that need to be merged. I am not sure of the best way to do this. I thought about creating a dataframe of the columns that need to be merged and tried using the apply family of functions with no success (I sort of knew the approach below was unlikely to work but thought it might get me going in the right direction):

duplicatedColumns <- data.frame(colstoKeep = c("Var1.x", "Var2.x"),
                                coltoRemove = c("Var1.y", "Var2.y"))

lapply(data, mergeColumns(duplicatedColumns$sampleIntakeCols, duplicatedColumns$rnaMetadataCols))
# Error in match.fun(FUN) : argument "FUN" is missing, with no default

Any insight into how I might get this to work, or a better approach, would be much appreciated. Base R would be preferable. Thanks in advance.


# Answer 1
**Score**: 1

Given that you need to iteratively modify the dataframe, I’d just use a good old-fashioned loop. You can also use `grep()` to identify the columns.

colstoKeep <- grep("\\.x$", names(mergedData), value = TRUE)
colstoRemove <- grep("\\.y$", names(mergedData), value = TRUE)

for (i in seq_along(colstoKeep)) {
  # Name the arguments: mergeColumns() takes colToRemove before colToKeep
  mergedData <- mergeColumns(mergedData, colToRemove = colstoRemove[[i]], colToKeep = colstoKeep[[i]])
}

mergedData

  Var1 Var2
1    1    a
2    2    b
3    3    c
4    4    d
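
If you would rather not write the loop out, the same step-by-step update can be expressed with base R's `Reduce()`. The sketch below is only an illustration. It rebuilds the question's example data so it runs on its own, and it assumes that `mergeColumns()` ends by returning `df` and strips the suffix with an anchored pattern, neither of which is spelled out in the post. Pairing each `.x` column with its `.y` partner via `sub()` and passing `stringsAsFactors = FALSE` (so the sketch behaves the same before R 4.0) are my own choices.

```r
# Question's example data, rebuilt so this block is self-contained.
df1 <- data.frame(id = c(1, 2, 3, 4),
                  Var1 = c(1, 2, NA, NA),
                  Var2 = c("a", "b", NA, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1, 2, 3, 4),
                  Var1 = c(NA, NA, 3, 4),
                  Var2 = c(NA, NA, "c", "d"),
                  stringsAsFactors = FALSE)
mergedData <- merge(df1, df2, by = "id")

# mergeColumns() from the question; the explicit return value and the
# anchored suffix pattern are assumptions added for this sketch.
mergeColumns <- function(df, colToRemove, colToKeep) {
  missing <- is.na(df[[colToKeep]])
  df[[colToKeep]][missing] <- df[[colToRemove]][missing]
  df[[colToRemove]] <- NULL
  names(df)[names(df) == colToKeep] <- sub("\\.[xy]$", "", colToKeep)
  df
}

# Derive each ".y" name from its ".x" partner so the pairs cannot get misaligned.
colstoKeep <- grep("\\.x$", names(mergedData), value = TRUE)
colstoRemove <- sub("\\.x$", ".y", colstoKeep)

# Reduce() threads the data frame through one mergeColumns() call per pair,
# starting from mergedData.
result <- Reduce(
  function(df, i) mergeColumns(df, colToRemove = colstoRemove[i], colToKeep = colstoKeep[i]),
  seq_along(colstoKeep),
  mergedData
)
```

`result` then holds `id`, `Var1` and `Var2`. This does the same work as the loop; which one reads better is largely a matter of taste.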


# Answer 2
**Score**: 1


You can use a loop or the apply family of functions. Try this:

mergedData <- data.frame(Var1.x = c(1,2,NA,NA),
                         Var1.y = c(NA,NA,1,2),
                         Var2.x = c("a", "b", NA, NA),
                         Var2.y = c(NA,NA,"c","d"))

duplicatedColumns <- data.frame(colstoKeep = c("Var1.x", "Var2.x"),
                                coltoRemove = c("Var1.y", "Var2.y"))

for (i in seq_len(nrow(duplicatedColumns))) {
  colToKeep <- duplicatedColumns$colstoKeep[i]
  colToRemove <- duplicatedColumns$coltoRemove[i]
  # Fill NAs in the kept column from its duplicate, then drop the duplicate
  mergedData[[colToKeep]][is.na(mergedData[[colToKeep]])] <- mergedData[[colToRemove]][is.na(mergedData[[colToKeep]])]
  mergedData[[colToRemove]] <- NULL
  # Strip the ".x"/".y" suffix added by merge()
  names(mergedData)[names(mergedData) == colToKeep] <- gsub("\\.(x|y)$", "", colToKeep)
}
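
For the apply-family route mentioned above, one possible base R sketch uses `Map()` to fill all the `.x` columns in a single step. It assumes every `.x` column has a matching `.y` column (which is what `merge()` produces with its default suffixes) and that the columns are plain numeric or character vectors, since `ifelse()` drops attributes such as factor levels or Date classes. The names `xCols` and `yCols` are introduced here for illustration only.

```r
mergedData <- data.frame(Var1.x = c(1, 2, NA, NA),
                         Var1.y = c(NA, NA, 1, 2),
                         Var2.x = c("a", "b", NA, NA),
                         Var2.y = c(NA, NA, "c", "d"),
                         stringsAsFactors = FALSE)

# Pair each ".x" column with the ".y" column of the same base name.
xCols <- grep("\\.x$", names(mergedData), value = TRUE)
yCols <- sub("\\.x$", ".y", xCols)

# For every pair, keep the .x value and fall back to .y where .x is NA.
mergedData[xCols] <- Map(function(x, y) ifelse(is.na(x), y, x),
                         mergedData[xCols], mergedData[yCols])

# Drop the now-redundant .y columns and strip the remaining ".x" suffix.
mergedData[yCols] <- NULL
names(mergedData) <- sub("\\.x$", "", names(mergedData))
```

Columns without a suffix (an `id` column, for example) are left untouched, so this can be run directly on the output of `merge()`.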

  • Published on June 16, 2023, 10:05:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76486541.html



