Combine elements of columns with same name in different Datasets in R

Question

I have two data frames. The first is df_1:

X X1 X2 X3
A B C D
E E F G
H I J L

and the second is df_2:

X X4 X5
Z Y W
P O S

I would like to merge both by column name and integrate the elements, so that the values of same-named columns are stacked and the NAs end up at the bottom:

X X1 X2 X3 X4 X5
A B C D Y W
E E F G O S
H I J L NA NA
Z NA NA NA NA NA
P NA NA NA NA NA
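
For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the two data frames. It assumes every column is a character vector; the post does not show the column types, so that part is a guess.

```r
# Assumption: all columns are character (the post does not state the types).
# stringsAsFactors = FALSE keeps them as character on R versions before 4.0.
df_1 <- data.frame(
  X  = c("A", "E", "H"),
  X1 = c("B", "E", "I"),
  X2 = c("C", "F", "J"),
  X3 = c("D", "G", "L"),
  stringsAsFactors = FALSE
)

df_2 <- data.frame(
  X  = c("Z", "P"),
  X4 = c("Y", "O"),
  X5 = c("W", "S"),
  stringsAsFactors = FALSE
)
```

Only X is shared between the two; X1 to X3 exist only in df_1, and X4 and X5 only in df_2.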

Answer 1

Score: 5

In dplyr, you can use bind_rows and then reorder each column so that the non-NA values come before the NAs:

    library(dplyr)

    # Stack the rows (columns missing from one input are filled with NA),
    # then reorder every column so its non-NA values come first
    bind_rows(df_1, df_2) |>
      mutate(across(everything(), ~ .x[order(is.na(.x))]))
    #   X   X1   X2   X3   X4   X5
    # 1 A    B    C    D    Y    W
    # 2 E    E    F    G    O    S
    # 3 H    I    J    L <NA> <NA>
    # 4 Z <NA> <NA> <NA> <NA> <NA>
    # 5 P <NA> <NA> <NA> <NA> <NA>
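
To see why the order(is.na(...)) step works, it may help to look at a single column in isolation. The sketch below uses a hypothetical vector x4 that mimics what the X4 column would look like right after the rows are stacked (three NAs from df_1, followed by the two values from df_2); the names x4 and idx are illustrative only.

```r
# Hypothetical stand-in for the X4 column after stacking df_1 and df_2
x4 <- c(NA, NA, NA, "Y", "O")

# is.na() gives FALSE for real values and TRUE for NAs.
# order() sorts FALSE before TRUE and leaves ties in their original order,
# so the non-NA positions come first, in the order they appeared.
idx <- order(is.na(x4))
idx
# 4 5 1 2 3

x4[idx]
# "Y" "O" NA NA NA
```

Because each column is reordered on its own, values that end up in the same row no longer necessarily come from the same original row.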

Answer 2

Score: 2

With base R, you can first bind the rows of data frames that have different columns, and then move the NA values to the end of each column:

    # rbind two data frames after adding the columns each one is missing
    mismatch_rbind <- function(a, b) {
      a[setdiff(names(b), names(a))] <- NA
      b[setdiff(names(a), names(b))] <- NA
      rbind(a, b)
    }

    # Keep the non-NA values and pad with NA back to the original length
    na_last <- function(x) {
      r <- x[!is.na(x)]
      length(r) <- length(x)
      r
    }

    out <- mismatch_rbind(df_1, df_2)
    out[] <- lapply(out, na_last)
    out
    #   X   X1   X2   X3   X4   X5
    # 1 A    B    C    D    Y    W
    # 2 E    E    F    G    O    S
    # 3 H    I    J    L <NA> <NA>
    # 4 Z <NA> <NA> <NA> <NA> <NA>
    # 5 P <NA> <NA> <NA> <NA> <NA>

Answer 3

Score: 2

Another base R solution: iterate over all column names to make a list of combined columns, pad each with NAs to the same length, and coerce the list back to a data frame.

    # Requires R >= 4.2 for the `_` pipe placeholder
    # For every column name, concatenate the values from both data frames
    # (a column missing from one of them contributes NULL, i.e. nothing)
    new_cols <- union(names(df_1), names(df_2)) |>
      setNames(nm = _) |>
      lapply(\(x) c(df_1[[x]], df_2[[x]]))

    max_len <- max(sapply(new_cols, length))

    # Pad every column with NA up to the longest one, then rebuild the data frame
    new_cols |>
      lapply(\(x) {
        length(x) <- max_len
        x
      }) |>
      as.data.frame()
    #   X   X1   X2   X3   X4   X5
    # 1 A    B    C    D    Y    W
    # 2 E    E    F    G    O    S
    # 3 H    I    J    L <NA> <NA>
    # 4 Z <NA> <NA> <NA> <NA> <NA>
    # 5 P <NA> <NA> <NA> <NA> <NA>

Answer 4

Score: 2

A solution using data.table:

    library(data.table)

    # Convert both data frames to data.tables by reference
    setDT(df_1)
    setDT(df_2)

    # Stack with fill = TRUE, then drop the NAs in each column and
    # index up to .N so the column is padded with NA again
    rbindlist(list(df_1, df_2), fill = TRUE)[, lapply(.SD, \(x) na.omit(x)[1:.N])]

Result:

    #    X   X1   X2   X3   X4   X5
    # 1: A    B    C    D    Y    W
    # 2: E    E    F    G    O    S
    # 3: H    I    J    L <NA> <NA>
    # 4: Z <NA> <NA> <NA> <NA> <NA>
    # 5: P <NA> <NA> <NA> <NA> <NA>

huangapple
  • Posted on February 23, 2023, 23:21:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75546827.html