Combine elements of columns with the same name in different datasets in R
Question
I have two data frames. One is `df_1`:
| X | X1 | X2 | X3 | 
|---|---|---|---|
| A | B | C | D | 
| E | E | F | G | 
| H | I | J | L | 
and another, `df_2`:
| X | X4 | X5 | 
|---|---|---|
| Z | Y | W | 
| P | O | S | 
I would like to combine them by column name, so that the values of each column are stacked at the top and the remaining cells are filled with `NA`:
| X | X1 | X2 | X3 | X4 | X5 | 
|---|---|---|---|---|---|
| A | B | C | D | Y | W | 
| E | E | F | G | O | S | 
| H | I | J | L | NA | NA | 
| Z | NA | NA | NA | NA | NA | 
| P | NA | NA | NA | NA | NA | 
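To make the example reproducible, the two data frames could be built as follows. This is only a sketch: it assumes every column is a character vector, which the post does not state explicitly.

```r
# Recreate the two example data frames from the question.
# Assumption: all columns are character vectors (not factors).
df_1 <- data.frame(
  X  = c("A", "E", "H"),
  X1 = c("B", "E", "I"),
  X2 = c("C", "F", "J"),
  X3 = c("D", "G", "L"),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  X  = c("Z", "P"),
  X4 = c("Y", "O"),
  X5 = c("W", "S"),
  stringsAsFactors = FALSE
)

# The only column name the two data frames share is "X".
shared <- intersect(names(df_1), names(df_2))
shared
```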
Answer 1
Score: 5
With `dplyr`, you can stack the data frames with `bind_rows()` and then reorder each column so that its non-`NA` values come before the `NA`s:
library(dplyr)
bind_rows(df_1, df_2) |>
  mutate(across(everything(), ~ .x[order(is.na(.x))]))
#  X   X1   X2   X3   X4   X5
#1 A    B    C    D    Y    W
#2 E    E    F    G    O    S
#3 H    I    J    L <NA> <NA>
#4 Z <NA> <NA> <NA> <NA> <NA>
#5 P <NA> <NA> <NA> <NA> <NA>
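The reordering step works because `order()` is stable and sorts `FALSE` before `TRUE`. A minimal base R sketch on a single column, where `x4` is an illustrative stand-in for what `X4` looks like right after the row bind:

```r
# X4 right after the row bind: three NAs from df_1, then df_2's values.
x4 <- c(NA, NA, NA, "Y", "O")

# is.na() gives TRUE/FALSE. order() puts FALSE before TRUE and keeps ties
# in their original order, so the non-NA positions come first.
idx <- order(is.na(x4))
idx      # 4 5 1 2 3

x4[idx]  # "Y" "O" NA NA NA
```

Because ties keep their original order, the non-`NA` values are not shuffled relative to each other.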
Answer 2
Score: 2
With base R, you can first row-bind the data frames after giving each one the columns it lacks, and then move the `NA` values to the end of each column:
mismatch_rbind <- function(a, b) {
  a[setdiff(names(b), names(a))] <- NA
  b[setdiff(names(a), names(b))] <- NA
  rbind(a, b)
}
na_last <- function(x) {
  r <- x[!is.na(x)]
  length(r) <- length(x)
  r
}
out <- mismatch_rbind(df_1, df_2)
out[] <- lapply(out, na_last)
out
#   X   X1   X2   X3   X4   X5
# 1 A    B    C    D    Y    W
# 2 E    E    F    G    O    S
# 3 H    I    J    L <NA> <NA>
# 4 Z <NA> <NA> <NA> <NA> <NA>
# 5 P <NA> <NA> <NA> <NA> <NA>
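To see the two steps separately, here is a small sketch on cut-down data. The data frames `a` and `b` are illustrative stand-ins, not the ones from the question.

```r
a <- data.frame(X = c("A", "E"), X1 = c("B", "E"), stringsAsFactors = FALSE)
b <- data.frame(X = "Z", X4 = "Y", stringsAsFactors = FALSE)

# Step 1: give each data frame the columns it lacks, filled with NA,
# so that rbind() can match the columns by name.
a[setdiff(names(b), names(a))] <- NA
b[setdiff(names(a), names(b))] <- NA
names(a)  # "X" "X1" "X4"
names(b)  # "X" "X4" "X1"

stacked <- rbind(a, b)
stacked$X4  # NA NA "Y"

# Step 2: keep the non-NA values, then grow the vector back to full length.
# Extending a vector with `length<-` pads the new slots with NA.
vals <- stacked$X4[!is.na(stacked$X4)]
length(vals) <- nrow(stacked)
vals  # "Y" NA NA
```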
Answer 3
Score: 2
Another base R solution: iterate over all column names to build a list of combined columns, pad each one with `NA`s to the same length, and coerce the list back to a data frame. Note that the native pipe placeholder `_` used here needs R 4.2 or later.
new_cols <- union(names(df_1), names(df_2)) |>
  setNames(nm = _) |>
  lapply(\(x) c(df_1[[x]], df_2[[x]]))
max_len <- max(sapply(new_cols, length))
new_cols |>
  lapply(\(x) {
    length(x) <- max_len
    x
  }) |>
  as.data.frame()
  X   X1   X2   X3   X4   X5
1 A    B    C    D    Y    W
2 E    E    F    G    O    S
3 H    I    J    L <NA> <NA>
4 Z <NA> <NA> <NA> <NA> <NA>
5 P <NA> <NA> <NA> <NA> <NA>
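The combining step relies on `[[` returning `NULL` for a column that does not exist, and on `c()` dropping `NULL`. A short sketch, using cut-down versions of the two data frames for illustration:

```r
df_1 <- data.frame(X = c("A", "E", "H"), X1 = c("B", "E", "I"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(X = c("Z", "P"), X4 = c("Y", "O"),
                   stringsAsFactors = FALSE)

# `[[` returns NULL for a column the data frame does not have...
is.null(df_2[["X1"]])  # TRUE

# ...and c() silently drops NULL, so only df_1's values remain.
x1 <- c(df_1[["X1"]], df_2[["X1"]])
x1  # "B" "E" "I"

# Columns present in both data frames are simply concatenated.
x <- c(df_1[["X"]], df_2[["X"]])
x   # "A" "E" "H" "Z" "P"
```

Since no `NA` rows are ever created in the middle of a column, this approach only needs padding at the end, not reordering.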
Answer 4
Score: 2
A solution using `data.table`:
library(data.table)
setDT(df_1)
setDT(df_2)
rbindlist(list(df_1, df_2), fill = TRUE)[, lapply(.SD, \(x) na.omit(x)[1:.N])]
Result:
   X   X1   X2   X3   X4   X5
1: A    B    C    D    Y    W
2: E    E    F    G    O    S
3: H    I    J    L <NA> <NA>
4: Z <NA> <NA> <NA> <NA> <NA>
5: P <NA> <NA> <NA> <NA> <NA>
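In the `data.table` call, `.N` is the number of rows, so `na.omit(x)[1:.N]` indexes past the end of the shortened vector. A base R sketch of that idea on one column, where `x4` is an illustrative stand-in and `seq_along(x4)` plays the role of `1:.N`:

```r
# One column as it looks after rbindlist(..., fill = TRUE).
x4 <- c(NA, NA, NA, "Y", "O")

# na.omit() keeps only the non-NA values.
kept <- na.omit(x4)
length(kept)  # 2

# Indexing beyond the end of a vector yields NA, which pads the column
# back to its original length with the NAs at the bottom.
res <- kept[seq_along(x4)]
res  # "Y" "O" NA NA NA
```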