Combine elements of columns with the same name in different datasets in R

Question

I have two data frames. The first is df_1:

X X1 X2 X3
A B C D
E E F G
H I J L

and the second is df_2:

X X4 X5
Z Y W
P O S

I would like to combine both by column name, so that the values of each column are stacked at the top and the shorter columns are padded with NA:

X X1 X2 X3 X4 X5
A B C D Y W
E E F G O S
H I J L NA NA
Z NA NA NA NA NA
P NA NA NA NA NA
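
To try the answers below, the two data frames can be recreated along these lines. This is only a sketch: it assumes every column is a character vector, which the question does not state explicitly.

```r
# Assumed reconstruction of the question's data (all columns character).
df_1 <- data.frame(
  X  = c("A", "E", "H"),
  X1 = c("B", "E", "I"),
  X2 = c("C", "F", "J"),
  X3 = c("D", "G", "L"),
  stringsAsFactors = FALSE
)

df_2 <- data.frame(
  X  = c("Z", "P"),
  X4 = c("Y", "O"),
  X5 = c("W", "S"),
  stringsAsFactors = FALSE
)
```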

Answer 1

Score: 5

In dplyr, you can use bind_rows() and then reorder each column so that the non-NA values come first and the NAs last:

library(dplyr)
bind_rows(df_1, df_2) |>
  mutate(across(everything(), ~ .x[order(is.na(.x))]))

#  X   X1   X2   X3   X4   X5
#1 A    B    C    D    Y    W
#2 E    E    F    G    O    S
#3 H    I    J    L <NA> <NA>
#4 Z <NA> <NA> <NA> <NA> <NA>
#5 P <NA> <NA> <NA> <NA> <NA>
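
The reordering step works because is.na() returns FALSE for values and TRUE for NAs, and order() leaves ties in their original sequence. Ordering by is.na() therefore pushes the NAs to the end without shuffling the remaining values. A small illustration on a made-up vector (x is hypothetical and not part of the question's data):

```r
# A column as it might look right after bind_rows(): NAs first, values last.
x <- c(NA, "Y", NA, "O", NA)

# FALSE sorts before TRUE, so the positions of non-NA values come first.
idx <- order(is.na(x))

# Reorder the column: values keep their relative order, NAs move to the end.
reordered <- x[idx]
reordered
```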

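Answer 2

Score: 2

With base R, you can first bind the rows of data frames that have different columns and then move the NA values to the end of each column: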

mismatch_rbind <- function(a, b) {
  a[setdiff(names(b), names(a))] <- NA
  b[setdiff(names(a), names(b))] <- NA
  rbind(a, b)
}
na_last <- function(x) {
  r <- x[!is.na(x)]
  length(r) <- length(x)
  r
}

out <- mismatch_rbind(df_1, df_2)
out[] <- lapply(out, na_last)
out
#   X   X1   X2   X3   X4   X5
# 1 A    B    C    D    Y    W
# 2 E    E    F    G    O    S
# 3 H    I    J    L <NA> <NA>
# 4 Z <NA> <NA> <NA> <NA> <NA>
# 5 P <NA> <NA> <NA> <NA> <NA>
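
The two helpers rest on two base R idioms: assigning NA to column names that do not exist yet creates those columns, and assigning a larger length() to a vector pads it with NA. A minimal sketch on toy inputs (a, b and x are made up for illustration):

```r
a <- data.frame(X = c("A", "E"), X1 = c("B", "E"), stringsAsFactors = FALSE)
b <- data.frame(X = "Z", X4 = "Y", stringsAsFactors = FALSE)

# Columns present in b but not in a are created in a and filled with NA.
a[setdiff(names(b), names(a))] <- NA
names(a)

# Keep the non-NA values, then grow the vector back to its original length;
# the new trailing positions are filled with NA.
x <- c(NA, "Y", NA)
r <- x[!is.na(x)]
length(r) <- length(x)
r
```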

Answer 3

Score: 2

Another base R solution: iterate over all column names to build a list of combined columns, pad each one with NAs to the same length, and coerce the list back to a data frame.

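# Note: the `_` pipe placeholder requires R >= 4.2.
# For every column name found in either data frame, concatenate both columns;
# a column missing from one data frame contributes NULL, i.e. nothing.
new_cols <- union(names(df_1), names(df_2)) |>
  setNames(nm = _) |>
  lapply(\(x) c(df_1[[x]], df_2[[x]]))

# Length of the longest combined column
max_len <- max(sapply(new_cols, length))

# Pad every column with NA up to max_len, then rebuild the data frame
new_cols |>
  lapply(\(x) {
    length(x) <- max_len
    x
  }) |>
  as.data.frame()
#   X   X1   X2   X3   X4   X5
# 1 A    B    C    D    Y    W
# 2 E    E    F    G    O    S
# 3 H    I    J    L <NA> <NA>
# 4 Z <NA> <NA> <NA> <NA> <NA>
# 5 P <NA> <NA> <NA> <NA> <NA>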
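
The native pipe and the \(x) shorthand need R 4.1 or later, and the `_` placeholder needs R 4.2 or later. For older installations, the same idea can be sketched without them. The data frames here are an assumed reconstruction of the question's data, and out is a hypothetical name:

```r
# Assumed reconstruction of the question's data (all columns character).
df_1 <- data.frame(X = c("A", "E", "H"), X1 = c("B", "E", "I"),
                   X2 = c("C", "F", "J"), X3 = c("D", "G", "L"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(X = c("Z", "P"), X4 = c("Y", "O"), X5 = c("W", "S"),
                   stringsAsFactors = FALSE)

cols <- union(names(df_1), names(df_2))

# sapply(..., simplify = FALSE) returns a list named after `cols`.
new_cols <- sapply(cols, function(x) c(df_1[[x]], df_2[[x]]),
                   simplify = FALSE)

max_len <- max(lengths(new_cols))

# Pad each combined column with NA, then coerce the list to a data frame.
out <- as.data.frame(
  lapply(new_cols, function(x) {
    length(x) <- max_len
    x
  }),
  stringsAsFactors = FALSE
)
out
```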

Answer 4

Score: 2

A solution using data.table:

library(data.table)

setDT(df_1)
setDT(df_2)

rbindlist(list(df_1, df_2), fill = TRUE)[, lapply(.SD, \(x) na.omit(x)[1:.N])]

Result:

   X   X1   X2   X3   X4   X5
1: A    B    C    D    Y    W
2: E    E    F    G    O    S
3: H    I    J    L <NA> <NA>
4: Z <NA> <NA> <NA> <NA> <NA>
5: P <NA> <NA> <NA> <NA> <NA>
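
The expression na.omit(x)[1:.N] works because indexing a vector past its end returns NA. Dropping the NAs and then asking for .N elements (the row count inside data.table) therefore gives the values followed by NA padding. A base R sketch of that idiom, with a made-up vector x and n standing in for .N:

```r
# A column as it might look after rbindlist(..., fill = TRUE).
x <- c(NA, NA, NA, "Y", "O")
n <- length(x)  # plays the role of .N

# Drop the NAs, then index out to the full length; positions beyond the
# remaining values come back as NA.
padded <- na.omit(x)[seq_len(n)]
padded
```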

huangapple
  • Posted on February 23, 2023 at 23:21:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75546827.html