2023-02-23 23:21:45

Combine elements of columns with same name in different Datasets in R

Question

I have two data frames. The first is df_1:

X	X1	X2	X3
A	B	C	D
E	E	F	G
H	I	J	L

and another, df_2

X	X4	X5
Z	Y	W
P	O	S

I would like to merge both by column name and integrate the elements, so that the values in each column are stacked and the NAs end up at the bottom:

X	X1	X2	X3	X4	X5
A	B	C	D	Y	W
E	E	F	G	O	S
H	I	J	L	NA	NA
Z	NA	NA	NA	NA	NA
P	NA	NA	NA	NA	NA
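To run the answers below, the two inputs need to exist as data frames. The following is a minimal sketch that rebuilds them from the tables in the question. It assumes every column is a plain character vector, which the question does not state explicitly.

```r
# Rebuild the example inputs from the question.
# Assumption: all columns are character; stringsAsFactors = FALSE keeps
# them that way on R versions older than 4.0 as well.
df_1 <- data.frame(
  X  = c("A", "E", "H"),
  X1 = c("B", "E", "I"),
  X2 = c("C", "F", "J"),
  X3 = c("D", "G", "L"),
  stringsAsFactors = FALSE
)

df_2 <- data.frame(
  X  = c("Z", "P"),
  X4 = c("Y", "O"),
  X5 = c("W", "S"),
  stringsAsFactors = FALSE
)
```

Only the column X is shared. X1 to X3 exist only in df_1, and X4 and X5 only in df_2.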

Answer 1

Score: 5

In dplyr, you can use bind_rows and then reorder each column so that non-NA values come before NAs:

library(dplyr)
bind_rows(df_1, df_2) |>
  mutate(across(everything(), ~ .x[order(is.na(.x))]))
#  X   X1   X2   X3   X4   X5
#1 A    B    C    D    Y    W
#2 E    E    F    G    O    S
#3 H    I    J    L <NA> <NA>
#4 Z <NA> <NA> <NA> <NA> <NA>
#5 P <NA> <NA> <NA> <NA> <NA>
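The reordering step can be looked at in isolation. The sketch below applies the same order(is.na(.)) idiom to a single vector shaped like column X4 after the row bind; the variable names x4 and idx are only illustrative. Because order() breaks ties by original position, the non-NA values keep their relative order.

```r
# Column X4 as it looks right after binding rows: df_1 contributes three
# NAs, df_2 contributes "Y" and "O".
x4 <- c(NA, NA, NA, "Y", "O")

# is.na() gives TRUE TRUE TRUE FALSE FALSE; FALSE sorts before TRUE,
# and ties stay in their original order.
idx <- order(is.na(x4))

# Indexing with idx moves the non-NA values to the front.
x4[idx]
```

Note that each column is reordered independently, so values no longer stay on the row they came from. That matches the output requested in the question, but it would not be appropriate if rows represent records that must stay aligned.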

Answer 2

Score: 2

With base R, you can first bind rows with different columns and then move the NA values to the end:

mismatch_rbind <- function(a, b) {
  a[setdiff(names(b), names(a))] <- NA
  b[setdiff(names(a), names(b))] <- NA
  rbind(a, b)
}
na_last <- function(x) {
  r <- x[!is.na(x)]
  length(r) <- length(x)
  r
}
out <- mismatch_rbind(df_1, df_2)
out[] <- lapply(out, na_last)
out
#   X   X1   X2   X3   X4   X5
# 1 A    B    C    D    Y    W
# 2 E    E    F    G    O    S
# 3 H    I    J    L <NA> <NA>
# 4 Z <NA> <NA> <NA> <NA> <NA>
# 5 P <NA> <NA> <NA> <NA> <NA>
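The na_last helper above relies on the fact that assigning a longer length to a vector pads it with NA. A small sketch of just that step, using an illustrative vector x4 that mirrors column X4 after the row bind:

```r
# Column X4 after binding rows, before any reordering.
x4 <- c(NA, NA, NA, "Y", "O")

# Keep only the non-missing values, in their original order.
r <- x4[!is.na(x4)]

# Growing the vector with `length<-` fills the new slots with NA.
length(r) <- length(x4)
r
```

This gives the same result as indexing with order(is.na(x4)), without computing a sort order.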

Answer 3

Score: 2

Another base R solution: iterate over all column names to make a list of combined columns; pad with NAs to the same length; and coerce back to a data frame.

new_cols <- union(names(df_1), names(df_2)) |>
  setNames(nm = _) |>
  lapply(\(x) c(df_1[[x]], df_2[[x]]))
max_len <- max(sapply(new_cols, length))
new_cols |>
  lapply(\(x) {
    length(x) <- max_len
    x
  }) |>
  as.data.frame()

  X   X1   X2   X3   X4   X5
1 A    B    C    D    Y    W
2 E    E    F    G    O    S
3 H    I    J    L <NA> <NA>
4 Z <NA> <NA> <NA> <NA> <NA>
5 P <NA> <NA> <NA> <NA> <NA>

Answer 4

Score: 2

A solution using data.table:

library(data.table)
setDT(df_1)
setDT(df_2)
rbindlist(list(df_1, df_2), fill = TRUE)[, lapply(.SD, \(x) na.omit(x)[1:.N])]

Result

   X   X1   X2   X3   X4   X5
1: A    B    C    D    Y    W
2: E    E    F    G    O    S
3: H    I    J    L <NA> <NA>
4: Z <NA> <NA> <NA> <NA> <NA>
5: P <NA> <NA> <NA> <NA> <NA>
