Combining column values with column names for some select columns using tidyr unite
Question
Given a data frame:
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))
I want to combine column values with their column names. I know I can use unite() from tidyr to get the following output:
df %>% unite(NewCol, c(Col1, Col4), remove = FALSE)
Col1 Col2 Col3 Col4 NewCol
1 A W 1 100 A_100
2 B X 2 101 B_101
3 C Y 3 102 C_102
4 D Z 4 103 D_103
But I want each column name to appear next to its value, as follows (the separator _ is not important):
Col1 Col2 Col3 Col4 NewCol
1 A W 1 100 Col1_A_Col4_100
2 B X 2 101 Col1_B_Col4_101
3 C Y 3 102 Col1_C_Col4_102
4 D Z 4 103 Col1_D_Col4_103
I tried the solution posted here, which does give the desired values, but it returns them as a separate data frame:
imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")
NewCol
<chr>
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103
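Would I simply use bind_cols() to combine the two? How do I know that the row order is preserved between them? Is there another way to create NewCol within the same data frame, similar to unite() in the first case?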
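Yes, you can use bind_cols() to combine the two. bind_cols() matches rows by position, and neither select() nor the paste() call inside imap_dfr() reorders rows, so both data frames have the same row order. For example:
library(dplyr)
library(tidyr)
library(purrr)
# Build the labelled values as a one-column data frame
new_col <- imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")
# Append it to the original data frame; rows are matched by position
result_df <- bind_cols(df, new_col)
# Print the result
print(result_df)
This produces a data frame, result_df, that contains the original columns plus the desired NewCol, with the row order unchanged.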
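To make the row-order point concrete, here is a minimal base R sketch of the same idea without any helper packages. The names cols and labelled are illustrative and not part of the original post. Because paste() is vectorised, element i of every piece always comes from row i, so no reordering can occur.

```r
# Example data from the question
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26],
                 Col3 = 1:4, Col4 = 100:103)

# Columns to label (illustrative name, assumed for this sketch)
cols <- c("Col1", "Col4")

# Pair each column name with its values: "Col1_A", "Col1_B", ...
labelled <- Map(function(name, values) paste(name, values, sep = "_"),
                cols, df[cols])

# Paste the labelled pieces together row by row and store the result
# directly in the original data frame
df$NewCol <- do.call(paste, c(unname(labelled), sep = "_"))

print(df)
```

Assigning into df$NewCol keeps everything in one data frame, so there is no second object whose row order could drift.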
Answer 1
Score: 2
One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.
## Load libraries
library(tidyverse)
## Load example data
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))
## Expected outcome
df %>%
  bind_cols(imap_dfr(df %>% select(Col1, Col4),
                     ~ paste(.y, .x, sep = "_")) %>%
              unite(newcol, sep = "_"))
#> Col1 Col2 Col3 Col4 newcol
#> 1 A W 1 100 Col1_A_Col4_100
#> 2 B X 2 101 Col1_B_Col4_101
#> 3 C Y 3 102 Col1_C_Col4_102
#> 4 D Z 4 103 Col1_D_Col4_103
## With a small number of columns
df %>%
  mutate(tmp_Col1 = paste0("Col1", "_", Col1),
         tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
#> Col1 Col2 Col3 Col4 newcol
#> 1 A W 1 100 Col1_A_Col4_100
#> 2 B X 2 101 Col1_B_Col4_101
#> 3 C Y 3 102 Col1_C_Col4_102
#> 4 D Z 4 103 Col1_D_Col4_103
## With a large number of columns
df %>%
  mutate(across(c(Col1, Col4),
                ~ paste0(cur_column(), "_", .x))) %>%
  unite(newcol, c(Col1, Col4), sep = "_") %>%
  # Restore the original Col1 and Col4 by joining on the shared columns (Col2, Col3)
  left_join(df)
#> Joining with `by = join_by(Col2, Col3)`
#> newcol Col2 Col3 Col1 Col4
#> 1 Col1_A_Col4_100 W 1 A 100
#> 2 Col1_B_Col4_101 X 2 B 101
#> 3 Col1_C_Col4_102 Y 3 C 102
#> 4 Col1_D_Col4_103 Z 4 D 103
Created on 2023-06-22 with reprex v2.0.2
If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.
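As a possible variation on the across() approach, the sketch below writes the labelled values into temporary columns via the .names argument of across(), so the original columns are never overwritten and no join is needed afterwards. The tmp_ prefix is an assumption made for this example; it presumes no existing column already starts with tmp_.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26],
                 Col3 = 1:4, Col4 = 100:103)

result <- df %>%
  # Write "name_value" strings into new tmp_* columns, leaving Col1/Col4 intact
  mutate(across(c(Col1, Col4),
                ~ paste0(cur_column(), "_", .x),
                .names = "tmp_{.col}")) %>%
  # unite() removes its input columns by default, so the tmp_* columns disappear
  unite(newcol, starts_with("tmp_"), sep = "_")

print(result)
```

The selection c(Col1, Col4) inside across() could be swapped for any tidyselect helper, and because everything happens inside one pipeline on df, the row order is never in question.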