使用tidyr unite将某些选择列的列值与列名合并。

huangapple go评论94阅读模式
英文:

Combining column values with column names for some select columns using tidyr unite

问题

给定一个数据框:

  1. df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

我想要将列与它们的列名组合在一起。我知道可以使用tidyr中的unite函数,并获得以下输出:

  1. df %>% unite(NewCol, c(Col1, Col4), remove = F)
  2. Col1 Col2 Col3 Col4 NewCol
  3. 1 A W 1 100 A_100
  4. 2 B X 2 101 B_101
  5. 3 C Y 3 102 C_102
  6. 4 D Z 4 103 D_103

但我想要将列名放在列的值旁边,如下所示(分隔符_实际上不是很重要):

  1. Col1 Col2 Col3 Col4 NewCol
  2. 1 A W 1 100 Col1_A_Col4_100
  3. 2 B X 2 101 Col1_B_Col4_101
  4. 3 C Y 3 102 Col1_C_Col4_102
  5. 4 D Z 4 103 Col1_D_Col4_103

我尝试了这里发布的解决方案,它确实产生了期望的输出,但创建了一个单独的输出。

  1. imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  2. unite(NewCol, sep = "_")
  3. NewCol
  4. <chr>
  5. 1 Col1_A_Col4_100
  6. 2 Col1_B_Col4_101
  7. 3 Col1_C_Col4_102
  8. 4 Col1_D_Col4_103

您可以简单地使用bind_cols()将两者组合吗?如何确保两者之间保留了行的顺序?是否有另一种方法可以在同一个数据框中创建NewCol,类似于第一种情况中的unite

您可以使用bind_cols()将两个数据框组合在一起,并确保它们的行顺序相同。以下是如何完成这个任务:

  1. library(dplyr)
  2. library(tidyr)
  3. # 使用 unite 创建 NewCol
  4. df1 <- df %>%
  5. unite(NewCol, c(Col1, Col4), remove = FALSE)
  6. # 使用 imap_dfr 创建 NewCol
  7. df2 <- imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  8. rename(NewCol = .)
  9. # 使用 bind_cols 将两个数据框组合
  10. result_df <- bind_cols(df, df1["NewCol"], df2["NewCol"])
  11. # 打印结果
  12. print(result_df)

这将产生一个包含所需输出的数据框 result_df,并确保了行的顺序保持一致。

英文:

Given a dataframe:

  1. df &lt;- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

I want to combine column with their column names. I know I can use unite from tidyr and get the following output.

  1. df %&gt;% unite(NewCol, c(Col1, Col4), remove = F)
  2. Col1 Col2 Col3 Col4 NewCol
  3. 1 A W 1 100 A_100
  4. 2 B X 2 101 B_101
  5. 3 C Y 3 102 C_102
  6. 4 D Z 4 103 D_103

But I want to have the column name next to the value of the column as follows (the separator _ is really not that important):

  1. Col1 Col2 Col3 Col4 NewCol
  2. 1 A W 1 100 Col1_A_Col4_100
  3. 2 B X 2 101 Col1_B_Col4_101
  4. 3 C Y 3 102 Col1_C_Col4_102
  5. 4 D Z 4 103 Col1_D_Col4_103

I tried the solution posted here which does give the desired output but it creates a separate output.

  1. imap_dfr(df %&gt;% select(Col1, Col4), ~ paste(.y, .x, sep = &quot;_&quot;)) %&gt;%
  2. unite(NewCol, sep = &quot;_&quot;)
  3. NewCol
  4. &lt;chr&gt;
  5. 1 Col1_A_Col4_100
  6. 2 Col1_B_Col4_101
  7. 3 Col1_C_Col4_102
  8. 4 Col1_D_Col4_103

Would I simply use bind_cols() to combine both? How do I know the sequence of the rows is preserved between the two? Is there another way that I can create NewCol within the same dataframe similar to unite in the first case?

答案1

得分: 2

一个选项是创建临时的'colname + value'列,然后在第二步中合并它们,例如:

  1. ## 加载库
  2. library(tidyverse)
  3. ## 加载示例数据
  4. df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))
  5. ## 预期结果
  6. df %>%
  7. bind_cols(imap_dfr(df %>%
  8. select(Col1, Col4),
  9. ~ paste(.y, .x, sep = "_")) %>%
  10. unite(newcol, sep = "_"))
  11. #> Col1 Col2 Col3 Col4 newcol
  12. #> 1 A W 1 100 Col1_A_Col4_100
  13. #> 2 B X 2 101 Col1_B_Col4_101
  14. #> 3 C Y 3 102 Col1_C_Col4_102
  15. #> 4 D Z 4 103 Col1_D_Col4_103
  16. ## 对于少量列
  17. df %>%
  18. mutate(tmp_Col1 = paste0("Col1", "_", Col1),
  19. tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  20. unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
  21. #> Col1 Col2 Col3 Col4 newcol
  22. #> 1 A W 1 100 Col1_A_Col4_100
  23. #> 2 B X 2 101 Col1_B_Col4_101
  24. #> 3 C Y 3 102 Col1_C_Col4_102
  25. #> 4 D Z 4 103 Col1_D_Col4_103
  26. ## 对于大量列
  27. df %>%
  28. mutate(across(c(Col1, Col4),
  29. ~paste0(cur_column(), "_", .x))) %>%
  30. unite(newcol, c(Col1, Col4), sep = "_") %>%
  31. left_join(df)
  32. #> Joining with `by = join_by(Col2, Col3)`
  33. #> newcol Col2 Col3 Col1 Col4
  34. #> 1 Col1_A_Col4_100 W 1 A 100
  35. #> 2 Col1_B_Col4_101 X 2 B 101
  36. #> 3 Col1_C_Col4_102 Y 3 C 102
  37. #> 4 Col1_D_Col4_103 Z 4 D 103

创建于2023年06月22日,使用 reprex v2.0.2

如果你有大量要转换的列,使用across()可以让你使用tidyselect函数,比如starts_with(),来选择感兴趣的列,而不必逐个指定每列的名称。

英文:

One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.

  1. ## Load libraries
  2. library(tidyverse)
  3. ## Load example data
  4. df &lt;- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))
  5. ## Expected outcome
  6. df %&gt;% bind_cols(imap_dfr(df %&gt;% select(Col1, Col4),
  7. ~ paste(.y, .x, sep = &quot;_&quot;)) %&gt;%
  8. unite(newcol, sep = &quot;_&quot;))
  9. #&gt; Col1 Col2 Col3 Col4 newcol
  10. #&gt; 1 A W 1 100 Col1_A_Col4_100
  11. #&gt; 2 B X 2 101 Col1_B_Col4_101
  12. #&gt; 3 C Y 3 102 Col1_C_Col4_102
  13. #&gt; 4 D Z 4 103 Col1_D_Col4_103
  14. ## With a small number of columns
  15. df %&gt;%
  16. mutate(tmp_Col1 = paste0(&quot;Col1&quot;, &quot;_&quot;, Col1),
  17. tmp_Col4 = paste0(&quot;Col4&quot;, &quot;_&quot;, Col4)) %&gt;%
  18. unite(newcol, c(tmp_Col1, tmp_Col4), sep = &quot;_&quot;)
  19. #&gt; Col1 Col2 Col3 Col4 newcol
  20. #&gt; 1 A W 1 100 Col1_A_Col4_100
  21. #&gt; 2 B X 2 101 Col1_B_Col4_101
  22. #&gt; 3 C Y 3 102 Col1_C_Col4_102
  23. #&gt; 4 D Z 4 103 Col1_D_Col4_103
  24. ## With a large number of columns
  25. df %&gt;%
  26. mutate(across(c(Col1, Col4),
  27. ~paste0(cur_column(), &quot;_&quot;, .x))) %&gt;%
  28. unite(newcol, c(Col1, Col4), sep = &quot;_&quot;) %&gt;%
  29. left_join(df)
  30. #&gt; Joining with `by = join_by(Col2, Col3)`
  31. #&gt; newcol Col2 Col3 Col1 Col4
  32. #&gt; 1 Col1_A_Col4_100 W 1 A 100
  33. #&gt; 2 Col1_B_Col4_101 X 2 B 101
  34. #&gt; 3 Col1_C_Col4_102 Y 3 C 102
  35. #&gt; 4 Col1_D_Col4_103 Z 4 D 103

<sup>Created on 2023-06-22 with reprex v2.0.2</sup>

If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.

huangapple
  • 本文由 发表于 2023年6月22日 11:52:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76528502.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定