June 22, 2023, 11:52:20


Combining column values with column names for some select columns using tidyr unite

Question

Given a data frame:

df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))

I want to combine columns together with their column names. I know I can use unite() from tidyr to get the following output:

library(tidyr)
df %>% unite(NewCol, c(Col1, Col4), remove = FALSE)
  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  A_100
2    B    X    2  101  B_101
3    C    Y    3  102  C_102
4    D    Z    4  103  D_103
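unite() builds NewCol by pasting the selected columns together element by element, joined by sep (default "_"). A rough base R sketch of the same result (not tidyr's own code), assuming the data frame above with the column spelled Col4:

```r
# Example data from the question (Col4 capitalised to match the outputs shown)
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26],
                 Col3 = c(1:4), Col4 = c(100:103))

# paste() is vectorised: element i combines row i of Col1 with row i of Col4
df$NewCol <- paste(df$Col1, df$Col4, sep = "_")

print(df)
```

Because the pasted vector is assigned straight back into df, there is no second object whose row order could drift.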

But I want the column name next to each value, as follows (the _ separator isn't really important):

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  Col1_A_Col4_100
2    B    X    2  101  Col1_B_Col4_101
3    C    Y    3  102  Col1_C_Col4_102
4    D    Z    4  103  Col1_D_Col4_103

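I tried the solution posted here, which does give the desired output, but it returns a separate data frame instead of adding NewCol to df: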

imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")
  NewCol         
  <chr>          
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103

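Would I simply use bind_cols() to combine the two? How do I know the row order is preserved between them? Is there another way to create NewCol within the same data frame, similar to unite() in the first case?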

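You can combine the two with bind_cols(). bind_cols() pairs rows by position, so the row order is preserved as long as the new column is computed from df itself, with no sorting, filtering, or joining in between. Here is how to do it:

library(dplyr)
library(tidyr)
library(purrr)
# Build the "name_value" pieces for Col1 and Col4, then unite them into NewCol
new_col <- imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")
# Attach NewCol to the original data frame; rows are matched by position
result_df <- bind_cols(df, new_col)
# Print the result
print(result_df)

This produces a data frame, result_df, containing the original columns plus the desired NewCol, with the rows in their original order.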
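To create NewCol inside df itself, without a second data frame to bind, the column names can be pasted in front of the values and the result assigned directly as a new column. A minimal base R sketch, assuming Col1 and Col4 are the columns of interest; `cols` and `labelled` are illustrative names, not part of any package:

```r
# Example data from the question (Col4 capitalised to match the outputs shown)
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26],
                 Col3 = c(1:4), Col4 = c(100:103))

# Columns to label and combine (illustrative choice)
cols <- c("Col1", "Col4")

# Map() walks over each selected column and its name in parallel,
# producing "Col1_A", "Col1_B", ... for Col1 and "Col4_100", ... for Col4
labelled <- Map(function(values, name) paste(name, values, sep = "_"),
                df[cols], cols)

# paste() is vectorised, so element i of NewCol is built only from row i
df$NewCol <- do.call(paste, c(unname(labelled), sep = "_"))

print(df)
```

Every step works element-wise over the rows of df, with no sorting, filtering, or joining, so row i of NewCol always comes from row i of df. Changing `cols` is enough to label a different set of columns.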


Answer 1

Score: 2


One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.

## Load libraries
library(tidyverse)
## Load example data
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))
## Expected outcome
df %>% bind_cols(imap_dfr(df %>% select(Col1, Col4),
                          ~ paste(.y, .x, sep = "_")) %>%
                   unite(newcol, sep = "_"))
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103
## With a small number of columns
df %>%
  mutate(tmp_Col1 = paste0("Col1", "_", Col1),
         tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103
## With a large number of columns
df %>%
  mutate(across(c(Col1, Col4),
                ~paste0(cur_column(), "_", .x))) %>%
  unite(newcol, c(Col1, Col4), sep = "_") %>%
  left_join(df)
#> Joining with `by = join_by(Col2, Col3)`
#>            newcol Col2 Col3 Col1 Col4
#> 1 Col1_A_Col4_100    W    1    A  100
#> 2 Col1_B_Col4_101    X    2    B  101
#> 3 Col1_C_Col4_102    Y    3    C  102
#> 4 Col1_D_Col4_103    Z    4    D  103

Created on 2023-06-22 with reprex v2.0.2
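A variant of the across() approach that avoids the final left_join(): writing the labelled values to new temporary columns through across()'s `.names` argument leaves Col1 and Col4 untouched, so no join is needed and nothing depends on Col2 and Col3 uniquely identifying each row. A sketch, assuming dplyr 1.0 or later (for across() and cur_column()) and that no existing column name starts with the illustrative prefix `tmp_`:

```r
library(dplyr)
library(tidyr)

df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26],
                 Col3 = c(1:4), Col4 = c(100:103))

out <- df %>%
  # Write "name_value" strings to new tmp_Col1 / tmp_Col4 columns;
  # cur_column() returns the name of the input column being processed
  mutate(across(c(Col1, Col4),
                ~ paste0(cur_column(), "_", .x),
                .names = "tmp_{.col}")) %>%
  # Combine the temporary columns; unite() drops its inputs by default
  unite(newcol, starts_with("tmp_"), sep = "_")

print(out)
```

The original columns keep their position and values, and newcol is appended at the end, so the column order matches the "Expected outcome" above.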

If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.
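Outside the tidyverse, the same idea of picking columns by a name pattern can be sketched in base R with startsWith(). The data frame below, with columns id, x1, x2 and x3, is hypothetical and only illustrates labelling every column whose name starts with "x":

```r
# Hypothetical data: several columns share the prefix "x"
df <- data.frame(id = 1:2, x1 = c("a", "b"), x2 = c(10, 20), x3 = c(TRUE, FALSE))

# Rough base R counterpart of tidyselect's starts_with("x")
cols <- names(df)[startsWith(names(df), "x")]

# Label each value with its column name, then join the labels row-wise
labelled <- lapply(cols, function(nm) paste(nm, df[[nm]], sep = "_"))
df$newcol <- do.call(paste, c(labelled, sep = "_"))

print(df)
```

Non-character columns are converted by paste(), so numbers and logicals appear as their printed text (for example "x2_10" and "x3_TRUE").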
