使用tidyr unite将某些选择列的列值与列名合并。

huangapple go评论69阅读模式
英文:

Combining column values with column names for some select columns using tidyr unite

问题

给定一个数据框:

df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

我想要将列与它们的列名组合在一起。我知道可以使用tidyr中的unite函数,并获得以下输出:

df %>% unite(NewCol, c(Col1, Col4), remove = F)

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  A_100
2    B    X    2  101  B_101
3    C    Y    3  102  C_102
4    D    Z    4  103  D_103

但我想要将列名放在列的值旁边,如下所示(分隔符_实际上不是很重要):

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  Col1_A_Col4_100
2    B    X    2  101  Col1_B_Col4_101
3    C    Y    3  102  Col1_C_Col4_102
4    D    Z    4  103  Col1_D_Col4_103

我尝试了这里发布的解决方案,它确实产生了期望的输出,但创建了一个单独的输出。

imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")

  NewCol         
  <chr>          
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103

您可以简单地使用bind_cols()将两者组合吗?如何确保两者之间保留了行的顺序?是否有另一种方法可以在同一个数据框中创建NewCol,类似于第一种情况中的unite

您可以使用bind_cols()将两个数据框组合在一起,并确保它们的行顺序相同。以下是如何完成这个任务:

library(dplyr)
library(tidyr)

# 使用 unite 创建 NewCol
df1 <- df %>%
  unite(NewCol, c(Col1, Col4), remove = FALSE)

# 使用 imap_dfr 创建 NewCol
df2 <- imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  rename(NewCol = .)

# 使用 bind_cols 将两个数据框组合
result_df <- bind_cols(df, df1["NewCol"], df2["NewCol"])

# 打印结果
print(result_df)

这将产生一个包含所需输出的数据框 result_df,并确保了行的顺序保持一致。

英文:

Given a dataframe:

df &lt;- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

I want to combine column with their column names. I know I can use unite from tidyr and get the following output.

df %&gt;% unite(NewCol, c(Col1, Col4), remove = F)

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  A_100
2    B    X    2  101  B_101
3    C    Y    3  102  C_102
4    D    Z    4  103  D_103

But I want to have the column name next to the value of the column as follows (the separator _ is really not that important):

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  Col1_A_Col4_100
2    B    X    2  101  Col1_B_Col4_101
3    C    Y    3  102  Col1_C_Col4_102
4    D    Z    4  103  Col1_D_Col4_103

I tried the solution posted here which does give the desired output but it creates a separate output.

imap_dfr(df %&gt;% select(Col1, Col4), ~ paste(.y, .x, sep = &quot;_&quot;)) %&gt;%
  unite(NewCol, sep = &quot;_&quot;)

  NewCol         
  &lt;chr&gt;          
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103

Would I simply use bind_cols() to combine both? How do I know the sequence of the rows is preserved between the two? Is there another way that I can create NewCol within the same dataframe similar to unite in the first case?

答案1

得分: 2

一个选项是创建临时的'colname + value'列,然后在第二步中合并它们,例如:

## 加载库
library(tidyverse)

## 加载示例数据
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))

## 预期结果
df %>%
  bind_cols(imap_dfr(df %>%
                       select(Col1, Col4),
                     ~ paste(.y, .x, sep = "_")) %>%
  unite(newcol, sep = "_"))
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## 对于少量列
df %>%
  mutate(tmp_Col1 = paste0("Col1", "_", Col1),
         tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## 对于大量列
df %>%
  mutate(across(c(Col1, Col4),
                ~paste0(cur_column(), "_", .x))) %>%
  unite(newcol, c(Col1, Col4), sep = "_") %>%
  left_join(df)
#> Joining with `by = join_by(Col2, Col3)`
#>            newcol Col2 Col3 Col1 Col4
#> 1 Col1_A_Col4_100    W    1    A  100
#> 2 Col1_B_Col4_101    X    2    B  101
#> 3 Col1_C_Col4_102    Y    3    C  102
#> 4 Col1_D_Col4_103    Z    4    D  103

创建于2023年06月22日,使用 reprex v2.0.2

如果你有大量要转换的列,使用across()可以让你使用tidyselect函数,比如starts_with(),来选择感兴趣的列,而不必逐个指定每列的名称。

英文:

One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.

## Load libraries
library(tidyverse)

## Load example data
df &lt;- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))

## Expected outcome
df %&gt;% bind_cols(imap_dfr(df %&gt;% select(Col1, Col4),
                          ~ paste(.y, .x, sep = &quot;_&quot;)) %&gt;%
                   unite(newcol, sep = &quot;_&quot;))
#&gt;   Col1 Col2 Col3 Col4          newcol
#&gt; 1    A    W    1  100 Col1_A_Col4_100
#&gt; 2    B    X    2  101 Col1_B_Col4_101
#&gt; 3    C    Y    3  102 Col1_C_Col4_102
#&gt; 4    D    Z    4  103 Col1_D_Col4_103

## With a small number of columns
df %&gt;%
  mutate(tmp_Col1 = paste0(&quot;Col1&quot;, &quot;_&quot;, Col1),
         tmp_Col4 = paste0(&quot;Col4&quot;, &quot;_&quot;, Col4)) %&gt;%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = &quot;_&quot;)
#&gt;   Col1 Col2 Col3 Col4          newcol
#&gt; 1    A    W    1  100 Col1_A_Col4_100
#&gt; 2    B    X    2  101 Col1_B_Col4_101
#&gt; 3    C    Y    3  102 Col1_C_Col4_102
#&gt; 4    D    Z    4  103 Col1_D_Col4_103

## With a large number of columns
df %&gt;%
  mutate(across(c(Col1, Col4),
                ~paste0(cur_column(), &quot;_&quot;, .x))) %&gt;%
  unite(newcol, c(Col1, Col4), sep = &quot;_&quot;) %&gt;%
  left_join(df)
#&gt; Joining with `by = join_by(Col2, Col3)`
#&gt;            newcol Col2 Col3 Col1 Col4
#&gt; 1 Col1_A_Col4_100    W    1    A  100
#&gt; 2 Col1_B_Col4_101    X    2    B  101
#&gt; 3 Col1_C_Col4_102    Y    3    C  102
#&gt; 4 Col1_D_Col4_103    Z    4    D  103

<sup>Created on 2023-06-22 with reprex v2.0.2</sup>

If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.

huangapple
  • 本文由 发表于 2023年6月22日 11:52:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76528502.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定