Splitting Text to Two Columns Using a Separator in R

Question

I am trying to split a column using the separator |, but unfortunately the suggestions on Stack Overflow, which recommend using separate(), are not working in my situation.

Can someone help me out here?

The data frame I am using is given below:

structure(list(UniqueID = c("12M-MA | X1", "12M-MA | X2",
"12M-MA | X3", "12M-MA | X4", "12M-MA | X5"), Cost = c(0.2, 0.3, 0.2, 0.2, 412.86), Actuals = c(0,
0, 0, 0, 32), Forecast = c(0, 0, 0, 0, 21), Value_Actuals = c(0,
0, 0, 0, 28341), Value_Forecast = c(0, 0, 0, 0, 652431
), `Forecast Accuracy` = c(0, 0, 0, 0, 8.51), `Max (Act vs Cons)` = c(0,
0, 0, 0, 652431)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))

The code which I have used, and which is not working, is this:

library(readxl)
library(dplyr)
library(tidyr)   # provides separate()
library(writexl)
library(stringr)
Df <- read_excel("C:/X/X/X/X/YXZ-2023.xlsx", skip = 1)

Df <- Df %>%
  separate(UniqueID, c("ABC", "XYZ"), sep = "|")

The expectation is to keep the column as is and get two more columns, one with the name of "ABC" and the other with "XYZ".

12M-MA should go into the column "ABC" and the other value, such as X1, X2, X3, X4, X5, should go into the column "XYZ".

Can someone help me out and let me know what I am doing wrong? This was the suggestion on Stack Overflow and everywhere else, but it isn't working for me.

Answer 1

Score: 2

Have you tried the newer separate_wider_delim function with cols_remove = FALSE?

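require(tidyverse)
# require(tidyr) # if you do not want to import `tidyverse`

# Splitting on "|" alone leaves the surrounding spaces in the new columns
Df %>%
  separate_wider_delim(UniqueID, delim = "|", names = c("ABC", "XYZ"), cols_remove = FALSE)

# Splitting on " | " if you do not want " " in the new columns
Df %>%
  separate_wider_delim(UniqueID, delim = " | ", names = c("ABC", "XYZ"), cols_remove = FALSE)

# A tibble: 5 × 10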
  ABC    XYZ   UniqueID     Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <chr>       <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1    12M-MA | X1   0.2       0        0             0              0                0                      0
2 12M-MA X2    12M-MA | X2   0.3       0        0             0              0                0                      0
3 12M-MA X3    12M-MA | X3   0.2       0        0             0              0                0                      0
4 12M-MA X4    12M-MA | X4   0.2       0        0             0              0                0                      0
5 12M-MA X5    12M-MA | X5 413.       32       21         28341         652431                8.51              652431
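
For readers without the original spreadsheet, the same call can be tried on a small stand-in data frame. This is only a sketch: the `toy` data frame and the `res` name are made up for illustration, and it assumes tidyr 1.3.0 or later, where separate_wider_delim is available. Because delim is matched as a fixed string by default, the pipe needs no escaping here.

```r
library(tidyr)

# Hypothetical two-row stand-in for the UniqueID column in the question
toy <- data.frame(
  UniqueID = c("12M-MA | X1", "12M-MA | X2"),
  Cost = c(0.2, 0.3)
)

# delim is a fixed string by default, so " | " is matched literally
res <- separate_wider_delim(
  toy, UniqueID,
  delim = " | ",
  names = c("ABC", "XYZ"),
  cols_remove = FALSE  # keep the original UniqueID column
)

res
```

Using " | " rather than "|" as the delimiter means the new columns come back without leading or trailing spaces, so no trimming step is needed afterwards.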

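Answer 2

Score: 1

We have to escape | with \\, because separate() treats sep as a regular expression and an unescaped | is the alternation operator.

In an R string, \\ stands for a single backslash, which is what escapes a special character in the regex.

The regex special characters are: . \ | ( ) [ ] { } ^ $ * + ?

To keep the original column we use the remove = FALSE argument.

Finally we use across with trimws to remove spaces.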

library(dplyr)
library(tidyr)

df %>%    
  separate(UniqueID, c("ABC", "XYZ"), sep = "\\|", remove = FALSE) %>% 
  mutate(across(c(ABC, XYZ), trimws))

  UniqueID    ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>       <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA | X1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA | X2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA | X3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA | X4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA | X5 12M-MA X5    413.       32       21         28341         652431                8.51              652431
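
The effect of escaping can also be seen in base R, without tidyr. The sketch below uses strsplit on a single value from the question; the variable names are made up for illustration. It compares an unescaped pattern, an escaped one, and a fixed (non-regex) match.

```r
x <- "12M-MA | X1"

# Unescaped: "|" is regex alternation, not a literal pipe,
# so the string is not split where you would expect
unescaped <- strsplit(x, "|")[[1]]

# Escaped: "\\|" matches a literal pipe
escaped <- strsplit(x, "\\|")[[1]]

# fixed = TRUE turns off regex interpretation altogether
fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

# The pieces still carry the spaces around the pipe, hence trimws()
trimmed <- trimws(escaped)

print(unescaped)
print(escaped)
print(fixed)
print(trimmed)
```

The escaped and fixed variants give the same two pieces, which is why either an escaped regex or a literal delimiter solves the problem in the question.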

Answer 3

Score: 0

Another solution is with extract:

library(tidyr)
df %>%
  extract(UniqueID,
          c("ABC", "XYZ"),
          "(.*)\\s\\|\\s(.*)")
# A tibble: 5 × 9
  ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA X5    413.       32       21         28341         652431                8.51              652431
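
Note that the output above has 9 columns because extract drops UniqueID by default, while the question asks to keep it. A possible variant, sketched on a made-up `toy` data frame, passes remove = FALSE. The call is written as tidyr::extract so that it cannot be confused with the extract function in magrittr.

```r
library(tidyr)

# Hypothetical two-row stand-in for the UniqueID column in the question
toy <- data.frame(
  UniqueID = c("12M-MA | X1", "12M-MA | X2"),
  Cost = c(0.2, 0.3)
)

# Group 1 captures the text before " | ", group 2 the text after it
res <- tidyr::extract(
  toy, UniqueID,
  into = c("ABC", "XYZ"),
  regex = "(.*)\\s\\|\\s(.*)",
  remove = FALSE  # keep the original UniqueID column
)

res
```

Because the whitespace around the pipe is part of the pattern but outside the capture groups, the new columns need no trimming.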

huangapple
  • Posted on May 24, 2023 at 23:40:02
  • When reposting, be sure to keep the link to this article: https://go.coder-hub.com/76325267.html