May 24, 2023, 23:40:02

Splitting Text to Two Columns Using a Separator in R

Question

I am trying to split a column on the separator "|", but the suggestions on Stack Overflow that recommend separate() are not working in my situation.

Can someone help me out here?

The data frame I am using is given below:

structure(list(UniqueID = c("12M-MA | X1", "12M-MA | X2",
"12M-MA | X3", "12M-MA | X4", "12M-MA | X5"), Cost = c(0.2, 0.3, 0.2, 0.2, 412.86), Actuals = c(0,
0, 0, 0, 32), Forecast = c(0, 0, 0, 0, 21), Value_Actuals = c(0,
0, 0, 0, 28341), Value_Forecast = c(0, 0, 0, 0, 652431
), `Forecast Accuracy` = c(0, 0, 0, 0, 8.51), `Max (Act vs Cons)` = c(0,
0, 0, 0, 652431)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))

The code I am using, which is not working, is this:

library(readxl)
library(dplyr)
library(tidyr)   # provides separate()
library(writexl)
library(stringr)
Df <- read_excel("C:/X/X/X/X/YXZ-2023.xlsx", skip = 1)

Df <- Df %>%
  separate(UniqueID, c("ABC", "XYZ"), sep = "|")

The expectation is to keep the original column as is and add two more columns, one named "ABC" and the other "XYZ".

12M-MA should go into column "ABC", and the other values (X1, X2, X3, X4, X5) should go into column "XYZ".

Can someone help me out and let me know what I am doing wrong? This was the suggestion on Stack Overflow and everywhere else, but it isn't working for me.

Answer 1

Score: 2

Have you tried the newer separate_wider_delim() function with cols_remove = FALSE?

library(tidyverse)
# library(tidyr) # if you do not want to load the whole tidyverse
# Splitting on "|" alone leaves the surrounding spaces in the new columns.
Df %>%
  separate_wider_delim(UniqueID, delim = "|", names = c("ABC", "XYZ"), cols_remove = FALSE)
# Splitting on " | " removes those spaces as well.
Df %>%
  separate_wider_delim(UniqueID, delim = " | ", names = c("ABC", "XYZ"), cols_remove = FALSE)
# A tibble: 5 × 10
  ABC    XYZ   UniqueID     Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <chr>       <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1    12M-MA | X1   0.2       0        0             0              0                0                      0
2 12M-MA X2    12M-MA | X2   0.3       0        0             0              0                0                      0
3 12M-MA X3    12M-MA | X3   0.2       0        0             0              0                0                      0
4 12M-MA X4    12M-MA | X4   0.2       0        0             0              0                0                      0
5 12M-MA X5    12M-MA | X5 413.       32       21         28341         652431                8.51              652431

Answer 2

Score: 1

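separate() interprets sep as a regular expression, and "|" is a special character there (alternation), so we have to escape it as "\\|".

In an R string, "\\" produces the single backslash that escapes a special regex character.

To keep the original column, we pass remove = FALSE.

Finally, we use across() with trimws() to strip the leftover spaces from the new columns.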

library(dplyr)
library(tidyr)
df %>%    
  separate(UniqueID, c("ABC", "XYZ"), sep = "\\|", remove = FALSE) %>% 
  mutate(across(c(ABC, XYZ), trimws))
  UniqueID    ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>       <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA | X1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA | X2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA | X3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA | X4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA | X5 12M-MA X5    413.       32       21         28341         652431                8.51              652431
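
The escaping rule can be checked without tidyr. The following is a minimal base R sketch, assuming a small vector `ids` copied from the question's UniqueID column (the variable names are illustrative, not from the original answer). It compares an unescaped pipe, an escaped pipe, and a literal (fixed) match using strsplit():

```r
# Sample IDs copied from the question's data frame (illustrative subset).
ids <- c("12M-MA | X1", "12M-MA | X2", "12M-MA | X5")

# Unescaped, "|" is regex alternation between two empty patterns,
# so the string is not split on the literal pipe character.
unescaped <- strsplit(ids, "|")

# Escaping the pipe makes the regex match a literal "|".
escaped <- strsplit(ids, "\\|")

# fixed = TRUE treats the pattern as a plain string, with no regex involved.
literal <- strsplit(ids, "|", fixed = TRUE)

# Trim the surrounding spaces and bind the pieces into a two-column matrix.
parts <- do.call(rbind, lapply(escaped, trimws))
colnames(parts) <- c("ABC", "XYZ")
parts
```

Either the escaped pattern or a fixed match yields the two pieces per row; the unescaped call does not, which is the same reason sep = "|" misbehaves in separate().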

Answer 3

Score: 0

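Another solution is with extract(), which captures the two parts with a regular expression. By default it drops the original UniqueID column, as in the output below; pass remove = FALSE to keep it: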

library(tidyr)
df %>%
  extract(UniqueID,
          c("ABC", "XYZ"),
          "(.*)\\s\\|\\s(.*)")
# A tibble: 5 × 9
  ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA X5    413.       32       21         28341         652431                8.51              652431
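
To see what the capture groups in that pattern pick up, here is a rough base R sketch using regexec() and regmatches() instead of tidyr. The vector `ids` and the other variable names are assumptions made for illustration; only the regular expression is taken from the answer:

```r
# Sample IDs copied from the question's data frame (illustrative subset).
ids <- c("12M-MA | X1", "12M-MA | X2", "12M-MA | X5")

# Same pattern as in the extract() call: group 1, " | ", group 2.
pattern <- "(.*)\\s\\|\\s(.*)"

# regexec() locates the full match and each capture group;
# regmatches() returns them as one character vector per input string.
matches <- regmatches(ids, regexec(pattern, ids))

# Drop the full match (first element) and keep only the two groups.
groups <- t(sapply(matches, function(m) m[-1]))
colnames(groups) <- c("ABC", "XYZ")
groups
```

Because the whitespace around the pipe is part of the pattern but outside the groups, the captured values need no trimming afterwards.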
