Splitting Text to Two Columns Using a Separator in R
Question
I am trying to split a column on the separator "|",
but the suggestions on Stack Overflow that recommend using separate()
are not working in my situation.
Can someone help me out here?
The data frame I am using is given below:
structure(list(UniqueID = c("12M-MA | X1", "12M-MA | X2",
"12M-MA | X3", "12M-MA | X4", "12M-MA | X5"
), Cost = c(0.2, 0.3, 0.2, 0.2, 412.86), Actuals = c(0,
0, 0, 0, 32), Forecast = c(0, 0, 0, 0, 21), Value_Actuals = c(0,
0, 0, 0, 28341), Value_Forecast = c(0, 0, 0, 0, 652431
), `Forecast Accuracy` = c(0, 0, 0, 0, 8.51), `Max (Act vs Cons)` = c(0,
0, 0, 0, 652431)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
The code I have used, which is not working, is this:
library(readxl)
library(dplyr)
library(writexl)
library(stringr)
library(tidyr) # provides separate()
Df <- read_excel("C:/X/X/X/X/YXZ-2023.xlsx", skip = 1)
Df <- Df %>%
  separate(UniqueID, c("ABC", "XYZ"), sep = "|")
The expectation is to keep the column as is and get two more columns, one named "ABC" and the other "XYZ".
12M-MA should go into column "ABC", and the other values (X1, X2, X3, X4, X5) should go into column "XYZ".
Can someone help me out and let me know what I am doing wrong? This was the suggestion on Stack Overflow and everywhere else, but it isn't working for me.
Answer 1
Score: 2
Have you tried the newer separate_wider_delim() with cols_remove = FALSE?
require(tidyverse)
# require(tidyr) # if you do not want to import `tidyverse`
# Splitting on "|" alone leaves the surrounding spaces in the new columns
Df %>% separate_wider_delim(UniqueID, delim = "|", names = c("ABC", "XYZ"), cols_remove = FALSE)
# Splitting on " | " drops those spaces as well
Df %>% separate_wider_delim(UniqueID, delim = " | ", names = c("ABC", "XYZ"), cols_remove = FALSE)
# A tibble: 5 × 10
  ABC    XYZ   UniqueID     Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <chr>       <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1    12M-MA | X1   0.2       0        0             0              0                   0                   0
2 12M-MA X2    12M-MA | X2   0.3       0        0             0              0                   0                   0
3 12M-MA X3    12M-MA | X3   0.2       0        0             0              0                   0                   0
4 12M-MA X4    12M-MA | X4   0.2       0        0             0              0                   0                   0
5 12M-MA X5    12M-MA | X5  413.      32       21         28341         652431                8.51              652431
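For a quick self-contained check, the call above can be tried on a tiny stand-in table. This is only a sketch: the two-row `demo` tibble is made up for illustration, and it assumes tidyr 1.3.0 or later, where separate_wider_delim() is available.

```r
library(tidyr)  # assumes tidyr >= 1.3.0 for separate_wider_delim()

# Hypothetical two-row stand-in shaped like the question's data.
demo <- tibble::tibble(
  UniqueID = c("12M-MA | X1", "12M-MA | X2"),
  Cost     = c(0.2, 0.3)
)

# delim is matched as a literal string by default, so "|" needs no escaping;
# cols_remove = FALSE keeps UniqueID alongside the new columns.
res <- separate_wider_delim(
  demo, UniqueID,
  delim = " | ",
  names = c("ABC", "XYZ"),
  cols_remove = FALSE
)
print(res)
```

Because the delimiter here is a fixed string rather than a regular expression, the escaping problem from the question does not arise with this function.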
Answer 2
Score: 1
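We have to escape | as \\| because separate() interprets sep as a regular expression, in which | is a special character (alternation).
In an R string, \\ stands for the single backslash that escapes a special character; R's ?regex help page lists the special characters.
To keep the original column, we use the remove = FALSE argument.
Finally, we use across() with trimws() to remove the spaces around each value.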
library(dplyr)
library(tidyr)
df %>%
  separate(UniqueID, c("ABC", "XYZ"), sep = "\\|", remove = FALSE) %>%
  mutate(across(c(ABC, XYZ), trimws))
  UniqueID    ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>       <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA | X1 12M-MA X1      0.2       0        0             0              0                   0                   0
2 12M-MA | X2 12M-MA X2      0.3       0        0             0              0                   0                   0
3 12M-MA | X3 12M-MA X3      0.2       0        0             0              0                   0                   0
4 12M-MA | X4 12M-MA X4      0.2       0        0             0              0                   0                   0
5 12M-MA | X5 12M-MA X5     413.      32       21         28341         652431                8.51              652431
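To see why the unescaped pipe misbehaves, it may help to look at base R's strsplit(), which also treats its pattern as a regular expression by default. The snippet below is a minimal sketch using only base R; the variable names are made up for illustration, and it demonstrates the regex behaviour in general rather than the internals of separate().

```r
# Base R only: the pattern "|" as a regex versus as a literal string.
x <- "12M-MA | X1"

# Unescaped, "|" is an alternation between two empty patterns, so it matches
# the empty string and the input is cut into single characters.
as_regex <- strsplit(x, "|")[[1]]

# Escaping the pipe, or passing fixed = TRUE, splits on the literal "|".
escaped <- strsplit(x, "\\|")[[1]]
literal <- strsplit(x, "|", fixed = TRUE)[[1]]

# trimws() removes the spaces left around each piece.
parts <- trimws(escaped)

print(as_regex)
print(parts)
```

Either form of the literal split gives the two pieces the question asks for once the surrounding spaces are trimmed.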
Answer 3
Score: 0
Another solution is with extract():
library(tidyr)
df %>%
  extract(UniqueID,
          c("ABC", "XYZ"),
          "(.*)\\s\\|\\s(.*)")
# A tibble: 5 × 9
  ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1      0.2       0        0             0              0                   0                   0
2 12M-MA X2      0.3       0        0             0              0                   0                   0
3 12M-MA X3      0.2       0        0             0              0                   0                   0
4 12M-MA X4      0.2       0        0             0              0                   0                   0
5 12M-MA X5     413.      32       21         28341         652431                8.51              652431
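The output above no longer contains UniqueID, because extract() removes the source column by default, whereas the question asks to keep it. A possible variant, sketched here on a made-up two-row `demo` tibble, passes remove = FALSE:

```r
library(tidyr)

# Hypothetical stand-in for the question's data frame.
demo <- tibble::tibble(UniqueID = c("12M-MA | X1", "12M-MA | X5"))

# Each parenthesised group in the regex becomes one new column;
# remove = FALSE keeps the original UniqueID column instead of dropping it.
kept <- extract(
  demo, UniqueID,
  into   = c("ABC", "XYZ"),
  regex  = "(.*)\\s\\|\\s(.*)",
  remove = FALSE
)
print(kept)
```

Since the \\s on either side of the pipe consumes the spaces, no trimws() step should be needed afterwards.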