Split a column into two, using parenthesis as separator in R
Question
I have an oddly formatted column that I need to split into two.
col = c("142343-2344343(+)", "546354-4775458(-)", "374637463")
I want to split col into col1 and col2, using the first parenthesis as the separator.
I want something like this:
col1              col2
142343-2344343    +
546354-4775458    -
374637463         NA
I'd love your help!
Answer 1
Score: 3
Try separate:
library(tidyverse)
data.frame(col) %>%
  separate(col,
            into = c("col1", "col2"),
            sep = "\\(|\\)")
The result:
            col1 col2
1 142343-2344343    +
2 546354-4775458    -
3      374637463 <NA>
Answer 2
Score: 2
We may use base R with read.csv
read.csv(text = sub("(.*)([+-])$", "\\1,\\2",
                    gsub("\\(|\\)", "", col)),
         header = FALSE, na.strings = "",
         col.names = c("col1", "col2"))
-output
             col1 col2
1 142343-2344343    +
2 546354-4775458    -
3      374637463 <NA>
With tidyr, an option is
library(tidyr)
library(dplyr)
library(tibble)
tibble(col) %>%
 separate_wider_regex(col, c(col1 = ".*", "\\(", var2 = "[^)]", 
    "\\)"), too_few = "align_start")
-output
# A tibble: 3 × 2
  col1           var2 
  <chr>          <chr>
1 142343-2344343 +    
2 546354-4775458 -    
3 374637463      <NA> 
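If you would rather avoid both read.csv and tidyr, here is a minimal base R sketch that uses only sub() and grepl(). The names has_paren and result are illustrative and do not come from the answers above. It assumes each string contains at most one parenthesised group.

```r
col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")

# Everything before the first "(" (the whole string when there is none).
col1 <- sub("\\(.*$", "", col)

# Flag the entries that actually contain a "(...)" group.
has_paren <- grepl("\\([^)]*\\)", col)

# Text inside the first pair of parentheses; NA where no group exists.
col2 <- ifelse(has_paren,
               sub("^[^(]*\\(([^)]*)\\).*$", "\\1", col),
               NA_character_)

result <- data.frame(col1, col2, stringsAsFactors = FALSE)
result
```

Unlike separate with a two-sided separator, this should not emit "additional pieces discarded" warnings, because nothing is split into more pieces than there are columns.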