Split a column into two in R, using parentheses as the separator


Question

I have a weird data format, and I need to split a column into two.

col = c("142343-2344343(+)", "546354-4775458(-)", "374637463")

I want to split col into col1 and col2, using the first parenthesis as the separator.

I want something like this:

col1            col2
142343-2344343  +
546354-4775458  -
374637463       NA

I'd love your help!
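A minimal base R sketch of the rule described above, assuming the text inside the parentheses should become col2 and values without any parenthesis should get NA. The helper name split_at_first_paren is hypothetical and not part of the question.

```r
# Hypothetical helper: split each string at its first "(" and use the
# text inside the parentheses as col2 (NA when there is no "(").
split_at_first_paren <- function(x) {
  # Position of the first "(", or -1 when there is none; as.integer()
  # drops the match.length attributes that regexpr() attaches.
  pos <- as.integer(regexpr("(", x, fixed = TRUE))
  has_paren <- pos > 0
  # Everything before the first "(" (the whole string if there is none).
  col1 <- ifelse(has_paren, substr(x, 1, pos - 1), x)
  # Everything after the first "(", with a trailing ")" removed.
  col2 <- ifelse(has_paren,
                 sub("\\)$", "", substr(x, pos + 1, nchar(x))),
                 NA_character_)
  data.frame(col1 = col1, col2 = col2, stringsAsFactors = FALSE)
}

col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")
split_at_first_paren(col)
```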


Answer 1

Score: 3

Try separate():

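library(tidyverse)
# sep matches "(" or ")", so "142343-2344343(+)" splits into three pieces
# (the last one empty); extra = "drop" discards it, and fill = "right" pads
# values without parentheses with NA, instead of warning about both.
data.frame(col) %>%
  separate(col,
           into = c("col1", "col2"),
           sep = "\\(|\\)",
           extra = "drop", fill = "right")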

Result:

            col1 col2
1 142343-2344343    +
2 546354-4775458    -
3      374637463 <NA>
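Why extra and fill matter here: the sep pattern \\(|\\) matches both ( and ), so 142343-2344343(+) splits into three pieces with an empty last piece, while 374637463 yields a single piece. The base R sketch below reproduces that split with regmatches(..., invert = TRUE) and keeps the first two pieces. The helper name split_on_parens is hypothetical, and the sketch only approximates separate() with extra = "drop", fill = "right".

```r
# Hypothetical helper approximating separate(col, into = c("col1", "col2"),
# sep = "\\(|\\)", extra = "drop", fill = "right") in base R.
split_on_parens <- function(x) {
  # invert = TRUE returns the text between matches of "(" or ")",
  # so "a(+)" becomes c("a", "+", "") and "abc" stays "abc".
  pieces <- regmatches(x, gregexpr("\\(|\\)", x), invert = TRUE)
  data.frame(
    # Ignoring pieces after the second acts like extra = "drop";
    # indexing past the end gives NA, which acts like fill = "right".
    col1 = vapply(pieces, `[`, character(1), 1),
    col2 = vapply(pieces, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")
# Number of pieces per value before trimming to two columns.
n_pieces <- lengths(regmatches(col, gregexpr("\\(|\\)", col), invert = TRUE))
split_on_parens(col)
```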
Answer 2

Score: 2

We may use base R with read.csv():

# Drop the parentheses, then insert a comma before a trailing + or -
read.csv(text = sub("(.*)([+-])$", "\\1,\\2",
                    gsub("\\(|\\)", "", col)),
         header = FALSE, na.strings = "",
         col.names = c("col1", "col2"))

Output:

             col1 col2
1 142343-2344343    +
2 546354-4775458    -
3      374637463 <NA>
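To see what each step of the read.csv() approach does, here is the same pipeline with the intermediate values kept in variables (the variable names are illustrative only):

```r
col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")

# Step 1: delete every "(" and ")".
no_parens <- gsub("\\(|\\)", "", col)
# Step 2: put a comma before a trailing + or -. Values without a trailing
# sign do not match, so sub() leaves them as one-field lines.
csv_lines <- sub("(.*)([+-])$", "\\1,\\2", no_parens)
# Step 3: parse the lines as CSV. read.csv() pads the short line with a
# blank field (fill = TRUE by default), and na.strings = "" makes it NA.
res <- read.csv(text = csv_lines, header = FALSE, na.strings = "",
                col.names = c("col1", "col2"))
res
```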

With tidyr, one option is separate_wider_regex():

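library(tidyr)
library(dplyr)
library(tibble)
# Named pieces become columns; unnamed pieces ("\\(", "\\)") are matched
# and dropped. too_few = "align_start" pads the missing pieces with NA
# when a value has no parentheses.
tibble(col) %>%
  separate_wider_regex(col,
                       c(col1 = ".*", "\\(", var2 = "[^)]", "\\)"),
                       too_few = "align_start")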

Output:

# A tibble: 3 × 2
  col1           var2 
  <chr>          <chr>
1 142343-2344343 +    
2 546354-4775458 -    
3 374637463      <NA> 
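The pieces passed to separate_wider_regex() amount to one anchored regular expression with a capture group per named piece. A rough base R equivalent for this data, assuming that values which do not match keep the whole string in col1 (which is what too_few = "align_start" gave here), could look like this:

```r
col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")

# The pieces col1 = ".*", "\\(", var2 = "[^)]", "\\)" joined into one
# anchored pattern; each named piece becomes a capture group.
pattern <- "^(.*)\\(([^)])\\)$"
matched <- grepl(pattern, col)
# Non-matching values keep the whole string in col1 and get NA in var2.
res <- data.frame(
  col1 = ifelse(matched, sub(pattern, "\\1", col), col),
  var2 = ifelse(matched, sub(pattern, "\\2", col), NA_character_),
  stringsAsFactors = FALSE
)
res
```

This only mirrors align_start for values like 374637463 where the col1 piece can swallow the whole string; other inputs may differ.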

huangapple
  • Posted on April 19, 2023, 23:14:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056155.html