Separate columns in R based on the second occurrence of a dot (".")


Question

I am having a hard time separating the column in this data set:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tibble(sample=c("AM.F10.T1", "AM.F10.T2","DA.AD.1","DA.AD.2", "ES.AD.1"))
#> # A tibble: 5 × 1
#>   sample   
#>   <chr>    
#> 1 AM.F10.T1
#> 2 AM.F10.T2
#> 3 DA.AD.1  
#> 4 DA.AD.2  
#> 5 ES.AD.1

Created on 2023-05-11 with reprex v2.0.2

I would like the result to look like this:

#>   sample        col1      col2
#>   <chr>    
#> 1 AM.F10.T1     AM.F10     T1
#> 2 AM.F10.T2     AM.F10     T2
#> 3 DA.AD.1       DA.AD       1
#> 4 DA.AD.2       DA.AD       2
#> 5 ES.AD.1       ES.AD       1

Thank you for taking the time to read my post.

Answer 1

Score: 1

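You can do this with tidyr::separate_wider_regex() (available in tidyr 1.3.0 and later). The patterns argument lets you state explicitly what goes into the first column, what goes into the second column, and what separates them; the unnamed pattern ("\\.") is matched but not kept in the output.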

library(tidyr)
tibble(sample=c("AM.F10.T1", "AM.F10.T2","DA.AD.1","DA.AD.2", "ES.AD.1")) |> 
  separate_wider_regex(
     cols = sample, 
     patterns = c(first  = "\\w*\\.\\w*", "\\.", second = "\\w*")
  )
#> # A tibble: 5 × 2
#>   first  second
#>   <chr>  <chr> 
#> 1 AM.F10 T1    
#> 2 AM.F10 T2    
#> 3 DA.AD  1     
#> 4 DA.AD  2     
#> 5 ES.AD  1

Created on 2023-05-11 with reprex v2.0.2
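The output above drops the original sample column and names the new columns first and second, whereas the question asks for sample, col1 and col2. A minimal sketch of one way to get closer to that, assuming tidyr 1.3.0 or later: cols_remove = FALSE keeps the input column, and the names in patterns set the output column names. The "[^.]*" pattern (any run of non-dot characters) is a swapped-in alternative to "\\w*", and the hyphenated value "X-1.AD.3" is hypothetical data added only to show a case that "\\w*" would not match.

```r
library(tidyr)

# Hypothetical data: the last value contains a hyphen, which \\w does not match
df <- tibble(sample = c("AM.F10.T1", "DA.AD.2", "X-1.AD.3"))

res <- df |>
  separate_wider_regex(
    cols = sample,
    # col1 = text up to (not including) the second dot; col2 = the rest
    patterns = c(col1 = "[^.]*\\.[^.]*", "\\.", col2 = "[^.]*"),
    cols_remove = FALSE  # keep the original `sample` column
  )

res
```

Because every piece is written as "[^.]*", this version expects exactly two dots per value; by default separate_wider_regex() raises an error for values that do not match the full pattern.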

Answer 2

Score: 1

Although tidyr's extract() has been superseded by separate_wider_regex(), I think it is still useful sometimes.

Because the first capture group is greedy, it matches as much as possible, so the literal dot in the pattern matches the last dot in the string. With exactly two dots per value, as here, the second capture group receives everything after the second dot.

library(tidyr)

# Data from the question
df <- tibble(sample = c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1"))

extract(df, sample, regex = "(.*)\\.(.*)", into = c("col1", "col2"), remove = FALSE)

# A tibble: 5 × 3
  sample    col1   col2 
  <chr>     <chr>  <chr>
1 AM.F10.T1 AM.F10 T1   
2 AM.F10.T2 AM.F10 T2   
3 DA.AD.1   DA.AD  1    
4 DA.AD.2   DA.AD  2    
5 ES.AD.1   ES.AD  1
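To illustrate the matching behaviour this answer relies on, here is a small base R sketch using sub() instead of extract(). It contrasts a greedy first group with a lazy one, and adds a pattern that splits strictly at the second dot. The value "A.B.C.D" is a hypothetical input, not from the question, included to show where "last dot" and "second dot" differ.

```r
sample <- c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1")

# Greedy (.*) takes as much as it can, so the literal \\. lands on the LAST dot
col1 <- sub("(.*)\\.(.*)", "\\1", sample)
col2 <- sub("(.*)\\.(.*)", "\\2", sample)

# Lazy (.*?) takes as little as it can, so \\. lands on the FIRST dot (needs perl = TRUE)
lazy1 <- sub("^(.*?)\\.(.*)$", "\\1", sample, perl = TRUE)

# Hypothetical value with three dots: greedy splits at the last dot ...
greedy_last <- sub("(.*)\\.(.*)", "\\1", "A.B.C.D")

# ... while an explicit pattern splits at the second dot
second1 <- sub("^([^.]*\\.[^.]*)\\.(.*)$", "\\1", "A.B.C.D")
second2 <- sub("^([^.]*\\.[^.]*)\\.(.*)$", "\\2", "A.B.C.D")
```

For the question's data the greedy pattern and the explicit second-dot pattern give the same result; they only diverge when a value contains more than two dots.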

huangapple
  • Posted on 2023-05-11 20:19:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76227585.html