Separate columns in R based on the second occurrence of a dot ("\\.")
Question
I am having a hard time splitting a column in my data set. I have this tibble:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tibble(sample=c("AM.F10.T1", "AM.F10.T2","DA.AD.1","DA.AD.2", "ES.AD.1"))
#> # A tibble: 5 × 1
#> sample
#> <chr>
#> 1 AM.F10.T1
#> 2 AM.F10.T2
#> 3 DA.AD.1
#> 4 DA.AD.2
#> 5 ES.AD.1
Created on 2023-05-11 with reprex v2.0.2
and I would like to split it so that it looks like this:
#> sample col1 col2
#> <chr> <chr> <chr>
#> 1 AM.F10.T1 AM.F10 T1
#> 2 AM.F10.T2 AM.F10 T2
#> 3 DA.AD.1 DA.AD 1
#> 4 DA.AD.2 DA.AD 2
#> 5 ES.AD.1 ES.AD 1
Thank you for taking the time to read my post.
Answer 1
Score: 1
You can do this with tidyr::separate_wider_regex() (available since tidyr 1.3.0). It lets you state explicitly what goes into the first and second columns and what separates them.
library(tidyr)
tibble(sample = c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1")) |>
  separate_wider_regex(
    cols = sample,
    patterns = c(first = "\\w*\\.\\w*", "\\.", second = "\\w*")
  )
#> # A tibble: 5 × 2
#> first second
#> <chr> <chr>
#> 1 AM.F10 T1
#> 2 AM.F10 T2
#> 3 DA.AD 1
#> 4 DA.AD 2
#> 5 ES.AD 1
Created on 2023-05-11 with reprex v2.0.2
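The output above drops the original sample column, while the question asks to keep it. A minimal sketch of one way to get the requested layout, assuming tidyr 1.3.0 or later: pass cols_remove = FALSE and name the pattern pieces col1 and col2. The names df and res are only illustrative.

```r
library(tidyr)

# Same data as in the question
df <- tibble(sample = c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1"))

# cols_remove = FALSE keeps the input column alongside the new ones;
# the unnamed pattern ("\\.") is matched but not turned into a column
res <- separate_wider_regex(
  df,
  cols = sample,
  patterns = c(col1 = "\\w*\\.\\w*", "\\.", col2 = "\\w*"),
  cols_remove = FALSE
)
res
```

The result has three columns: sample, col1 and col2.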
Answer 2
Score: 1
Although the extract function from the tidyr package was superseded by separate_wider_regex, I think it's still useful sometimes.
Using a greedy match in the first capture group forces the second capture group to take only what follows the last dot, which here is the second one.
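library(tidyr)
# df is the data from the question
df <- tibble(sample = c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1"))
extract(df, sample, regex = "(.*)\\.(.*)", into = c("col1", "col2"), remove = FALSE)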
# A tibble: 5 × 3
sample col1 col2
<chr> <chr> <chr>
1 AM.F10.T1 AM.F10 T1
2 AM.F10.T2 AM.F10 T2
3 DA.AD.1 DA.AD 1
4 DA.AD.2 DA.AD 2
5 ES.AD.1 ES.AD 1
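One caveat worth spelling out: the greedy pattern splits at the last dot, which coincides with the second dot only because every value here contains exactly two. The sketch below uses base R's sub() to compare it with a pattern that always splits at the second dot. The value "A.B.C.D" is a made-up edge case that is not in the question's data, and the variable names are only illustrative.

```r
x <- c("AM.F10.T1", "DA.AD.1", "A.B.C.D")  # last value is a hypothetical edge case

# Greedy (.*) grabs as much as it can, so the split lands on the LAST dot
greedy_first <- sub("^(.*)\\.(.*)$", "\\1", x)
greedy_rest  <- sub("^(.*)\\.(.*)$", "\\2", x)

# [^.]* cannot cross a dot, so this pattern splits exactly at the SECOND dot
second_first <- sub("^([^.]*\\.[^.]*)\\.(.*)$", "\\1", x)
second_rest  <- sub("^([^.]*\\.[^.]*)\\.(.*)$", "\\2", x)

data.frame(x, greedy_first, greedy_rest, second_first, second_rest)
```

For the two-dot values both patterns agree. For "A.B.C.D" the greedy version gives "A.B.C" and "D", while the second-dot version gives "A.B" and "C.D". The stricter pattern can be passed as the regex argument of extract() if the data may contain more than two dots.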