Remove part of string with multiple occurrences inside cell

Question

I have the following dataframe:

bla = data.frame(mycol = "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|")

and I want to remove everything from the first '|' onward in each substring; the cell contains two substrings separated by ';'.

Now, I tried

gsub("\\|.*","",bla$mycol)

which gives me bla_v2_2072, but what I expect is

bla_v2_2072;bla_v2_0113

Answer 1

Score: 1

We may use

library(dplyr)
library(tidyr)
library(stringr)
bla %>% 
  mutate(rn = row_number()) %>% 
  separate_longer_delim(mycol, delim = ";") %>% 
  reframe(mycol = str_c(str_remove(mycol, "\\|.*"), collapse = ";"),
          .by = 'rn') %>%
  select(-rn)

Output:

                   mycol
1 bla_v2_2072;bla_v2_0113

Or using base R

gsub("(\\w+)(\\|ID:\\d+\\|)", "\\1", bla$mycol)
[1] "bla_v2_2072;bla_v2_0113"

Answer 2

Score: 0

Using gsub():

bla$mycol <- gsub("(\\|.*?(?=;))|(\\|[^;]*$)", "", bla$mycol, perl = TRUE)

Or using the same regex pattern in tidyverse:

library(dplyr)
library(stringr)

bla %>% 
  mutate(mycol = str_remove_all(mycol, "(\\|.*?(?=;))|(\\|[^;]*$)"))

Result:

                    mycol
1 bla_v2_2072;bla_v2_0113

Explanation:

"(\\|.*?(?=;))              # literal '|' and following characters up to next ';'
              |             # or
               (\\|[^;]*$)" # literal '|' through end of string if no intervening ';'
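
As a side note, the two alternatives above can probably be folded into one pattern that needs no lookahead, and therefore no perl = TRUE. The following is a minimal sketch, assuming the cell values contain no newlines; the variable names x, with_lookahead and without_lookahead are made up for illustration.

```r
# The example string from the question
x <- "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|"

# Pattern from this answer: needs perl = TRUE because of the lookahead
with_lookahead <- gsub("(\\|.*?(?=;))|(\\|[^;]*$)", "", x, perl = TRUE)

# Sketch of a lookahead-free variant: a literal '|' followed by any run of
# characters that are not ';' (stops at the next ';' or at end of string)
without_lookahead <- gsub("\\|[^;]*", "", x)

print(without_lookahead)
print(identical(with_lookahead, without_lookahead))
```

Both calls give the same result on this input. The negated character class does the job of the lazy quantifier plus lookahead, since the match can never run past a ';'.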

Answer 3

Score: 0

gsub("\\|[^|]+\\|", "", bla$mycol)
#> [1] "bla_v2_2072;bla_v2_0113"

Pattern explanation: an escaped "|", followed by one or more characters that are not "|", followed by one more "|".
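
Because gsub() is vectorized, the same pattern should work unchanged on a column with several rows. Here is a small sketch to check that; the data frame bla2 and its second and third rows are invented for illustration and are not from the question.

```r
# Hypothetical data: the row from the question plus two made-up rows
bla2 <- data.frame(mycol = c(
  "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|",
  "bla_v2_0001|ID:1|",   # a single substring
  "bla_v2_0002"          # nothing to remove
))

# Remove every "|...|" block; each element is processed independently
cleaned <- gsub("\\|[^|]+\\|", "", bla2$mycol)
print(cleaned)
```

Note that this pattern relies on the '|' characters coming in pairs, as they do in the example data. A cell with a single unpaired '|' would be left untouched.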

Answer 4

Score: -1

You can first separate your string by ";" and then remove everything after "|". Finally, concatenate them back using paste0.

> paste0(sub("\\|.*", "", unlist(strsplit(bla$mycol, split = ";"))), collapse = ";")
[1] "bla_v2_2072;bla_v2_0113"
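
The one-liner above flattens everything with unlist(), so on a column with several rows it would merge all cells into a single string. A possible row-wise version of the same split, strip and rejoin idea is sketched below; the helper name strip_ids and the second data row are assumptions made for illustration.

```r
# Hypothetical helper: apply split / strip / rejoin to each cell separately
strip_ids <- function(x) {
  # One character vector of ';'-separated pieces per input element
  parts <- strsplit(x, split = ";", fixed = TRUE)
  vapply(
    parts,
    # Drop everything from the first '|' in each piece, then rejoin with ';'
    function(p) paste(sub("\\|.*", "", p), collapse = ";"),
    character(1)
  )
}

# The row from the question plus one made-up row
bla2 <- data.frame(mycol = c(
  "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|",
  "bla_v2_0001|ID:1|"
))
result <- strip_ids(bla2$mycol)
print(result)
```

Using vapply() with character(1) keeps the result the same length as the input, so it can be assigned straight back to the column.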

huangapple
  • Posted on February 24, 2023, 01:06:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548078.html