Remove part of string with multiple occurrences inside cell
Question
I have the following dataframe:
bla = data.frame(mycol = "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|")
I want to remove everything after the first '|', but the cell actually contains two substrings separated by ';', and each of them needs to be trimmed.
I tried
gsub("\\|.*","",bla$mycol)
which gives me bla_v2_2072, but what I expect is
bla_v2_2072;bla_v2_0113
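The attempt above stops at the first ID because in "\\|.*" the "." also matches ";" and ".*" is greedy, so the first match runs to the end of the cell. A minimal sketch (an illustrative variant, not taken from the answers below) that restricts the match to characters other than ";":

```r
bla <- data.frame(mycol = "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|")

# Greedy: "\\|.*" matches from the first "|" to the end, ";" included
greedy <- gsub("\\|.*", "", bla$mycol)

# Restricted: "[^;]*" stops each match at the next ";" (or at the end of the string)
restricted <- gsub("\\|[^;]*", "", bla$mycol)

print(greedy)      # [1] "bla_v2_2072"
print(restricted)  # [1] "bla_v2_2072;bla_v2_0113"
```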
Answer 1
Score: 1
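We may use:
library(dplyr)    # reframe() and .by need dplyr >= 1.1.0
library(tidyr)    # separate_longer_delim() needs tidyr >= 1.3.0
library(stringr)
bla %>%
  mutate(rn = row_number()) %>%                      # remember which row each piece came from
  separate_longer_delim(mycol, delim = ";") %>%      # one row per ";"-separated piece
  reframe(mycol = str_c(str_remove(mycol, "\\|.*"),  # drop everything from the first "|"
                        collapse = ";"), .by = 'rn') %>%  # glue the pieces back per original row
  select(-rn)
Output: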
mycol
1 bla_v2_2072;bla_v2_0113
Or, using base R:
gsub("(\\w+)(\\|ID:\\d+\\|)", "\\1", bla$mycol)
[1] "bla_v2_2072;bla_v2_0113"
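Because the replacement "\\1" puts the captured word straight back, on the example data the capture does not change the result. A sketch that deletes only the "|ID:<digits>|" chunks gives the same output, assuming (as in the example) every ID has exactly that form:

```r
x <- "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|"

# Base R answer: capture the word before each ID chunk and keep it
with_capture <- gsub("(\\w+)(\\|ID:\\d+\\|)", "\\1", x)

# Sketch: delete the "|ID:<digits>|" chunks directly, no capture group
without_capture <- gsub("\\|ID:\\d+\\|", "", x)

print(with_capture)     # [1] "bla_v2_2072;bla_v2_0113"
print(without_capture)  # [1] "bla_v2_2072;bla_v2_0113"
```

The two differ only when an ID chunk is not preceded by a word character: the capture version then leaves it in place, while the direct version still removes it.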
Answer 2
Score: 0
Using gsub():
bla$mycol <- gsub("(\\|.*?(?=;))|(\\|[^;]*$)", "", bla$mycol, perl = TRUE)
Or using the same regex pattern in tidyverse:
library(dplyr)
library(stringr)
bla %>%
mutate(mycol = str_remove_all(mycol, "(\\|.*?(?=;))|(\\|[^;]*$)"))
Result:
mycol
1 bla_v2_2072;bla_v2_0113
Explanation:
"(\\|.*?(?=;)) # literal '|' and following characters up to next ';'
| # or
(\\|[^;]*$)" # literal '|' through end of string if no intervening ';'
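To check what each alternative of the pattern actually removes, regmatches() with gregexpr() can list the matched substrings. perl = TRUE is required because of the (?=;) lookahead:

```r
x <- "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|"
pattern <- "(\\|.*?(?=;))|(\\|[^;]*$)"

# Substrings the pattern would delete: the first comes from the lookahead
# alternative, the second from the end-of-string alternative
removed <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(removed)
# [1] "|ID:61462952|" "|ID:61460993|"
```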
Answer 3
Score: 0
gsub("\\|[^|]+\\|", "", bla$mycol)
#> [1] "bla_v2_2072;bla_v2_0113"
Pattern explanation: an escaped "|", followed by one or more characters that are not "|", followed by another "|".
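This pattern relies on every ID being wrapped in a pair of pipes. A sketch with a hypothetical input whose second ID lacks the closing "|" shows the match crossing the ";"; the variant "\\|[^|;]*\\|?" (also hypothetical, not from the answer) excludes ";" and makes the closing pipe optional:

```r
# Hypothetical input: the second ID has no closing "|"
y <- "x|ID:1;y|ID:2"

# "[^|]+" can cross ";", so the match runs from the first "|" to the next "|"
paired <- gsub("\\|[^|]+\\|", "", y)

# Excluding ";" and making the closing "|" optional keeps each segment separate
variant <- gsub("\\|[^|;]*\\|?", "", y)

print(paired)   # [1] "xID:2"
print(variant)  # [1] "x;y"
```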
Answer 4
Score: -1
You can first split your string on ";", then remove everything from the first "|" in each piece. Finally, concatenate the pieces back together using paste0.
# collapse = ";" (no space) reproduces the expected output exactly
paste0(sub("\\|.*", "", unlist(strsplit(bla$mycol, split = ";"))), collapse = ";")
[1] "bla_v2_2072;bla_v2_0113"
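unlist() flattens the pieces of every row into one vector, so with more than one row the approach above merges all rows into a single string. A per-row sketch; the helper name clean_ids and the second input row are hypothetical:

```r
x <- c("bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|",
       "bla_v2_0001|ID:1|")  # hypothetical second row

# Split each cell on ";", strip from the first "|" in every piece,
# then glue the pieces of that same cell back together
clean_ids <- function(v) {
  vapply(strsplit(v, ";", fixed = TRUE),
         function(parts) paste(sub("\\|.*", "", parts), collapse = ";"),
         character(1))
}

print(clean_ids(x))
# [1] "bla_v2_2072;bla_v2_0113" "bla_v2_0001"
```

On the data frame from the question this would be used as bla$mycol <- clean_ids(bla$mycol); vapply() guarantees exactly one string per input row.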