Get text between two symbols in R
Question
This may sound like a duplicate question, but I have spent the last hour looking for the answer and can't apply the solutions from similar-sounding questions to this problem.
I have a string and want to extract the text between the second _ and the .
The furthest I have got is extracting the text between the first _ and the . as you can see below.
library(stringr)
mytext<-"one_two_three.four"
stringr::str_extract(mytext, "(?<=_)(.+)(?=\\.)")
So the answer I want is three, rather than the two_three my attempt gives.
I would prefer to stick with the str_extract function. Can anyone modify my attempt to get the desired answer?
Answer 1
Score: 3
# Matt L. suggestion
gsub(".*_.*_(.*)\\..*", "\\1", mytext)
[1] "three"
Explanation:
.*_ = any amount of anything then an underscore.
.*_ = any amount of anything then an underscore.
(.*) = any amount of anything, and this is a capture group.
\\. = a full stop/period (".")
.* = any amount of anything
"\\1" = please return the contents of the first capture group.
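To see the substitution approach on more than one string, here is a minimal base R sketch. The vector `x` and the stricter pattern are my own illustrations, not part of the original suggestion; the stricter pattern assumes you want the text between exactly the second underscore and the first period.

```r
# Hypothetical inputs for illustration (not from the question)
x <- c("one_two_three.four", "a_b_c.txt")

# Pattern from the suggestion above; sub() is enough here because
# the pattern is written to consume the whole string
greedy <- sub(".*_.*_(.*)\\..*", "\\1", x)

# Stricter variant (an assumption about intent): no underscores before
# the second "_", and no periods inside the captured text
strict <- sub("^[^_]*_[^_]*_([^.]*)\\..*$", "\\1", x)

print(greedy)
print(strict)
```

Both patterns agree on these inputs. They can differ when a string has extra underscores or periods, because `.*` is greedy.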
# rawr suggestion
stringr::str_extract(mytext, "(?<=_)([^_]+)(?=\\.)")
[1] "three"
Explanation:
(?<=_) = positive look behind i.e. there should be an underscore before the captured group
([^_]+) = one or more non-underscore characters
(?=\\.) = positive look ahead i.e. there should be a full stop / period after the captured group
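If stringr is not available, the same lookaround pattern should also work in base R. This is a sketch under that assumption, using `regexpr()` and `regmatches()`; the variable names `m` and `between` are just for illustration.

```r
mytext <- "one_two_three.four"

# perl = TRUE is required: R's default regex engine does not
# support lookbehind / lookahead
m <- regexpr("(?<=_)[^_]+(?=\\.)", mytext, perl = TRUE)

# regmatches() pulls out the matched substring
between <- regmatches(mytext, m)
print(between)
```

The match after the first underscore fails because "two" is followed by an underscore rather than a period, so the first successful match starts after the second underscore.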
And as of stringr release 1.5.0 you can specify a capture group, so my suggestions:
# my suggestions:
stringr::str_extract(mytext, "(_.+_)(.+)(\\.)", group = 2)
[1] "three"
Explanation:
(_.+_) = group 1 is an underscore then one or more of anything then an underscore.
(.+) = group 2 is one or more anythings
(\\.) = group 3 is a full stop / period
group = 2 = give me group 2 please
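On stringr versions before 1.5.0, where `str_extract()` has no `group` argument, `str_match()` is one possible fallback. A minimal sketch, assuming stringr is installed; only the one group that is needed is captured here, which is a simplification of the pattern above.

```r
library(stringr)

mytext <- "one_two_three.four"

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group
m <- str_match(mytext, "_.+_(.+)\\.")
between <- m[, 2]
print(between)
```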
#or
stringr::str_split_i(mytext, "_|\\.",3)
[1] "three"
Explanation:
split into chunks by cutting out anything matching "_|\\." = underscore or a full stop/period
,3 = give me the third chunk.
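The splitting idea can also be sketched in base R with `strsplit()`, in case `str_split_i()` is not available in your stringr version. The names `parts` and `third` are only illustrative.

```r
mytext <- "one_two_three.four"

# strsplit() returns a list with one element per input string,
# so [[1]] picks out the pieces for our single string
parts <- strsplit(mytext, "_|\\.")[[1]]

# Third piece = the text after the second underscore, up to the period
third <- parts[3]
print(third)
```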