Get text between two symbols in R
Question
This may sound like a duplicate question, but I have spent the last hour looking for an answer and can't apply the solutions from similar-sounding questions to this problem.
I have a string and want to extract the text between the second "_" and the ".".
The furthest I have got is extracting the text between the first "_" and the ".", as you can see below.
library(stringr)
mytext <- "one_two_three.four"
stringr::str_extract(mytext, "(?<=_)(.+)(?=\\.)")
So the answer I want is three rather than the two_three that my attempt returns.
I would prefer to stick with the str_extract function. Can anyone modify my attempt to get the desired answer?
Answer 1
Score: 3
# Matt L.'s suggestion
gsub(".*_.*_(.*)\\..*", "\\1", mytext)
[1] "three"
Explanation:
.*_ = any number of any characters, then an underscore.
.*_ = again, any number of any characters, then an underscore.
(.*) = any number of any characters; this is the capture group.
\\. = a literal full stop/period (".").
.* = any number of any characters.
"\\1" = return the contents of the first capture group.
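As a minimal sketch of how this substitution behaves on more than one input, the snippet below applies the same pattern to a small vector. Only the first string comes from the question; the other two are made-up examples. An input that the pattern does not match comes back unchanged rather than as NA, which is worth keeping in mind.

```r
# Only the first element is from the question; the rest are hypothetical inputs.
inputs <- c("one_two_three.four", "a_b_c.txt", "no-underscores.txt")

# sub() replaces the whole matched string with the contents of capture group 1.
# A string that does not match the pattern is returned unchanged.
result <- sub(".*_.*_(.*)\\..*", "\\1", inputs)
print(result)
```

sub() is used instead of gsub() here because the pattern consumes the entire string, so there is at most one replacement per element.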
# rawr's suggestion
stringr::str_extract(mytext, "(?<=_)([^_]+)(?=\\.)")
[1] "three"
Explanation:
(?<=_) = positive lookbehind, i.e. there should be an underscore before the captured group.
([^_]+) = one or more non-underscore characters.
(?=\\.) = positive lookahead, i.e. there should be a full stop/period after the captured group.
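To see why the pattern in the question returned two_three, it may help to run both patterns side by side. The sketch below uses base R's regmatches() and regexpr() with perl = TRUE as a stand-in for str_extract, so it runs without stringr. extract_first is a hypothetical helper name, not a stringr function.

```r
mytext <- "one_two_three.four"

# Hypothetical helper: return the first match of a Perl-style pattern.
extract_first <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# Greedy .+ starts right after the first underscore and runs up to the period,
# so it happily includes the second underscore.
greedy <- extract_first(mytext, "(?<=_)(.+)(?=\\.)")

# [^_]+ cannot cross an underscore, so only the last segment can satisfy
# both the lookbehind and the lookahead.
no_underscore <- extract_first(mytext, "(?<=_)([^_]+)(?=\\.)")

print(c(greedy, no_underscore))
```

The only difference between the two patterns is the character class, which is what stops the match from starting at the first underscore.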
As of stringr release 1.5.0 you can specify a capture group, so here are my suggestions:
# my suggestions:
stringr::str_extract(mytext, "(_.+_)(.+)(\\.)", group = 2)
[1] "three"
Explanation:
(_.+_) = group 1 is an underscore, then one or more of any character, then an underscore.
(.+) = group 2 is one or more of any character.
(\\.) = group 3 is a full stop/period.
group = 2 = return the contents of group 2.
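Where the group argument is not available, roughly the same result can be obtained in base R. This is a sketch of an alternative, not part of the original answer: regexec() reports the full match followed by each capture group, so group 2 sits at index 3 of the result.

```r
mytext <- "one_two_three.four"

# regexec() records the positions of the full match and of every capture group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(mytext, regexec("(_.+_)(.+)(\\.)", mytext))[[1]]

# m[1] is the full match, m[2] is group 1, m[3] is group 2, m[4] is group 3.
group2 <- m[3]
print(group2)
```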
# or
stringr::str_split_i(mytext, "_|\\.", 3)
[1] "three"
Explanation:
"_|\\." = split the string into chunks by cutting out anything that matches an underscore or a full stop/period.
3 = give me the third chunk.
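The same split-and-index idea can be sketched in base R with strsplit(), which returns a list holding one vector of chunks per input string. The variable names chunks and third_chunk are illustrative only.

```r
mytext <- "one_two_three.four"

# Split on underscore or period; [[1]] selects the chunks for the first
# (and here only) input string.
chunks <- strsplit(mytext, "_|\\.")[[1]]

# Index 3 is the piece between the second underscore and the period.
third_chunk <- chunks[3]
print(third_chunk)
```

This relies on the string containing exactly the separators you expect; an extra underscore or period earlier in the string would shift the index.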