Get text between two symbols in R

Question

This may sound like a duplicate question, but I have spent the last hour looking for an answer and can't apply the solutions from similar-sounding questions to this problem.

I have a string and want to extract the text between the second _ and the .

The furthest I have got is extracting the text between the first _ and the ., as you can see below.

library(stringr)
mytext <- "one_two_three.four"
stringr::str_extract(mytext, "(?<=_)(.+)(?=\\.)")

So the answer I want is three rather than the two_three that my attempt returns.

I would prefer to stick with the str_extract function. Can anyone modify my attempt to get the desired answer?

Answer 1

Score: 3

# Matt L.'s suggestion
gsub(".*_.*_(.*)\\..*", "\\1", mytext)
[1] "three"

Explanation:
.*_ = any number of any characters, then an underscore.
.*_ = any number of any characters, then an underscore.
(.*) = any number of any characters; this is a capture group.
\\. = a literal full stop / period (".").
.* = any number of any characters.
"\\1" = return the contents of the first capture group.
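
As a variation on the idea above, here is a minimal base R sketch that anchors the pattern and swaps the `.*` pieces for negated character classes, so each piece can only consume one field. The `files` vector and its second element are made up for illustration; only the first string comes from the question.

```r
# Hypothetical input vector; only the first element is from the question.
files <- c("one_two_three.four", "a_b_c.d")

# ^[^_]*_[^_]*_  consumes exactly the first two underscore-delimited fields,
# ([^.]*)        captures everything up to the first period,
# \\..*$         matches the period and the rest of the string.
third <- sub("^[^_]*_[^_]*_([^.]*)\\..*$", "\\1", files)

print(third)
# [1] "three" "c"
```

One thing to keep in mind with `sub()` and `gsub()`: when the pattern does not match at all, the input string is returned unchanged rather than `NA`.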

# rawr's suggestion
stringr::str_extract(mytext, "(?<=_)([^_]+)(?=\\.)")
[1] "three"

Explanation:
(?<=_) = positive lookbehind, i.e. there must be an underscore immediately before the matched text.
([^_]+) = one or more non-underscore characters.
(?=\\.) = positive lookahead, i.e. there must be a full stop / period immediately after the matched text.
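
To see how the lookaround pattern behaves on other inputs, here is a small sketch. The second and third strings are invented test cases, not part of the question, and the stricter `[^_.]+` variant is my own assumption about what you might want if the text can contain more than one period.

```r
library(stringr)

# Hypothetical inputs: the question's string, one with two periods,
# and one that contains no underscore at all.
x <- c("one_two_three.four", "a_b_c.d.e", "nothing-here")

# Pattern from the answer: non-underscore characters may include periods.
loose <- str_extract(x, "(?<=_)[^_]+(?=\\.)")

# Stricter variant: also exclude periods, so the match stops at the first one.
strict <- str_extract(x, "(?<=_)[^_.]+(?=\\.)")

print(loose)
print(strict)
```

`str_extract()` returns `NA` for elements with no match, so the third element is `NA` in both results.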

As of stringr 1.5.0, str_extract() accepts a group argument that selects a capture group, so my suggestions are:

# my suggestion:
stringr::str_extract(mytext, "(_.+_)(.+)(\\.)", group = 2)
[1] "three"

Explanation:
(_.+_) = group 1 is an underscore, then one or more of any character, then an underscore.
(.+) = group 2 is one or more of any character.
(\\.) = group 3 is a full stop / period.
group = 2 = return the contents of group 2.
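
If you are on a stringr version older than 1.5.0, where the `group` argument is not available, a rough equivalent is `str_match()`, which returns a character matrix of the full match plus one column per capture group. This is a sketch of that alternative, not something the original question asked for.

```r
library(stringr)

mytext <- "one_two_three.four"

# str_match() returns a matrix: column 1 is the full match,
# column 2 is group 1, column 3 is group 2, and so on.
m <- str_match(mytext, "(_.+_)(.+)(\\.)")

# Group 2 therefore lives in the third column.
group2 <- m[1, 3]
print(group2)
```

Because the columns are offset by one, group n is always found in column n + 1.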

# or
stringr::str_split_i(mytext, "_|\\.", 3)
[1] "three"

Explanation:
"_|\\." = split the string into chunks wherever there is an underscore or a full stop / period.
3 = return the third chunk.
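
For comparison, the same split-and-pick idea can be sketched in base R with `strsplit()`, which needs no extra packages. The character class `[_.]` is just another way of writing "underscore or period".

```r
mytext <- "one_two_three.four"

# strsplit() returns a list with one element per input string,
# so [[1]] pulls out the pieces for our single string.
pieces <- strsplit(mytext, "[_.]")[[1]]
third <- pieces[3]

print(pieces)
# [1] "one"   "two"   "three" "four"
print(third)
# [1] "three"
```

Note that splitting treats both separators alike, so this picks the third chunk regardless of which of the two symbols surrounds it.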

huangapple
  • Posted on March 4, 2023, 00:05:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75629378.html