Return only values if pattern is matched in gsub

Question

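I have a string containing bills in this form:

    bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

I want to extract the weight of the products, and I did so with:

    gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$", "\\1", bills)
    # [1] "250"       "0,5"       "3425 Milk"

This partly works: it correctly returns 250 and 0,5 for the first two entries. But why does it return the whole third entry, "3425 Milk"? I thought that by using "\\1" I was telling gsub to extract the first capture group, which here is (\\d*,*\\d+). Therefore, I expected the last entry to be NA or an empty string. This is my expected output:

    expected <- c("250", "0,5", NA) # OR
    expected <- c("250", "0,5", "")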
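The short reason: sub() and gsub() return elements they cannot match unchanged, and "3425 Milk" does not match at all, because the pattern requires g or kg at the very end. Below is a minimal base-R sketch of two ways to get NA for non-matching entries instead; the variable names pattern, weights_ifelse and weights_regexec are only illustrative.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
pattern <- ".*\\s(\\d*,*\\d+)\\s*(g|kg)$"

# The third entry does not match, so gsub() would return it untouched
grepl(pattern, bills)
# [1]  TRUE  TRUE FALSE

# Option 1: substitute only where the pattern matches, NA elsewhere
weights_ifelse <- ifelse(grepl(pattern, bills), sub(pattern, "\\1", bills), NA)

# Option 2: regexec() + regmatches() return the full match plus each capture
# group; a non-matching element yields character(0), which is mapped to NA
groups <- regmatches(bills, regexec(pattern, bills))
weights_regexec <- vapply(groups,
                          function(x) if (length(x) > 0) x[2] else NA_character_,
                          character(1), USE.NAMES = FALSE)

weights_ifelse
# [1] "250" "0,5" NA
```

Option 1 runs the regex twice; option 2 runs it once and also gives access to the unit in x[3] if you need it.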

Answer 1

Score: 1

Using stringr, I'd keep "g" and "kg", because once the units are removed, the numbers are on different scales.

    library(stringr)

    bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

    str_extract(bills, "\\d+(\\,\\d+)?(k?g)")
    # [1] "250g"  "0,5kg" NA
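Following the point about scales, one way to keep the unit information and still end up with comparable numbers is to capture the number and the unit separately and convert everything to grams. This is a minimal base-R sketch (so it runs without stringr), assuming that only g and kg occur and that "," is the decimal separator; to_grams is a hypothetical helper name.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

# Group 1: number with an optional decimal comma; group 3: unit (g or kg)
parts <- regmatches(bills, regexec("(\\d+(,\\d+)?)\\s*(k?g)$", bills))

# Hypothetical helper: convert one match to grams, NA when nothing matched
to_grams <- function(x) {
  if (length(x) == 0) return(NA_real_)
  value <- as.numeric(sub(",", ".", x[2], fixed = TRUE))  # "0,5" -> 0.5
  if (x[4] == "kg") value * 1000 else value
}

grams <- vapply(parts, to_grams, numeric(1), USE.NAMES = FALSE)
grams
# [1] 250 500  NA
```

Other units (mg, l, ...) would need their own branch in to_grams.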

Answer 2

Score: 1

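You can add an alternation (|.*) so that the pattern matches every string, not only the ones ending in a weight.

As long as your substitution string doesn't introduce new characters (it only recombines captured groups, e.g. \\1 or \\1\\3\\2), any string that fails the first alternative is replaced with an empty string:

    gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$|.*", "\\1", bills)
    # [1] "250" "0,5" ""

I'd also change ,* to ,?, because I doubt your input is valid if it contains something like 1,,,5g.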
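Putting both suggestions together, here is a sketch that uses ,? instead of ,* and then turns the empty strings into NA, which gives the first expected output from the question. It passes perl = TRUE so that the two alternatives are tried strictly left to right; that flag is an assumption of this sketch, not part of the answer above. The "9999 Flour 1,,,5g" entry is made-up malformed input that shows what ,? rejects.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk",
           "9999 Flour 1,,,5g")  # last entry: hypothetical malformed input

# First alternative captures the weight; "|.*" swallows any string that
# fails it, and for those strings "\\1" is empty
weights <- gsub(".*\\s(\\d*,?\\d+)\\s*(g|kg)$|.*", "\\1", bills, perl = TRUE)

# Turn the empty results into NA
weights[weights == ""] <- NA
weights
# [1] "250" "0,5" NA    NA
```

With the original ,* the last entry would come back as "1,,,5" instead of NA.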

huangapple
  • Published on 19 April 2023 17:18:22
  • Link to this post (please keep it when reposting): https://go.coder-hub.com/76052778.html