Return only values if pattern is matched in gsub
Question
I have a string containing bills in this form:
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
I want to extract the weight of the products and I did so with:
gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$", "\\1", bills)
"250" "0,5" "3425 Milk"
This kind of works, since it correctly returns 250 and 0,5 for the first two entries, but why does it return the whole third entry "3425 Milk"? I thought that by using "\\1" I would tell gsub to extract the first matching group, which here is (\\d*,*\\d+). Therefore, I would expect the last entry to be NA or an empty string. Thus, this is my expected output:
expected <- c("250", "0,5", NA) # OR
expected <- c("250", "0,5", "")
Answer 1
Score: 1
Using stringr, I'd keep "g" and "kg", because once the units are removed the numbers are on different scales.
library(stringr)
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
str_extract(bills, "\\d+(\\,\\d+)?(k?g)")
# [1] "250g" "0,5kg" NA
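If the weights need to be compared, one option is to convert them to a single unit. The following is a minimal base R sketch, not part of the original answer; it assumes the weight always sits at the end of the string, that the decimal separator is a comma, and that only "g" and "kg" occur. The helper name to_grams is hypothetical.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

# Groups: 1 = number, 2 = optional decimal part, 3 = unit.
# regmatches() yields character(0) for entries without a match.
m <- regmatches(bills, regexec("(\\d+(,\\d+)?)\\s*(k?g)$", bills))

# Hypothetical helper: turn one match into a weight in grams (NA if no match).
to_grams <- function(x) {
  if (length(x) == 0) return(NA_real_)
  value <- as.numeric(sub(",", ".", x[2], fixed = TRUE))  # "0,5" -> 0.5
  if (x[4] == "kg") value * 1000 else value
}

grams <- vapply(m, to_grams, numeric(1))
grams
# [1] 250 500  NA
```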
Answer 2
Score: 1
You can add an alternation that matches everything else.
When your replacement string doesn't introduce new characters (only a recombination of captured groups, like \\1 or \\1\\3\\2, for example), this results in non-matching input strings being replaced with an empty one:
gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$|.*", "\\1", bills)
# [1] "250" "0,5" ""
Also, I'd change ,* to ,?, as I don't believe your input is valid if it contains something like 1,,,5g.
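For the NA variant of the expected output, a base R sketch (not from the original answer) is to test for a match first. It relies on the fact that sub and gsub return the input unchanged when the pattern does not match, which is why "3425 Milk" came back whole in the question. The pattern below uses the suggested ,? and is otherwise the one from the question.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
pattern <- ".*\\s(\\d*,?\\d+)\\s*(g|kg)$"

# Without a match there is nothing to replace, so the entry is returned as is.
unchanged <- sub(pattern, "\\1", bills)
unchanged
# [1] "250"       "0,5"       "3425 Milk"

# Keep the captured group only where the pattern matched, NA elsewhere.
weights <- ifelse(grepl(pattern, bills), sub(pattern, "\\1", bills), NA_character_)
weights
# [1] "250" "0,5" NA
```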