golang regular expression to extract pairs of quantities and their units
Question
I have a set of human readable strings expressing a duration of time. Here are four examples:
1 days 40 hrs 23 min 50 sec
3 hrs 1 min 30 sec
10 days 23 min 11 sec
52 sec
I am trying to convert these strings into a number of seconds. The math to do this is quite simple once the string is broken down into its components - it's just multiplication and addition. However, I am having trouble writing the regular expression that parses the string into [<quantity>, <unit>] pairs. As an example, the output I would like for the string:
1 days 40 hrs 23 min 50 sec
is an array (or slice) like:
[[1, "days"], [40, "hrs"], [23, "min"], [50, "sec"]]
Below is the code for what I've tried so far and its output (executable at http://play.golang.org/p/iR-xfc8MVQ). segs was my first attempt, which seems to break the string down into 4 components OK, but each component is just a string like 1 days rather than a 2-element array like [1, days]. segs2 was my second attempt, which seems to do something weirder where each component is repeated twice.
// time unit tokenizer
package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "1 days 40 hrs 23 min 50 sec"
	re := regexp.MustCompile("(?P<quant>\\d+) (?P<unit>\\w+)+")

	// First attempt: whole matches only.
	segs := re.FindAllString(s, -1)
	fmt.Println("segs:", segs)
	fmt.Println(segs[0], ",", segs[1], ",", segs[2], ",", segs[3])
	fmt.Println("length segs:", len(segs))

	// Second attempt: whole matches plus capture groups.
	segs2 := re.FindAllStringSubmatch(s, -1)
	fmt.Println("segs2:", segs2)
	fmt.Println(segs2[0], ",", segs2[1], ",", segs2[2], ",", segs2[3])
	fmt.Println("length segs2:", len(segs2))
}
Output:
segs: [1 days 40 hrs 23 min 50 sec]
1 days , 40 hrs , 23 min , 50 sec
length segs: 4
segs2: [[1 days 1 days] [40 hrs 40 hrs] [23 min 23 min] [50 sec 50 sec]]
[1 days 1 days] , [40 hrs 40 hrs] , [23 min 23 min] , [50 sec 50 sec]
length segs2: 4
I've written a similar regex in Python which works OK, so I'm really not sure whether I am doing something incorrect in Go's regular expression syntax or perhaps making the wrong call on the re object.
Answer 1
Score: 8
Regexp.FindAllStringSubmatch returns [][]string, but its contents are slightly different from the return value of the Python function re.findall (I assume that you used re.findall in Python).
return_value[i][0] contains the whole matched string.
return_value[i][1] contains captured group 1.
return_value[i][2] contains captured group 2.
...
Printing return_value[i] causes all items in return_value[i] to be printed (return_value[i][0], return_value[i][1], return_value[i][2], ...). That is why each component appears to be repeated twice: for the first match, the whole match 1 days is followed by the groups 1 and days, which prints as [1 days 1 days].
You can get what you expected by printing only the captured group matches (excluding [0]), as follows:
segs2 := re.FindAllStringSubmatch(s, -1)
for i := 0; i < len(segs2); i++ {
	fmt.Println(segs2[i][1], ",", segs2[i][2])
}
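To get the [<quantity>, <unit>] pairs the question asks for as typed values rather than printed text, the same loop can collect the two capture groups into a slice. The following is a minimal sketch, not the only way to do it: the pair type and the parsePairs function are hypothetical names introduced here, and the pattern drops the named groups and the trailing + (which is redundant, since \w+ already matches the whole word).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pair is a hypothetical type for this sketch: one quantity and its unit.
type pair struct {
	Quantity int
	Unit     string
}

// pairRe matches "<digits> <word>", e.g. "40 hrs".
var pairRe = regexp.MustCompile(`(\d+) (\w+)`)

// parsePairs returns one pair per "<quantity> <unit>" match found in s.
func parsePairs(s string) ([]pair, error) {
	var pairs []pair
	for _, m := range pairRe.FindAllStringSubmatch(s, -1) {
		// m[0] is the whole match; m[1] and m[2] are the capture groups.
		n, err := strconv.Atoi(m[1])
		if err != nil {
			return nil, err
		}
		pairs = append(pairs, pair{Quantity: n, Unit: m[2]})
	}
	return pairs, nil
}

func main() {
	pairs, err := parsePairs("1 days 40 hrs 23 min 50 sec")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pairs) // [{1 days} {40 hrs} {23 min} {50 sec}]
}
```

Converting the quantity with strconv.Atoi at this point means later arithmetic does not have to deal with strings.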
Side Note
The following string literal:
"(?P<quant>\\d+) (?P<unit>\\w+)+"
can be expressed as the following raw string literal:
`(?P<quant>\d+) (?P<unit>\w+)+`
See String literals
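Putting the pieces together, the conversion to seconds that the question describes could look roughly like the sketch below. It keeps the named groups from the original pattern (written as a raw string literal, without the redundant trailing +) and looks up their positions with Regexp.SubexpIndex, which requires Go 1.15 or later. The totalSeconds function and the secondsPer table are assumptions made for illustration; the unit spellings are taken from the four example strings, and any other unit is reported as an error.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// durRe uses the named groups from the question, as a raw string literal.
var durRe = regexp.MustCompile(`(?P<quant>\d+) (?P<unit>\w+)`)

// secondsPer is an assumed unit table based on the question's examples.
var secondsPer = map[string]int{
	"days": 24 * 60 * 60,
	"hrs":  60 * 60,
	"min":  60,
	"sec":  1,
}

// totalSeconds sums quantity * seconds-per-unit over every match in s.
func totalSeconds(s string) (int, error) {
	// Look up group positions by name instead of hard-coding 1 and 2.
	qi, ui := durRe.SubexpIndex("quant"), durRe.SubexpIndex("unit")
	total := 0
	for _, m := range durRe.FindAllStringSubmatch(s, -1) {
		n, err := strconv.Atoi(m[qi])
		if err != nil {
			return 0, err
		}
		mult, ok := secondsPer[m[ui]]
		if !ok {
			return 0, fmt.Errorf("unknown unit %q", m[ui])
		}
		total += n * mult
	}
	return total, nil
}

func main() {
	inputs := []string{
		"1 days 40 hrs 23 min 50 sec",
		"3 hrs 1 min 30 sec",
		"10 days 23 min 11 sec",
		"52 sec",
	}
	for _, s := range inputs {
		secs, err := totalSeconds(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %d seconds\n", s, secs)
	}
}
```

For the four example strings this yields 231830, 10890, 865391 and 52 seconds respectively.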