2014-01-05 13:55:04 · go

golang regular expression to extract pairs of quantities and their units

Question

I have a set of human readable strings expressing a duration of time. Here are four examples:

1 days 40 hrs 23 min 50 sec

3 hrs 1 min 30 sec

10 days 23 min 11 sec

52 sec

I am trying to convert these strings into number of seconds. The math to do this is quite simple once the string is broken down into its components - it's just multiplication and addition. I am having some issues however with writing the regular expression to parse the string into [<quantity>, <unit>] pairs. As an example, the output I would like for the string:

1 days 40 hrs 23 min 50 sec

is an array (or slice) like:

[[1, "days"], [40, "hrs"], [23, "min"], [50, "sec"]].

Below is the code for what I've tried so far and its output (executable at http://play.golang.org/p/iR-xfc8MVQ). segs was my first attempt, which seems to break the string down into 4 components ok but each component is just a string like 1 days rather than a 2-element array like [1, days]. segs2 was my second attempt, which seems to do something weirder where each component is repeated twice.

// time unit tokenizer
package main

import "fmt"
import "regexp"

func main() {
	s := "1 days 40 hrs 23 min 50 sec"
	re := regexp.MustCompile("(?P<quant>\\d+) (?P<unit>\\w+)+")

	segs := re.FindAllString(s, -1)
	fmt.Println("segs:", segs)
	fmt.Println(segs[0], ",", segs[1], ",", segs[2], ",", segs[3])
	fmt.Println("length segs:", len(segs))

	segs2 := re.FindAllStringSubmatch(s, -1)
	fmt.Println("segs2:", segs2)
	fmt.Println(segs2[0], ",", segs2[1], ",", segs2[2], ",", segs2[3])
	fmt.Println("length segs2:", len(segs2))
}

Output:

segs: [1 days 40 hrs 23 min 50 sec]
1 days , 40 hrs , 23 min , 50 sec
length segs: 4
segs2: [[1 days 1 days] [40 hrs 40 hrs] [23 min 23 min] [50 sec 50 sec]]
[1 days 1 days] , [40 hrs 40 hrs] , [23 min 23 min] , [50 sec 50 sec]
length segs2: 4

I've written a similar regex in Python which works OK, so I'm really not sure whether I'm getting Go's regular expression syntax wrong or calling the wrong method on the re object.

Answer 1

Score: 8

Regexp.FindAllStringSubmatch returns [][]string. But its contents are slightly different from the return value of the Python function re.findall (I assumed that you used re.findall in Python).

return_value[i][0] contains the whole matched string.
return_value[i][1] contains captured group 1.
return_value[i][2] contains captured group 2, and so on.

Printing return_value[i] prints every item in it (return_value[i][0], return_value[i][1], return_value[i][2], ...), which is why each component appears to be repeated in the segs2 output.
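To make the index layout concrete, here is a minimal sketch that reads the named groups from the question's pattern by name instead of by position. The helper namedGroups is hypothetical (not part of the regexp package); it relies on Regexp.SubexpNames, whose entry at index 0 is always the empty string because index 0 is the whole match.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe is the pattern from the question, written as a raw string literal.
var pairRe = regexp.MustCompile(`(?P<quant>\d+) (?P<unit>\w+)+`)

// namedGroups is a hypothetical helper: it maps each named capture group
// to its text for a single match. Index 0 (the whole match) has an empty
// name and is skipped.
func namedGroups(re *regexp.Regexp, match []string) map[string]string {
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = match[i]
		}
	}
	return out
}

func main() {
	for _, m := range pairRe.FindAllStringSubmatch("1 days 40 hrs", -1) {
		// m[0] is the whole match; m[1] and m[2] are the capture groups.
		fmt.Printf("whole=%q groups=%v\n", m[0], namedGroups(pairRe, m))
	}
}
```

Looking groups up by name keeps the calling code valid if the pattern later gains or reorders groups.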

You can get what you expected by printing only the captured group matches (excluding [0]), as follows:

segs2 := re.FindAllStringSubmatch(s, -1)
for i := 0; i < len(segs2); i++ {
	fmt.Println(segs2[i][1], ",", segs2[i][2])
}

Demo
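Since the stated goal is a number of seconds, the same submatch slices can feed the multiplication and addition directly. The following is a sketch under assumptions rather than a definitive implementation: toSeconds and secondsPerUnit are hypothetical names, the unit table covers only the four units that appear in the question's examples, and the pattern drops the trailing + after the unit group because \w+ already consumes the whole word.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pairRe captures the quantity in group 1 and the unit in group 2.
var pairRe = regexp.MustCompile(`(\d+) (\w+)`)

// secondsPerUnit is an assumed lookup table; it only knows the units
// used in the question's sample strings.
var secondsPerUnit = map[string]int{
	"days": 24 * 60 * 60,
	"hrs":  60 * 60,
	"min":  60,
	"sec":  1,
}

// toSeconds sums quantity * multiplier over every "<quantity> <unit>" pair.
// It returns an error for a unit missing from secondsPerUnit.
func toSeconds(s string) (int, error) {
	total := 0
	for _, m := range pairRe.FindAllStringSubmatch(s, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			return 0, err
		}
		mult, ok := secondsPerUnit[m[2]]
		if !ok {
			return 0, fmt.Errorf("unknown unit %q", m[2])
		}
		total += n * mult
	}
	return total, nil
}

func main() {
	inputs := []string{
		"1 days 40 hrs 23 min 50 sec",
		"3 hrs 1 min 30 sec",
		"10 days 23 min 11 sec",
		"52 sec",
	}
	for _, s := range inputs {
		secs, err := toSeconds(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %d seconds\n", s, secs)
	}
}
```

Returning an error for an unrecognized unit, rather than silently skipping it, avoids undercounting when an input uses a spelling the table does not list.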

Side Note

The following interpreted string literal:

"(?P<quant>\\d+) (?P<unit>\\w+)+"

can be written as the following raw string literal, which needs no doubled backslashes:

`(?P<quant>\d+) (?P<unit>\w+)+`

See "String literals" in the Go language specification.
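A quick way to confirm that the two spellings denote the same pattern text is to compare them directly. This is only an illustrative sketch; the constant names interpreted and raw are made up for the example.

```go
package main

import "fmt"

// interpreted uses a double-quoted literal, so each backslash is escaped.
const interpreted = "(?P<quant>\\d+) (?P<unit>\\w+)+"

// raw uses a back-quoted literal, where backslashes are taken literally.
const raw = `(?P<quant>\d+) (?P<unit>\w+)+`

func main() {
	fmt.Println(interpreted == raw) // prints "true"
}
```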
