Get named list of subgroup in golang regex
Question
I'm looking for a function that returns a map[string]interface{}, where each interface{} value can be a slice, a nested map[string]interface{}, or a plain value.
My use case is parsing WKT geometry like the following and retrieving the point values. Example for a donut polygon:
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
The regex (I deliberately used \d, so it only matches single-digit integers, to keep it readable):
(POLYGON \(
(?P<polygons>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d )\),)*
(?P<last_polygon>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d)\))\)
)
I have a function (copied from SO) that retrieves some of this information, but it does not handle nested groups or repeated groups well:
func getRegexMatchParams(reg *regexp.Regexp, url string) (paramsMap map[string]string) {
	match := reg.FindStringSubmatch(url)
	paramsMap = make(map[string]string)
	for i, name := range reg.SubexpNames() {
		// Skip group 0 (the whole match) and stay within the bounds of match.
		if i > 0 && i < len(match) {
			paramsMap[name] = match[i]
		}
	}
	return paramsMap
}
It seems that the point group captures only one point.
example on playground
[EDIT] The result I want is something like this:
map[string]interface{}{
	"polygons": map[string]interface{}{
		"points": []interface{}{
			map[string]string{"point": "0 0"},
			map[string]string{"point": "0 10"},
			map[string]string{"point": "10 10"},
			map[string]string{"point": "10 0"},
		},
		"last_point": "0 0",
	},
	"last_polygon": map[string]interface{}{
		"points": []interface{}{
			map[string]string{"point": "3 3"},
			map[string]string{"point": "3 7"},
			map[string]string{"point": "7 7"},
			map[string]string{"point": "7 3"},
		},
		"last_point": "3 3",
	},
}
That way I can use it for different purposes, such as querying databases and validating that last_point == points[0] for each polygon.
Answer 1
Score: 2
Try allowing for some whitespace in the regex.
Also note that this engine does not retain every value captured inside a quantified outer group such as (a|b|c)+. After the match, the group contains only the last a, b, or c it matched.
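The last-capture behaviour can be checked with a minimal sketch like the one below. The helper name lastCapture is made up for illustration; only the standard regexp package is used.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastCapture (hypothetical helper) returns what group 1 holds after
// matching a quantified group against the whole input.
func lastCapture(input string) string {
	re := regexp.MustCompile(`^(a|b|c)+$`)
	m := re.FindStringSubmatch(input)
	if m == nil {
		return "" // no match at all
	}
	// m[0] is the whole match; m[1] is the value from the final iteration only.
	return m[1]
}

func main() {
	fmt.Println(lastCapture("abc")) // prints "c"
}
```

The earlier iterations ("a" and "b") are not available from the match result, which is why the point group in the question reports a single point.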
Also, your regex can be reduced to this:
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\)(?:\s*,\s*|\s*\)))+)
https://play.golang.org/p/rLaaEa_7GX
The original:
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\),)*(?P<last_polygon>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\))\s*\))
https://play.golang.org/p/rZgJYPDMzl
See below for what the groups contain.
( # (1 start)
POLYGON \s* \(
(?P<polygons> # (2 start)
\( \s*
(?P<points> # (3 start)
(?P<point> # (4 start)
\s*
( \d+ \s+ \d+ ) # (5)
\s*
,
){3,} # (4 end)
) # (3 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (6)
\s* \),
)* # (2 end)
(?P<last_polygon> # (7 start)
\( \s*
(?P<points> # (8 start)
(?P<point> # (9 start)
\s*
( \d+ \s+ \d+ ) # (10)
\s*
,
){3,} # (9 end)
) # (8 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (11)
\s* \)
) # (7 end)
\s* \)
) # (1 end)
Input
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
Output
** Grp 0 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 1 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 2 [polygons] - ( pos 9 , len 30 )
(0 0, 0 10, 10 10, 10 0, 0 0),
** Grp 3 [points] - ( pos 10 , len 23 )
0 0, 0 10, 10 10, 10 0,
** Grp 4 [point] - ( pos 27 , len 6 )
10 0,
** Grp 5 - ( pos 28 , len 4 )
10 0
** Grp 6 [last_point] - ( pos 34 , len 3 )
0 0
** Grp 7 [last_polygon] - ( pos 39 , len 25 )
(3 3, 3 7, 7 7, 7 3, 3 3)
** Grp 8 [points] - ( pos 40 , len 19 )
3 3, 3 7, 7 7, 7 3,
** Grp 9 [point] - ( pos 54 , len 5 )
7 3,
** Grp 10 - ( pos 55 , len 3 )
7 3
** Grp 11 [last_point] - ( pos 60 , len 3 )
3 3
Possible Solution
It's not impossible. It just takes a few extra steps.
(As an aside, isn't there a WKT library that can parse this for you?)
I don't know what your language is capable of, so this is just a general approach.
1. Validate the form you're parsing.
This validates the input and returns all polygon sets as a single string in the All_Polygons group.
Target POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)
** Grp 1 [All_Polygons] - ( pos 9 , len 55 )
(0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
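In Go, step 1 might look like the following sketch. The pattern is the one given above; the names polygonRe and allPolygons are assumptions introduced here, and SubexpIndex requires Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

// polygonRe is the step 1 validation pattern, copied unchanged.
var polygonRe = regexp.MustCompile(`POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)`)

// allPolygons (hypothetical helper) returns the All_Polygons group and
// whether the input matched the expected form.
func allPolygons(wkt string) (string, bool) {
	m := polygonRe.FindStringSubmatch(wkt)
	if m == nil {
		return "", false
	}
	return m[polygonRe.SubexpIndex("All_Polygons")], true
}

func main() {
	s, ok := allPolygons("POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))")
	fmt.Println(ok, s)
}
```

Note that the pattern is not anchored, so it accepts a valid POLYGON anywhere in the input; wrapping it in ^ and $ would be a stricter variant.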
2. If step 1 succeeded, set up a loop match over the All_Polygons string.
Target (0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))
This step is the equivalent of a find-all match. Each match yields all the points of a single polygon, returned in the Single_Poly_All_Pts group.
This gives you these 2 separate matches, which can be put into a temporary array of 2 strings:
** Grp 1 [Single_Poly_All_Pts] - ( pos 1 , len 27 )
0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Poly_All_Pts] - ( pos 31 , len 23 )
3 3, 3 7, 7 7, 7 3, 3 3
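A rough Go sketch of this find-all step, assuming the step 2 pattern as written; ringRe and splitRings are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
)

// ringRe is the step 2 pattern, copied unchanged.
var ringRe = regexp.MustCompile(`(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))`)

// splitRings (hypothetical helper) returns one string of points per polygon.
func splitRings(allPolygons string) []string {
	idx := ringRe.SubexpIndex("Single_Poly_All_Pts")
	var rings []string
	// -1 asks for every non-overlapping match, i.e. the "loop match".
	for _, m := range ringRe.FindAllStringSubmatch(allPolygons, -1) {
		rings = append(rings, m[idx])
	}
	return rings
}

func main() {
	for _, r := range splitRings("(0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)") {
		fmt.Println(r)
	}
}
```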
3. If step 2 succeeded, set up a loop match over the temporary array from step 2.
This gives you the individual points of each polygon.
(?P<Single_Point>\d+\s+\d+)
Again, this is a loop match (or find-all match). For each array element (polygon), it produces the individual points.
Target[element 1] 0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
0 0
** Grp 1 [Single_Point] - ( pos 5 , len 4 )
0 10
** Grp 1 [Single_Point] - ( pos 11 , len 5 )
10 10
** Grp 1 [Single_Point] - ( pos 18 , len 4 )
10 0
** Grp 1 [Single_Point] - ( pos 24 , len 3 )
0 0
And,
Target[element 2] 3 3, 3 7, 7 7, 7 3, 3 3
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
3 3
** Grp 1 [Single_Point] - ( pos 5 , len 3 )
3 7
** Grp 1 [Single_Point] - ( pos 10 , len 3 )
7 7
** Grp 1 [Single_Point] - ( pos 15 , len 3 )
7 3
** Grp 1 [Single_Point] - ( pos 20 , len 3 )
3 3
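Putting the three steps together in Go could look roughly like this. It is a sketch under assumptions: the Ring type, parsePolygon, and closed are names invented here, and a typed struct stands in for the map[string]interface{} asked for in the question. The three patterns are the ones from steps 1 to 3.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	polygonRe = regexp.MustCompile(`POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)`)
	ringRe    = regexp.MustCompile(`(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))`)
	pointRe   = regexp.MustCompile(`(?P<Single_Point>\d+\s+\d+)`)
)

// Ring (hypothetical type) mirrors the "points" / "last_point" split
// from the question.
type Ring struct {
	Points    []string // every point except the last one
	LastPoint string
}

// closed reports whether the last point repeats the first one.
// It compares the raw strings, so "0 0" and "0  0" would differ.
func (r Ring) closed() bool {
	return len(r.Points) > 0 && r.Points[0] == r.LastPoint
}

// parsePolygon runs step 1 (validate), step 2 (split into rings)
// and step 3 (split each ring into points).
func parsePolygon(wkt string) ([]Ring, error) {
	m := polygonRe.FindStringSubmatch(wkt)
	if m == nil {
		return nil, errors.New("input is not a POLYGON in the expected form")
	}
	all := m[polygonRe.SubexpIndex("All_Polygons")]
	ringIdx := ringRe.SubexpIndex("Single_Poly_All_Pts")

	var rings []Ring
	for _, rm := range ringRe.FindAllStringSubmatch(all, -1) {
		// The ring pattern guarantees at least three points per match.
		pts := pointRe.FindAllString(rm[ringIdx], -1)
		rings = append(rings, Ring{
			Points:    pts[:len(pts)-1],
			LastPoint: pts[len(pts)-1],
		})
	}
	return rings, nil
}

func main() {
	rings, err := parsePolygon("POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))")
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, r := range rings {
		fmt.Printf("ring %d: points=%q last=%q closed=%v\n", i, r.Points, r.LastPoint, r.closed())
	}
}
```

Each regex here only ever extracts one level of the structure, so no information is lost to the last-capture rule. If the generic map form is still wanted, the Ring values can be copied into map[string]interface{} afterwards.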