Write regex without negations

Question

In a previous post I asked for help rewriting a regex without negation.

Starting regex:

https?:\/\/(?:.(?!https?:\/\/))+$

Ended up with:

https?:[^:]*$

This works fine, but I've noticed that if my URL contains a : besides the one after http or https, it does not match.

Here is a string which is not working:

sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2

Note the :query2 at the end.

How can I modify the second regex above so that it also selects URLs that contain a :?

Expected output:

http://websites.com/path/subpath/cc:query2

I would also like to select everything up to the first occurrence of ?=param.

Input:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param

Output:

http://websites.com/path/subpath/cc:query2/text/
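
The failure described above can be reproduced with a short sketch. Go is assumed here because the answer targets Go's regexp package; the variable name noExtraColon and the first input string are made up for illustration, while the second input is the string from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// noExtraColon is the second pattern from the question: after "http:" or
// "https:" it accepts only characters other than ':' up to the end of input.
var noExtraColon = regexp.MustCompile(`https?:[^:]*$`)

func main() {
	// Hypothetical input: no ':' after the scheme of the last link.
	plain := "sometexthttp://websites.com/a#q1sometexthttp://websites.com/b/query2"
	// Input from the question: the last link contains ":query2".
	colon := "sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"

	fmt.Printf("%q\n", noExtraColon.FindString(plain)) // "http://websites.com/b/query2"
	fmt.Printf("%q\n", noExtraColon.FindString(colon)) // ""
}
```

The second call finds nothing because [^:]* cannot cross the extra colon, so no position in the string lets the pattern reach the end of the input.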

Answer 1

Score: 4

Unfortunately, Go's regexp package does not support lookarounds.
However, you can get the last link with a trick: greedily match every earlier link and any other characters, then capture the last link in a capturing group:

^(?:https?://|.)*(https?://\S+?)(?:\?=|$)

Because \S+? matches non-whitespace characters lazily, the capture also stops at the first ?= after the last link, or runs to the end of the string if there is none.

// Requires the "fmt" and "regexp" packages; run the Printf calls inside a function such as main.
// Group 1 (index [1]) holds the last link. The pattern is anchored with ^, so a single match is enough.
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
fmt.Printf("%q\n", r.FindStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2")[1])
fmt.Printf("%q\n", r.FindStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param")[1])

Results:

"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
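
Indexing the submatch result directly panics when the input contains no link, because FindStringSubmatch returns nil on no match. A minimal sketch of a safer wrapper follows; the helper name lastLink, the variable name lastLinkRe, and the sample inputs are assumptions for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastLinkRe is the pattern from the answer; group 1 is the last link.
var lastLinkRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

// lastLink (hypothetical helper) returns the last link in s, cut before the
// first "?=" that follows it. ok is false when the pattern does not match.
func lastLink(s string) (link string, ok bool) {
	m := lastLinkRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"sometexthttp://websites.com/a#q1sometexthttp://websites.com/b/cc:query2/text/?=param",
		"no links here",
	}
	for _, s := range inputs {
		link, ok := lastLink(s)
		fmt.Printf("%q %v\n", link, ok)
	}
}
```

For the two sample inputs this prints "http://websites.com/b/cc:query2/text/" true and then "" false.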

If the last link can contain spaces, use .+? instead of \S+?:

^(?:https?://|.)*(https?://.+?)(?:\?=|$)
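
To see the difference between the two variants, here is a small sketch that runs both patterns on one input. The input string, the variable names noSpaces and withSpaces, and the capture helper are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from the answer: the link may contain non-whitespace only.
	noSpaces = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// Variant from the answer: the link may contain any character except a newline.
	withSpaces = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)
)

// capture (hypothetical helper) returns group 1 of the match,
// or "" when re does not match s.
func capture(re *regexp.Regexp, s string) string {
	if m := re.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	// Hypothetical input: both links are followed by a space before "?=" or the end.
	s := "texthttp://websites.com/a#q1 texthttp://websites.com/sub path/?=param"
	fmt.Printf("%q\n", capture(noSpaces, s))   // ""
	fmt.Printf("%q\n", capture(withSpaces, s)) // "http://websites.com/sub path/"
}
```

With \S+? neither link can reach ?= or the end of the string without crossing a space, so the whole pattern fails. With .+? the last link is captured, space included.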

huangapple
  • Posted on 2015-08-06 04:56:00
  • Please keep this link when reposting: https://go.coder-hub.com/31842666.html