Write regex without negations
Question
In a previous post I asked for help rewriting a regex without negation.
Starting regex:
https?:\/\/(?:.(?!https?:\/\/))+$
Ended up with:
https?:[^:]*$
This works fine, but I've noticed that if my URL contains a ":" besides the one in "http:" or "https:", it no longer matches.
Here is a string for which it does not work:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2
Notice the :query2 at the end.
How can I modify the second regex listed here so that it also selects URLs which contain a ":"?
Expected output:
http://websites.com/path/subpath/cc:query2
Also, I would like to select everything up to the first occurrence of ?=param.
Input:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param
Output:
http://websites.com/path/subpath/cc:query2/text/
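For reference, the failure described in the question can be reproduced with a short Go sketch. The helper name matchColonFree and the a.example input are made up for illustration; only the pattern and the failing string come from the post.

```go
package main

import (
	"fmt"
	"regexp"
)

// colonFree is the second regex from the question: after "http:" or "https:"
// it accepts only characters other than ':' up to the end of the input.
var colonFree = regexp.MustCompile(`https?:[^:]*$`)

// matchColonFree (hypothetical helper) returns the match, or "" if there is none.
func matchColonFree(s string) string {
	return colonFree.FindString(s)
}

func main() {
	// The last URL has no extra ':' so it is matched.
	fmt.Printf("%q\n", matchColonFree("texthttp://a.example/x#q1texthttp://a.example/y"))
	// prints "http://a.example/y"

	// The last URL contains ':' (":query2"), so [^:]* can never reach the end.
	fmt.Printf("%q\n", matchColonFree("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"))
	// prints ""
}
```

The second call finds nothing because every "http:" in the input is followed, somewhere before the end, by another ":".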
Answer 1
Score: 4
Unfortunately, Go's regexp package does not support lookarounds.
However, you can obtain the last link with a trick: match all possible links and other characters greedily, then capture the last link with a capturing group:
^(?:https?://|.)*(https?://\S+?)(?:\?=|$)
Because \S+? matches non-whitespace characters lazily, the capture also stops at the first ?= if one is present.
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])
Results:
"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
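The snippet above indexes [0][1] directly, which panics when the input contains no link. A self-contained sketch of the same capture-group trick with a nil check might look like this; the helper name lastURL is an assumption for illustration, while the pattern and inputs are the ones from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastLink is the pattern from the answer: the greedy prefix swallows
// everything up to the last "http://" or "https://", and group 1 lazily
// captures non-whitespace up to "?=" or the end of the input.
var lastLink = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

// lastURL (hypothetical helper) returns the captured link and whether
// the pattern matched at all.
func lastURL(s string) (string, bool) {
	m := lastLink.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2",
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param",
		"no link here",
	}
	for _, in := range inputs {
		if u, ok := lastURL(in); ok {
			fmt.Printf("%q\n", u)
		} else {
			fmt.Println("no match")
		}
	}
}
```

Since the pattern is anchored with ^, there is at most one match per input, so FindStringSubmatch is enough here and FindAllStringSubmatch is not needed.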
If the last link can contain spaces, use .+? instead of \S+?:
^(?:https?://|.)*(https?://.+?)(?:\?=|$)
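To illustrate the difference between the two patterns, here is a minimal sketch that runs both on an input whose last link contains a space. The input string and the helper name capture are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// noSpaces stops at whitespace because \S excludes it.
	noSpaces = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// withSpaces uses . instead, so spaces inside the link are allowed
	// (newlines are still excluded, since . does not match \n by default).
	withSpaces = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)
)

// capture (hypothetical helper) returns group 1 of re on s, or "" and false
// if re does not match.
func capture(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Made-up input: the last link contains a space before "?=".
	in := "texthttp://a.example/x texthttp://b.example/my path/file?=param"

	u, ok := capture(noSpaces, in)
	fmt.Printf("%q %v\n", u, ok) // prints "" false

	u, ok = capture(withSpaces, in)
	fmt.Printf("%q %v\n", u, ok) // prints "http://b.example/my path/file" true
}
```

With \S+? the whole pattern fails on this input, because neither link can be extended to ?= or to the end without crossing a space.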