Regex to find second occurrence of a domain within a string

Question

I'm trying to use a regex in Google Apps Script to extract the second occurrence of a web domain in a string. The example below contains two domains, duckduckgo.com and chicaspoderosas.org, and the second one is my target.

<a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&rut=6df21641031fd7b57d82fcdbc2312bc4b27034927655759d8e270840fae4fab1">ABOUT - Chicas Poderosas</a>

With the regex below I can do this without any problem when the second domain has a subdomain such as www, but I can't get the extraction right when the second domain has no subdomain.

The current regex, which works when a subdomain is present:

var regExp = new RegExp("(www.[a-z]+.[a-z]+.[a-z])", "gi");

What am I doing wrong?

Answer 1

Score: 0

You could match the href, followed by two occurrences of the URL format that appears in the example string, and use a capture group for the second URL, the one you want to get. Its value is in group 1.

Assuming that // marks the start of a URL in the example data:

\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)
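As a rough illustration of how the pattern above might be used, here is a minimal sketch in plain JavaScript, which should also run in Google Apps Script. The function name `extractSecondDomain` and the post-processing steps (cutting at the first `&`, percent-decoding, and taking the host part) are assumptions added for this example and are not part of the original answer.

```javascript
// Sample anchor tag from the question.
const html = '<a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&rut=6df21641031fd7b57d82fcdbc2312bc4b27034927655759d8e270840fae4fab1">ABOUT - Chicas Poderosas</a>';

// The pattern from the answer, written as a regex literal.
// Group 1 captures the second URL inside the href value.
const secondUrlPattern = /\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)/;

// Hypothetical helper: returns the host of the second URL, or null if none is found.
function extractSecondDomain(input) {
  const match = secondUrlPattern.exec(input);
  if (!match) {
    return null;
  }
  // Group 1 runs to the closing quote, so drop any trailing query
  // parameters (assumption: they are separated by a raw "&").
  const encodedUrl = match[1].split('&')[0];
  // Turn %3A and %2F back into ":" and "/".
  const decodedUrl = decodeURIComponent(encodedUrl);
  // Keep only the host part between "//" and the next "/", "?", "#" or ":".
  const hostMatch = /^(?:https?:)?\/\/([^\/?#:]+)/.exec(decodedUrl);
  return hostMatch ? hostMatch[1] : null;
}

const domain = extractSecondDomain(html); // "chicaspoderosas.org"
```

Because the pattern relies on the `//` (or `%2F%2F`) marker rather than on a `www.` prefix, it does not depend on the second domain having a subdomain. Note that `decodeURIComponent` throws a `URIError` on malformed percent sequences, so input from untrusted pages may need a `try`/`catch` around that call.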

huangapple
  • Posted on March 3, 2023, 21:11:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627504.html