Regex to find second occurrence of a domain within a string
Question
I'm trying to extract the second occurrence of a web domain in a string with a regex in Google Apps Script. The example below contains two domains, duckduckgo.com and chicaspoderosas.org, and I want the second one.
<a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&amp;rut=6df21641031fd7b57d82fcdbc2312bc4b27034927655759d8e270840fae4fab1">ABOUT - Chicas Poderosas</a>
I can do it no problem when the second domain has a subdomain such as www with the below regex, but can't seem to get the target extraction right when there isn't a subdomain on the second domain.
Current one that works when subdomain is present:
var regExp = new RegExp("(www.[a-z]+.[a-z]+.[a-z])", "gi");
What am I doing wrong?
Answer 1
Score: 0
You could match href followed by the URL format from the example string twice, and use a capture group (group 1) for the second URL, which is the one you want.
Assuming that // (or its encoded form %2F%2F) marks the start of a URL in the example data:
\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)
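Note that group 1 holds the whole encoded second URL up to the closing quote, not just the domain. Here is a minimal sketch of how the pattern might be used in Apps Script to get from there to the bare domain. The helper name extractSecondDomain, the added "i" flag, and the step that cuts the capture at the first "&" are my own assumptions, not part of the pattern above, and the rut value in the sample is shortened.

```javascript
// Sketch only: extractSecondDomain is a hypothetical helper name.
// The pattern is the one from the answer; the "i" flag is an added
// assumption so that lowercase escapes such as %3a and %2f also match.
var SECOND_URL = /\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)/i;

function extractSecondDomain(html) {
  var match = SECOND_URL.exec(html);
  if (!match) {
    return null; // no href containing a second URL
  }
  // Group 1 runs to the closing quote, so drop the trailing parameters.
  // Assumption: the target URL is percent-encoded and has no literal "&".
  var encoded = match[1].split("&")[0];
  var decoded = decodeURIComponent(encoded);
  // Keep only the host: everything after "//" up to the next "/", "?", "#" or ":".
  var host = /^(?:https?:)?\/\/([^\/?#:]+)/i.exec(decoded);
  return host ? host[1] : null;
}

// Sample from the question, with the rut value shortened.
var sample = '<a rel="nofollow" class="result__a" ' +
  'href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&amp;rut=6df2">' +
  'ABOUT - Chicas Poderosas</a>';

var domain = extractSecondDomain(sample);
```

Because the host is read from the decoded URL, this should work whether or not the second domain has a subdomain such as www. The original attempt only matched when the literal text www was present, and its unescaped dots matched any character rather than a literal dot.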