March 3, 2023 21:11:12

Regex to find second occurrence of a domain within a string

Question

I'm trying to extract the second occurrence of a web domain in a string with a regex in Google Apps Script. In the example below, you can see it is returning duckduckgo.com and chicaspoderosas.org.

<a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&amp;rut=6df21641031fd7b57d82fcdbc2312bc4b27034927655759d8e270840fae4fab1">ABOUT - Chicas Poderosas</a>

I can do it no problem when the second domain has a subdomain such as www with the below regex, but can't seem to get the target extraction right when there isn't a subdomain on the second domain.

Current one that works when subdomain is present:

var regExp = new RegExp("(www.[a-z]+.[a-z]+.[a-z])", "gi");

What am I doing wrong?

Answer 1

Score: 0

You could match the href attribute, then the URL format from the example string twice, and use a capture group for the second URL. The value you want is in group 1.

Assuming that // marks the start of a URL in the example data:

\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)
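
As a rough sketch of how the pattern above might be used from script code, the snippet below runs it against the sample string and then trims group 1 down to the host name. The names html, secondUrlPattern and extractSecondDomain are made up for illustration, and the host-trimming step is an extra assumption that is not part of the answer's pattern.

```javascript
// Sample anchor tag from the question.
const html = '<a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fchicaspoderosas.org%2Fabout%2F&amp;rut=6df21641031fd7b57d82fcdbc2312bc4b27034927655759d8e270840fae4fab1">ABOUT - Chicas Poderosas</a>';

// Pattern from the answer; group 1 holds the second URL, still percent-encoded.
const secondUrlPattern = /\bhref="(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*?((?:https?(?:%3A|:))?(?:%2F%2F|\/\/)[^\s"]*)/i;

// Hypothetical helper: returns the host of the second URL, or null if none is found.
function extractSecondDomain(input) {
  const match = secondUrlPattern.exec(input);
  if (match === null) {
    return null;
  }
  // Assumption: the host ends at the first "/", "?", "#", "&" or "%"
  // (the "%" covers an encoded slash such as %2F right after the host).
  const hostMatch = /^(?:https?(?:%3A|:))?(?:%2F%2F|\/\/)([^\/?#&%]+)/i.exec(match[1]);
  return hostMatch === null ? null : hostMatch[1];
}

console.log(extractSecondDomain(html)); // chicaspoderosas.org
```

The pattern in the question only matches when the text contains www, which is why it misses chicaspoderosas.org. Its unescaped dots also match any character, not only a literal dot.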
