May 13, 2022 17:21:12 · go

Golang - extract links using regex

I need to extract all links from a text that belong to a specific domain, example.de, using a regex in Go.

Below are all possible links that should be extracted:

https://example.de 
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home
https://abc.example.de/home some text that should not be extracted

What I already tried

I used regex101 to check whether my regexes are correct: https://regex101.com/r/ohxUcG/2
Here are the patterns that failed:

https?://*.+example.de*.+ failed on the input https://abc.example.de/a1b2c3 dsadsa: it matches the whole text up to the \n instead of just https://abc.example.de/a1b2c3, without dsadsa.
https?://*.+example.de*.+\s(\w+)$ only matches links that are terminated by a space, but links can also be terminated by \n, \t, etc.
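
The over-matching described for the first pattern can be reproduced in Go with a minimal sketch like the one below. The helper name overMatch and the sample input (including the "next line" suffix) are assumptions added for illustration; only the pattern itself comes from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAttempt is the first pattern quoted above, unchanged.
var firstAttempt = regexp.MustCompile(`https?://*.+example.de*.+`)

// overMatch is a hypothetical helper that returns the first match of firstAttempt.
func overMatch(text string) string {
	return firstAttempt.FindString(text)
}

func main() {
	// The trailing .+ is greedy, and . matches any character except a newline,
	// so the match runs past the link through "dsadsa" to the end of the line.
	fmt.Printf("%q\n", overMatch("https://abc.example.de/a1b2c3 dsadsa\nnext line"))
}
```

The match stops only at the newline because nothing in the pattern excludes whitespace, which is why the link has to be described with a character class that stops at whitespace rather than with .+.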

Resources which may be useful

Answer 1

Score: 3

You can use the following pattern:

(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?

See the regex demo. Details:

(?:https?://)? - an optional http:// or https:// prefix
(?:[^/.]+\.)* - zero or more repetitions of one or more characters other than / and ., followed by a . (the subdomain labels)
\bexample\.de\b - example.de as a whole word
(?:/[^/\s]+)* - zero or more repetitions of a / followed by one or more characters other than whitespace and / (the path segments)
/? - an optional trailing / character.
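
As a rough sketch of how the pattern above could be applied in Go, the standard regexp package can compile it once and collect every match with FindAllString. The helper name extractLinks and the sample text are assumptions made for illustration; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is the pattern from the answer, compiled once at package level.
// A raw string literal avoids double-escaping the backslashes.
var linkRe = regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?`)

// extractLinks is a hypothetical helper that returns every match of linkRe in text.
func extractLinks(text string) []string {
	return linkRe.FindAllString(text, -1)
}

func main() {
	// Sample input: links followed by a space, a newline, or a tab.
	text := "https://example.de/home some text that should not be extracted\n" +
		"https://abc.example.de/a1b2c3 dsadsa\thttps://abc.example.de/"
	for _, link := range extractLinks(text) {
		fmt.Println(link)
	}
}
```

Because the path part uses [^/\s], a match ends at any whitespace (space, \n, \t), not only at a space. One thing to keep in mind: [^/.] in the subdomain part also matches whitespace, so a scheme-less host preceded by other words (for example "Hello world.example.de") would be matched together with those words; using [^/.\s] there is one possible tightening if such input can occur.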
