Golang – extract links using regex

Question

I need to extract all links that belong to a specific domain, example.de, from a text using a regular expression in Go.

Below are all possible links that should be extracted:

https://example.de 
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home/
https://abc.example.de/home some text that should not be extracted

What I already tried

I used regex101 to check whether my patterns are correct: https://regex101.com/r/ohxUcG/2
These are the combinations that failed:

  • https?://*.+example.de*.+ fails on https://abc.example.de/a1b2c3 dsadsa: it matches the whole text up to the \n instead of stopping at https://abc.example.de/a1b2c3, so dsadsa is included.
  • https?://*.+example.de*.+\s(\w+)$ only matches links that are terminated by a space, but links can also be terminated by \n, \t, etc.
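
The over-matching in the first attempt can be reproduced in Go with a minimal sketch like the one below. The helper name overMatch and the sample text are assumptions made for illustration. The trailing .+ is greedy and . matches any character except a newline by default, so the match runs to the end of the line.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAttempt is the first pattern from the question, unchanged.
var firstAttempt = regexp.MustCompile(`https?://*.+example.de*.+`)

// overMatch is a hypothetical helper that returns the first match
// of firstAttempt in text (or "" if there is none).
func overMatch(text string) string {
	return firstAttempt.FindString(text)
}

func main() {
	text := "https://abc.example.de/a1b2c3 dsadsa\nnext line"
	// The greedy ".+" at the end consumes everything up to the newline.
	fmt.Printf("%q\n", overMatch(text))
	// prints "https://abc.example.de/a1b2c3 dsadsa"
}
```

The unescaped dots in example.de and the e* quantifier also make the pattern looser than intended, but the trailing .+ is what pulls in dsadsa here.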

Resources which may be useful

Answer 1

Score: 3

You can use

(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?

Details:

  • (?:https?://)? - an optional http:// or https:// prefix
  • (?:[^/.]+\.)* - zero or more sequences of one or more chars other than / and ., each followed by a . char (the subdomain labels)
  • \bexample\.de\b - example.de as a whole word
  • (?:/[^/\s]+)* - zero or more repetitions of a / followed by one or more chars other than whitespace and /
  • /? - an optional trailing / char.
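
As a rough sketch of how this pattern might be used in Go, it can be compiled once with regexp.MustCompile and applied with FindAllString. The helper name extractLinks and the sample text are assumptions made for illustration and are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is the pattern from the answer, compiled once at package level.
var linkRe = regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?`)

// extractLinks is a hypothetical helper that returns every
// non-overlapping match of linkRe in text.
func extractLinks(text string) []string {
	return linkRe.FindAllString(text, -1)
}

func main() {
	text := "see https://example.de/home some text\nand https://abc.example.de/a1b2c3\tdsadsa"
	for _, link := range extractLinks(text) {
		fmt.Println(link)
	}
	// prints:
	// https://example.de/home
	// https://abc.example.de/a1b2c3
}
```

One thing to watch for: [^/.]+ also admits whitespace, so on scheme-less input such as "my site.example.de" the match can start earlier than expected. Tightening that class to [^/.\s]+ is one possible adjustment if this matters for your input.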

huangapple
  • Posted on May 13, 2022 at 17:21:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72227241.html