Pattern to extract all matches of URLs within a string

Question

I have some URLs like http://app14.co.ad.local:90/ATT\me1111419.org within a string.
The number of dots (.) and backslashes (\) or forward slashes (/) is not constant.
The URLs start with "http" and end with a dot followed by "org", "com", or "net".

String:
Read me http://app14.co.ad.local:90/ATT\me1111419.org blasss test wwww http://app14.co.ad.local:90/ATT\me1.111.com xxxxbb  aaa
qwer fff http://app14.co.ad.local:90/ATT\bbb1419.net www

Expected result:
http://app14.co.ad.local:90/ATT\me1111419.org
http://app14.co.ad.local:90/ATT\me1.111.com
http://app14.co.ad.local:90/ATT\b.bb1.419.net

The code below works correctly only if the string contains nothing but URLs.
My problem is with the pattern itself.

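Option Explicit
Option Compare Text

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
' (Tools > References) for the early-bound RegExp type.
Function RegexMatches(strInput As String) As String

    Dim re As New RegExp
    Dim rMatch As Object, arrayMatches(), i As Long

    With re
        ' The trailing (\/.*)? is greedy: ".*" runs to the end of the line,
        ' so any words after a URL on the same line end up in the match.
        .Pattern = "(http:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
        .Global = True      ' return every match, not just the first
        .MultiLine = True
        .IgnoreCase = True
    End With

    ' Collect each match into an array, then join them with line feeds.
    If re.Test(strInput) Then
        For Each rMatch In re.Execute(strInput)
            ReDim Preserve arrayMatches(i)
            arrayMatches(i) = rMatch.Value
            i = i + 1
        Next
    End If

    RegexMatches = Join(arrayMatches, vbLf)

End Function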

Answer 1

Score: 1

You can use:

http://            # Match 'http://'
\S+                # followed by one or more non-whitespace characters
\.(?:net|com|org)  # then either '.net', '.com' or '.org'.

Try it on regex101.com.
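A minimal VBA sketch of plugging the answer's pattern into a function shaped like the one in the question. It assumes Windows and late binding through CreateObject("VBScript.RegExp"), so no library reference is needed; `ExtractUrls` and `DemoExtractUrls` are hypothetical names chosen so they don't clash with `RegexMatches`. VBScript's RegExp has no free-spacing option, so the commented pattern above has to be written as a single string without the comments.

```vba
Option Explicit

' Hypothetical helper: returns every URL in strInput that starts with
' "http://" and ends in .net, .com or .org, one per line (vbLf).
' Late-bound VBScript.RegExp (Windows), so no library reference is needed.
Function ExtractUrls(strInput As String) As String
    Dim re As Object
    Dim rMatch As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    With re
        ' The answer's pattern on one line: VBScript has no free-spacing mode.
        .Pattern = "http://\S+\.(?:net|com|org)"
        .Global = True       ' find all matches, not just the first
        .IgnoreCase = True
    End With

    ' An empty MatchCollection skips the loop, so input with no URLs returns "".
    For Each rMatch In re.Execute(strInput)
        If Len(result) > 0 Then result = result & vbLf
        result = result & rMatch.Value
    Next

    ExtractUrls = result
End Function

' Hypothetical demo: the question's sample text, with its <br> as vbLf.
Sub DemoExtractUrls()
    Dim s As String
    s = "Read me http://app14.co.ad.local:90/ATT\me1111419.org blasss test wwww " & _
        "http://app14.co.ad.local:90/ATT\me1.111.com xxxxbb  aaa" & vbLf & _
        "qwer fff http://app14.co.ad.local:90/ATT\bbb1419.net www"
    Debug.Print ExtractUrls(s)
End Sub
```

Because `\S+` is greedy but must still be followed by `.net`, `.com`, or `.org`, the engine backtracks to the last such suffix within each run of non-whitespace characters, so trailing words like "blasss" are left out. On the sample string, DemoExtractUrls should print the three URLs on separate lines.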

huangapple
  • This article was posted on June 5, 2023 13:46:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76403761.html