确定在“类似”比较中模式的功能长度

huangapple go评论73阅读模式
英文:

Determining the functional length of a pattern in a "Like" comparison

问题

有没有办法确定在“Like”比较中模式的功能长度?

例如,“[A-Z]”的功能长度为1,但Len("[A-Z]")会得到答案5(因为字符串本身长度为五个字符)。

我认为能够确定模式的功能长度将使查找字符串的方式更灵活。

我有可用的代码,但它依赖于要搜索的字符串具有空格,并且要找到的子字符串两侧都有空格。

这是我尝试的代码;它不起作用,因为len(pattern)比我试图找到的模式长 - 在我的特定示例中,模式是#####[A-Z][A-Z],以查找(例如)11111AA22222BB - 五个数字后跟两个大写字母。

Function ExtractMatch(Source As String, Pattern As String) As String
' 提取与给定模式匹配的子字符串

    Dim i As Long, lenP As Long, Test As String
    lenP = Len(Pattern)
        For i = 1 To Len(Source) - lenP
            Test = Mid(Source, i, lenP)
            Debug.Print Test
            If Test Like Pattern Then
            ExtractMatch = Test
        Exit For
    End If
    Next
    
End Function

背景信息:我正在尝试从手工输入的某处提取报价编号。示例数据和期望输出 - 第5行当前未捕获,因为有一个拼写错误,并且在引号编号之前没有空格。

英文:

Is there a way to determine the functional length of a pattern in a "Like" comparison?

For example, "[A-Z]" has a functional length of 1, but Len("[A-Z]") gives the answer 5 (because the string itself is five characters long).

I think being able to determine the functional length of the pattern would enable a more flexible way of looking through the string.

I have working code but it relies on the string being searched having spaces, and the substring to be found having a space on either side.

This is the code I tried; it doesn't work, because len(pattern) is longer than the pattern I'm trying to find - in my specific example the pattern is #####[A-Z][A-Z], to find (for example) 11111AA or 22222BB - five numbers followed by two upper case letters.

Function ExtractMatch(Source As String, Pattern As String) As String
' extracts a substring matching the given pattern

    Dim i As Long, lenP As Long, Test As String
    lenP = Len(Pattern)
        For i = 1 To Len(Source) - lenP
            Test = Mid(Source, i, lenP)
            Debug.Print Test
            If Test Like Pattern Then
            ExtractMatch = Test
        Exit For
    End If
    Next
    
End Function

For context I am trying to extract a quote number from somewhere in a hand-typed entry. Sample data and desired output - line 5 is currently not captured as there's a typo, and no space before the quote number.

Sample input/output

答案1

得分: 3

以下是已翻译的内容:

假设在模式中没有 *,我们可以使用以下函数获取模式的长度:

Function getPatterLength(ByVal Pattern As String) As Long
    If InStr(Pattern, ""*"") > 0 Then
        getPatterLength = -1
        Exit Function
    End If

    Dim Parts() As String
    Parts = Split(Pattern, ""[")

    Dim i As Long
    For i = 0 To UBound(Parts)
        If InStr(Parts(i), ""]"") > 0 Then
            Parts(i) = Mid(Parts(i), InStr(Parts(i), ""]""))
        End If
    Next

    getPatterLength = Len(Join(Parts, """"))
End Function

为了提取函数添加一个参数来表示模式长度会更容易:

Function ExtractMatch(Source As String, Pattern As String, PatterLength as long) As String
    ' 提取与给定模式匹配的子字符串

    Dim i As Long, Test As String
    For i = 1 To Len(Source) - PatterLength
        Test = Mid(Source, i, PatterLength)
        Debug.Print Test
        If Test Like Pattern Then
            ExtractMatch = Test
            Exit Function
        End If

        Exit For
    End If
    Next
End Function
英文:

Assuming that there are no * in the pattern we can get the length of the pattern using this function:

Function getPatterLength(ByVal Pattern As String) As Long
    If InStr(Pattern, "*") > 0 Then
        getPatterLength = -1
        Exit Function
    End If

    Dim Parts() As String
    Parts = Split(Pattern, "[")

    Dim i As Long
    For i = 0 To UBound(Parts)
        If InStr(Parts(i), "]") > 0 Then
            Parts(i) = Mid(Parts(i), InStr(Parts(i), "]"))
        End If
    Next

    getPatterLength = Len(Join(Parts, ""))
End Function

It would be easier to add a parameter to the extract function for the pattern length.

Function ExtractMatch(Source As String, Pattern As String, PatterLength as long) As String
' extracts a substring matching the given pattern

    Dim i As Long, Test As String
        For i = 1 To Len(Source) - PatterLength
            Test = Mid(Source, i, PatterLength)
            Debug.Print Test
            If Test Like Pattern Then
            ExtractMatch = Test
            Exit Function
        End If

        Exit For
    End If
    Next
    
End Function

答案2

得分: 3

在单元格B2中的公式:

=MAP(A2:A5,LAMBDA(x,FILTERXML("<t><s>"&TEXTJOIN("</s><s>",,MID(x,SEQUENCE(LEN(x)),7))&"</s></t>","//s[string-length(normalize-space())=7][substring(.,1,5)*0=0][translate(substring(.,6),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''][1]"))

XPath的拆解与逻辑:

  • [string-length(normalize-space())=7] - 我们需要测试节点的长度是否等于7个字符(经过规范化:由于第3个谓词的工作原理,这很重要)。在正则表达式中的表示为:^.{7}$
  • [substring(.,1,5)*0=0] - 测试节点的前5个字符是否为数字。在正则表达式中的表示为:^\d{5}
  • [translate(substring(.,6),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''] - 确保最后两个字符是大写字母。在正则表达式中的表示为:[A-Z]{2}$
  • [1] - 从过滤后的节点列表中获取第一个节点。
英文:

Thought it would be fun (for some at least) to see if this can be done through FILTERXML():

确定在“类似”比较中模式的功能长度

Formula in B2:

=MAP(A2:A5,LAMBDA(x,FILTERXML("<t><s>"&TEXTJOIN("</s><s>",,MID(x,SEQUENCE(LEN(x)),7))&"</s></t>","//s[string-length(normalize-space())=7][substring(.,1,5)*0=0][translate(substring(.,6),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''][1]")))

The deconstructed xpath anding predicates:

  • [string-length(normalize-space())=7] - We need to test if the node's length equals 7 characters (after normalization: This is important due to the working of the 3rd predicate). In regex terms: ^.{7}$;

  • [substring(.,1,5)*0=0] - Test that the first 5 characters of the node are numeric. In regex terms: ^\d{5};

  • [translate(substring(.,6),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''] - Assert that the last two characters are uppercase alpha chars. In regex terms: [A-Z]{2}$;

  • [1] - Retrieve the 1st node from resulting filtered node-list.

答案3

得分: 2

您可以使用正则表达式来提取子匹配,如下所示:

Public Sub Example()
    Debug.Print ExtractSubstring("Quote number 11111AA refers", "[0-9]{5}[A-Z]{2}")
End Sub

Public Function ExtractSubstring(ByVal InputString As String, ByVal Pattern As String) As String
    Dim RetVal As String
    Dim AllMatches As Object
    Dim RegEx As Object
    Set RegEx = CreateObject("vbscript.regexp")
    
    With RegEx
        .Pattern = Pattern
        .Global = True
        .IgnoreCase = True  ' 如果需要区分大小写,请更改此选项
    End With
    Set AllMatches = RegEx.Execute(InputString)
    
    If AllMatches.Count <> 0 Then
        RetVal = AllMatches.Item(0)
    End If
    
    ExtractSubstring = RetVal
End Function

或者直接像用户自定义函数(UDF)一样使用:

=ExtractSubstring(A2, "[0-9]{5}[A-Z]{2}")

请注意,如果存在更多与此模式匹配的项,这将仅返回第一个匹配项。

英文:

You can use Regular Expressions to extract a submatch like below:

Public Sub Example()
    Debug.Print ExtractSubstring(&quot;Quote number 11111AA refers&quot;, &quot;[0-9]{5}[A-Z]{2}&quot;)
End Sub

Public Function ExtractSubstring(ByVal InputString As String, ByVal Pattern As String) As String
    Dim RetVal As String
    Dim AllMatches As Object
    Dim RegEx As Object
    Set RegEx = CreateObject(&quot;vbscript.regexp&quot;)
    
    With RegEx
        .Pattern = Pattern
        .Global = True
        .IgnoreCase = True  &#39; change this if you want case sensitivity
    End With
    Set AllMatches = RegEx.Execute(InputString)
    
    If AllMatches.Count &lt;&gt; 0 Then
        RetVal = AllMatches.Item(0)
    End If
    
    ExtractSubstring = RetVal
End Function

Or even use it directly like a UDF (User Defined Function)

=ExtractSubstring(A2,&quot;[0-9]{5}[A-Z]{2}&quot;)

确定在“类似”比较中模式的功能长度

Note this will only return the first match if there are more matches with this pattern.

答案4

得分: 0

为什么不使用正则表达式:

Public Sub test()

Dim pattern As String
pattern = &quot;\d{5}[A-Z]{2}&quot;

Debug.Print regex_PatternMatch(pattern, &quot;11111AA&quot;)  &#39;将返回true
Debug.Print regex_PatternMatch(pattern, &quot;111AA&quot;)   &#39;将返回false

End Sub

Public Function regex_PatternMatch(pattern As String, textToCheck As String) As Boolean
    Dim regEx As New RegExp
    With regEx
        .pattern = pattern
        regex_PatternMatch = .test(textToCheck)
    End With
End Function

您需要添加对VB脚本正则表达式的引用。

英文:

Why don't you use regular expressions:

Public Sub test()

Dim pattern As String
pattern = &quot;\d{5}[A-Z]{2}&quot;

Debug.Print regex_PatternMatch(pattern, &quot;11111AA&quot;)  &#39;will return true
Debug.Print regex_PatternMatch(pattern, &quot;111AA&quot;)   &#39;will return false

End Sub

Public Function regex_PatternMatch(pattern As String, textToCheck As String) As Boolean
    Dim regEx As New RegExp
    With regEx
        .pattern = pattern
        regex_PatternMatch = .test(textToCheck)
    End With
End Function

You have to add a reference to VB Script regular expressions

确定在“类似”比较中模式的功能长度

huangapple
  • 本文由 发表于 2023年6月22日 15:58:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/76529713.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定