从字符串中提取一个满足条件的数字。

huangapple go评论62阅读模式
英文:

Extract one numbers from string with condition

问题

I will provide the translation of the code portion as requested:

    Public Function GetNumeric(CellRef As String)
        Dim StringLength As Long, i As Long, Result As Variant
        StringLength = Len(CellRef)
        For i = 1 To StringLength
          If IsNumeric(Mid(CellRef, i, 1)) Then
             Result = Result & Mid(CellRef, i, 1)
          End If
        Next i
        GetNumeric = Result
    End Function

Please note that I won't provide translations for the non-code content.

英文:

the below code extract all numbers from string and even combine them.
But I need to extract only one whole number with rules:
1- the number is one or two digits (plus the decimal part if it exsists).
2- if the number is followed by " or inch or in , then extract it and ignore rest of numbers in string.
3- if the above condition (2) is not found, then extract the first numbers and ignore rest of numbers in string.

Current String Expected Result
INSPECT-8'' Water 12 Pipe 8
INSPECT- 8.5" Water 12 8.5
INSPECT- 4 Water 5.5" 5.5
PM- 6.5 inch From H44 6.5
PM-36in Pipe M1T 36
PM-36 Pipe From M2T 36
PM-18"*12" Pipe From M1T 18
PM-36 From 5" M1T 5
PM-123 Pipe From MT
Public Function GetNumeric(CellRef As String)
    Dim StringLength As Long, i As Long, Result As Variant
    StringLength = Len(CellRef)
    For i = 1 To StringLength
      If IsNumeric(Mid(CellRef, i, 1)) Then
         Result = Result & Mid(CellRef, i, 1)
      End If
    Next i
    GetNumeric = Result
End Function

答案1

得分: 6

以下是您要翻译的代码部分:

Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String
    
    Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")
    
    RE.Pattern = pat
    RE.Global = gFlag
    RE.IgnoreCase = cse
    
    If RE.Test(str) Then
        RegexExtract = RE.Execute(str)(pos)
    Else
        RegexExtract = vbNullString
    End If
    
End Function
=IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)),"")
\b\d\d? - 一个字边界,后面跟着1个数字和一个可选的第二个数字;
(?!\d) - 一个负向先行断言,用于确认后面没有更多的数字;
(?=\s*(?:""|''|in(?:ch)?\b)?) - 一个正向先行断言,用于确认位置后面跟着0个或多个(贪婪)空白字符和:
  * "" - 一个双引号,或;
  * '' - 两个单引号,或;
  * in(?:ch)?\b - 字面上的'in',后面跟着可选的'ch'和一个字边界,以确保这些字母不是较大子字符串的一部分,以防误报。

如果您需要更多翻译,请告诉我。

英文:

Maybe create your own UDF making use of a regular expression. Perhaps something like:

Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String

Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")

RE.Pattern = pat
RE.Global = gFlag
RE.IgnoreCase = cse

If RE.Test(str) Then
    RegexExtract = RE.Execute(str)(pos)
Else
    RegexExtract = vbNullString
End If

End Function

Note that I created an optional global flag that is false by default which should just pull the very 1st hit in a cell. The optional pos variable is there to return a certain match in case you wish the somehow return other numbers when you set the global flag to true. Also note the use of a case flag set to true to match case-insensitive by default.

You can call the above like:

=IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)"),"")

The pattern used stands for:

  • \b\d\d? - A word-boundary with 1 digit and a 2nd optional one;
  • (?!\d) - A negative lookahead to assert no more digits;
  • (?=\s*(?:"|''|in(?:ch)?\b)?) - A positive lookahead to assert position is followed by 0+ (greedy) whitespace characters and:
    • " - A double quote, or;
    • '' - Two single quotes, or;
    • in(?:ch)?\b - Literally 'in' followed by optional 'ch' and a word-boundary to confirm the letters are not part of a larger substring to prevent false positives.

EDIT1:

As per OP's comments below; there are case where there could be a number of interest that is not at the 1st position. Since OP allows for a number without inches to also be matched, the addition here is to include a negative lookahead that will assert that there is no 2nd occurence of the valid pattern:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))(?=\s*(?:""|''|in(?:ch)?\b)?)

I suppose this is implicitly the same as:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))

EDIT2:

To allow for decimals you could include an optional non-capture group:

=IFERROR(--RegexExtract(A2,"\b\d\d?(?:\.\d+)?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))"),"")

答案2

得分: 2

我认为答案已经包含在您提出的要求中。
您需要添加if语句作为检查来执行此操作,或者使用case语句。我建议在将其添加到结果之前定义一个额外的变量进行检查。将其定义为您正在循环遍历的字符串的字符。示例:

Dim Check as String

Check = Mid(CellRef, i, 1)
If Check = Chr(34) then GetNumeric = Result

处理多个数字而没有进一步定义将产生不可解决的情况,例如,您倒数第二个示例没有指示1或36是否是正确的一个。

英文:

I think the answer already lays in your proposed requirements
You need to added if statement as checks to do this or a case statement, I recommend defining an extra variable to check before you added to results. Define this as the character of your string you are looping through. Example:

Dim Check as string

Check = Mid(CellRef, i, 1)
If Check =  Chr(34) then GetNumeric = Result    

Dealing with multiple numbers without further definition will yield situations that are not solvable e.g you second to last example has no indication if the 1 or 36 is at the correct one.

答案3

得分: 2

以下是翻译好的部分:

这里有两种不需要使用正则表达式的解决方案。

第一种解决方案对字符串应用一系列转换,以便我们可以使用Split函数获取一系列字符串,其中一些将是数字。问题在于选择正确的转换,以便可以应用Split函数来分离数字。有时这可能是不可能的。

第二种解决方案只是解析字符串,直到提取了一系列数字字符,然后返回该数字字符串以供进一步处理。这可能是您的情况中最好的解决方案。

请注意,这两种解决方案都未经过边缘情况的测试。

考虑到您试图解析的似乎是自由文本,可能会有许多边缘情况。

英文:

Here are two solutions which do not require Regex.

The first solution applies a series of transforms to the string so that we can use Split to get a sequence of strings, some of which will be numbers. The issue here is choosing the correct transforms so that Split can be applied to isolate numbers. Sometime this may not be possible.

The second solution just parses the string until it has extracted a sequence of characters that are numeric and then returns that numeric string for further processing. This is likely the best solution in your case.

Note that neither solution has been tested for edge cases.

Given that you are trying to parse what appears to be freeform text there could be lots of edge cases.

Sub Test()

    Dim myC As Collection
    Set myC = New Collection
    With myC
    
        .Add "INSPECT - 8" & Chr$(34) & " Water 12 Pipe    8"
        .Add "INSPECT- 18" & Chr$(34) & " Water 12   18"
        .Add "PM-6in Pipe From M37 st 6"
        .Add "PM- 6 inch Pipe From H44    6"
        .Add "PM-36 Pipe From M-1T    36"
        .Add "PM-123 Pipe From MT"
        
    End With
    

    Dim myItem As Variant
    Dim myNumber As Long
    For Each myItem In myC
    
    'Option 1
'        If TryGetFirstNumber(myItem, myNumber) Then
'            Debug.Print myNumber
'        End If
     ' option 2
'        Debug.Print ParseFirstNumber(VBA.CStr(myItem))
    Next
    
End Sub

' Pass ByVal so we don't alter the original string
Public Function ApplyTransforms(ByVal ipString As String) As String

    ipString = VBA.LCase(ipString)
    ipString = VBA.Replace(ipString, "-", " ")
    ipString = VBA.Replace(ipString, VBA.Chr(34), " ")
    ipString = VBA.Replace(ipString, "in ", " ")
    ipString = VBA.Replace(ipString, "inch ", " ")
    
    ApplyTransforms = ipString
    
End Function

'The try function indicates success by the returned boolean value, the result of the success is returned Byref in parameter opNumber
Public Function TryGetFirstNumber(ByRef ipString As Variant, ByRef opNumber As Long, Optional ipLength As Long = 2) As Boolean

    
    Dim myArray As Variant
    myArray = VBA.Split(ApplyTransforms(ipString))
    
    Dim myItem As Variant
    For Each myItem In myArray
    
        If VBA.IsNumeric(myItem) Then
        
            If VBA.Len(myItem) <= ipLength Then
            
                opNumber = VBA.CLng(myItem)
                TryGetFirstNumber = True
                Exit Function
                
            End If
            
        End If
            
    Next
    
    TryGetFirstNumber = False
    
End Function


Public Function ParseFirstNumber(ByRef ipString As String) As String

    Dim myIndex As Long
    myIndex = 1
    
    Dim myLen As Long
    myLen = VBA.Len(ipString)
    
    Dim myNumber As String
    myNumber = vbNullString

    Do While myIndex <= myLen
        If VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0 Then
            Exit Do
        End If
        myIndex = myIndex + 1
    Loop

    If myIndex > myLen Then
        ParseFirstNumber = myNumber
        Exit Function
    End If

    Do While VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0
        myNumber = myNumber & VBA.Mid$(ipString, myIndex, 1)
        myIndex = myIndex + 1
        If myIndex > myLen Then
            ParseFirstNumber = myNumber
            Exit Function
        End If
    Loop
    
    ParseFirstNumber = myNumber
    
End Function

答案4

得分: 2

请,(还要)测试下一种方法。它使用标准的VBA和一个数组来进行处理。处理后的数组内容将在代码末尾删除,因此即使对于大范围,它应该非常快。它假定要处理的范围从“A1”开始,工作表的第一行包含标题:

但是上面的代码只会处理您在问题中显示的模式字符串。例如,如果有更多不同目的的双引号字符(在前面有一个数字),代码只会处理找到的第一个。如果搜索的字符串在字符串中是第一个等等,可能会出现问题。可以根据需要进行调整以处理更多情况,但是我们无法猜测未显示的情况...

英文:

Please, (also) test the next way. It uses standard VBA and an array to be processed. The processed array content will be dropped at the end of the code, so, it should be very fast even for large ranges. It assumes that the range to be processed starts from "A1", headers existing on the sheet first row:

Sub extractInchesNoFromAllRange()
   Dim sh As Worksheet, lastR As Long, arr, i As Long
   Dim dblQ As Long, sQ As Long, strIn As Long, No As String
   
   Set sh = ActiveSheet
   lastR = sh.Range("A" & sh.rows.count).End(xlUp).row
   
   arr = sh.Range("A2:B" & lastR).Value2
   
   For i = 1 To UBound(arr)
        dblQ = InStr(arr(i, 1), """") 'check if a double quote caracters exists and return its postition if it does
        sQ = InStr(arr(i, 1), "''")      'check if two simple quote caracters exists and return its postition if it does
        strIn = InStr(arr(i, 1), "in") 'the same as above for "in" string
        
        No = "" 'reinitialize the variable to keep the extracted number (as string...)
        If dblQ > 0 Or sQ > 0 Then 'if doble quote exists:
            If IsNumeric(Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)) Then 'if a number exists before the quote ch
                No = Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)                      'extract first digit
                arr(i, 2) = extractNo(IIf(dblQ > 0, dblQ, sQ) - 2, CStr(arr(i, 1)), No, True) 'call the function which make extraction by (backward) iteration
            End If
        ElseIf strIn > 0 Then 'if "in" exists:
            If Mid(arr(i, 1), strIn + 2, 1) = " " Or Mid(arr(i, 1), strIn + 2, 2) = "ch" Or strIn + 1 = Len(arr(i, 1)) Then
                If Mid(arr(i, 1), strIn - 1, 1) = " " Then
                    arr(i, 2) = extractNo(strIn - 2, CStr(arr(i, 1)), No, True)
                Else
                    arr(i, 2) = extractNo(strIn - 1, CStr(arr(i, 1)), No, True)
                End If
            End If
        Else
            arr(i, 2) = extractNo(0, CStr(arr(i, 1)), "")
        End If
   Next i
   
   'drop the processed arran content back in its range:
   sh.Range("A2").Resize(UBound(arr), UBound(arr, 2)).Value2 = arr
End Sub

Function extractNo(pos As Long, str As String, No As String, Optional boolChar = False) As Variant
   Dim i As Long, boolNo As Boolean
   
   On Error GoTo WrongPatt
   If boolChar Then 'if one of the searched characters has been found:
        For i = pos To 1 Step -1
            If IsNumeric(Mid(str, i, 1)) Or Mid(str, i, 1) = "." Then
                No = CStr(Mid(str, i, 1)) & No
            Else
                extractNo = CDbl(No): Exit For
            End If
        Next i
  Else 'if no searched string has been found:
        For i = 1 To Len(str)
            If IsNumeric(Mid(str, i, 1)) Then
                boolNo = True
                No = No & Mid(str, i, 1)
            Else
                If boolNo Then Exit For
            End If
        Next i
        If Len(No) <= 2 And No <> "" Then
                extractNo = CLng(No)
        Else
                extractNo = ""
        End If
  End If
  Exit Function
  
WrongPatt:
   extractNo = "Wrong pttern"
End Function

But the above code will process only the pattern string you show in your question. If, for instance, there will be more double quotes characters, with a different purpose **before the one having a number in front of it), the code will process only the first found. It may have problems if the searched strings are the first in the string and so on... It can be adapted to deal with more conditions, but we here are not the mind readers to cover such not shown cases...

huangapple
  • 本文由 发表于 2023年4月11日 16:42:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/75983960.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定