从字符串中提取一个满足条件的数字。

huangapple go评论94阅读模式
英文:

Extract one numbers from string with condition

问题

I will provide the translation of the code portion as requested:

  1. Public Function GetNumeric(CellRef As String)
  2. Dim StringLength As Long, i As Long, Result As Variant
  3. StringLength = Len(CellRef)
  4. For i = 1 To StringLength
  5. If IsNumeric(Mid(CellRef, i, 1)) Then
  6. Result = Result & Mid(CellRef, i, 1)
  7. End If
  8. Next i
  9. GetNumeric = Result
  10. End Function

Please note that I won't provide translations for the non-code content.

英文:

the below code extract all numbers from string and even combine them.
But I need to extract only one whole number with rules:
1- the number is one or two digits (plus the decimal part if it exsists).
2- if the number is followed by " or inch or in , then extract it and ignore rest of numbers in string.
3- if the above condition (2) is not found, then extract the first numbers and ignore rest of numbers in string.

Current String Expected Result
INSPECT-8'' Water 12 Pipe 8
INSPECT- 8.5" Water 12 8.5
INSPECT- 4 Water 5.5" 5.5
PM- 6.5 inch From H44 6.5
PM-36in Pipe M1T 36
PM-36 Pipe From M2T 36
PM-18"*12" Pipe From M1T 18
PM-36 From 5" M1T 5
PM-123 Pipe From MT
  1. Public Function GetNumeric(CellRef As String)
  2. Dim StringLength As Long, i As Long, Result As Variant
  3. StringLength = Len(CellRef)
  4. For i = 1 To StringLength
  5. If IsNumeric(Mid(CellRef, i, 1)) Then
  6. Result = Result & Mid(CellRef, i, 1)
  7. End If
  8. Next i
  9. GetNumeric = Result
  10. End Function

答案1

得分: 6

以下是您要翻译的代码部分:

  1. Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String
  2. Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")
  3. RE.Pattern = pat
  4. RE.Global = gFlag
  5. RE.IgnoreCase = cse
  6. If RE.Test(str) Then
  7. RegexExtract = RE.Execute(str)(pos)
  8. Else
  9. RegexExtract = vbNullString
  10. End If
  11. End Function
  1. =IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)),"")
  1. \b\d\d? - 一个字边界,后面跟着1个数字和一个可选的第二个数字;
  2. (?!\d) - 一个负向先行断言,用于确认后面没有更多的数字;
  3. (?=\s*(?:""|''|in(?:ch)?\b)?) - 一个正向先行断言,用于确认位置后面跟着0个或多个(贪婪)空白字符和:
  4. * "" - 一个双引号,或;
  5. * '' - 两个单引号,或;
  6. * in(?:ch)?\b - 字面上的'in',后面跟着可选的'ch'和一个字边界,以确保这些字母不是较大子字符串的一部分,以防误报。

如果您需要更多翻译,请告诉我。

英文:

Maybe create your own UDF making use of a regular expression. Perhaps something like:

  1. Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String
  2. Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")
  3. RE.Pattern = pat
  4. RE.Global = gFlag
  5. RE.IgnoreCase = cse
  6. If RE.Test(str) Then
  7. RegexExtract = RE.Execute(str)(pos)
  8. Else
  9. RegexExtract = vbNullString
  10. End If
  11. End Function

Note that I created an optional global flag that is false by default which should just pull the very 1st hit in a cell. The optional pos variable is there to return a certain match in case you wish the somehow return other numbers when you set the global flag to true. Also note the use of a case flag set to true to match case-insensitive by default.

You can call the above like:

  1. =IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)"),"")

The pattern used stands for:

  • \b\d\d? - A word-boundary with 1 digit and a 2nd optional one;
  • (?!\d) - A negative lookahead to assert no more digits;
  • (?=\s*(?:"|''|in(?:ch)?\b)?) - A positive lookahead to assert position is followed by 0+ (greedy) whitespace characters and:
    • " - A double quote, or;
    • '' - Two single quotes, or;
    • in(?:ch)?\b - Literally 'in' followed by optional 'ch' and a word-boundary to confirm the letters are not part of a larger substring to prevent false positives.

EDIT1:

As per OP's comments below; there are case where there could be a number of interest that is not at the 1st position. Since OP allows for a number without inches to also be matched, the addition here is to include a negative lookahead that will assert that there is no 2nd occurence of the valid pattern:

  1. \b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))(?=\s*(?:""|''|in(?:ch)?\b)?)

I suppose this is implicitly the same as:

  1. \b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))

EDIT2:

To allow for decimals you could include an optional non-capture group:

  1. =IFERROR(--RegexExtract(A2,"\b\d\d?(?:\.\d+)?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))"),"")

答案2

得分: 2

我认为答案已经包含在您提出的要求中。
您需要添加if语句作为检查来执行此操作,或者使用case语句。我建议在将其添加到结果之前定义一个额外的变量进行检查。将其定义为您正在循环遍历的字符串的字符。示例:

  1. Dim Check as String
  2. Check = Mid(CellRef, i, 1)
  3. If Check = Chr(34) then GetNumeric = Result

处理多个数字而没有进一步定义将产生不可解决的情况,例如,您倒数第二个示例没有指示1或36是否是正确的一个。

英文:

I think the answer already lays in your proposed requirements
You need to added if statement as checks to do this or a case statement, I recommend defining an extra variable to check before you added to results. Define this as the character of your string you are looping through. Example:

  1. Dim Check as string
  2. Check = Mid(CellRef, i, 1)
  3. If Check = Chr(34) then GetNumeric = Result

Dealing with multiple numbers without further definition will yield situations that are not solvable e.g you second to last example has no indication if the 1 or 36 is at the correct one.

答案3

得分: 2

以下是翻译好的部分:

这里有两种不需要使用正则表达式的解决方案。

第一种解决方案对字符串应用一系列转换,以便我们可以使用Split函数获取一系列字符串,其中一些将是数字。问题在于选择正确的转换,以便可以应用Split函数来分离数字。有时这可能是不可能的。

第二种解决方案只是解析字符串,直到提取了一系列数字字符,然后返回该数字字符串以供进一步处理。这可能是您的情况中最好的解决方案。

请注意,这两种解决方案都未经过边缘情况的测试。

考虑到您试图解析的似乎是自由文本,可能会有许多边缘情况。

英文:

Here are two solutions which do not require Regex.

The first solution applies a series of transforms to the string so that we can use Split to get a sequence of strings, some of which will be numbers. The issue here is choosing the correct transforms so that Split can be applied to isolate numbers. Sometime this may not be possible.

The second solution just parses the string until it has extracted a sequence of characters that are numeric and then returns that numeric string for further processing. This is likely the best solution in your case.

Note that neither solution has been tested for edge cases.

Given that you are trying to parse what appears to be freeform text there could be lots of edge cases.

  1. Sub Test()
  2. Dim myC As Collection
  3. Set myC = New Collection
  4. With myC
  5. .Add "INSPECT - 8" & Chr$(34) & " Water 12 Pipe 8"
  6. .Add "INSPECT- 18" & Chr$(34) & " Water 12 18"
  7. .Add "PM-6in Pipe From M37 st 6"
  8. .Add "PM- 6 inch Pipe From H44 6"
  9. .Add "PM-36 Pipe From M-1T 36"
  10. .Add "PM-123 Pipe From MT"
  11. End With
  12. Dim myItem As Variant
  13. Dim myNumber As Long
  14. For Each myItem In myC
  15. 'Option 1
  16. ' If TryGetFirstNumber(myItem, myNumber) Then
  17. ' Debug.Print myNumber
  18. ' End If
  19. ' option 2
  20. ' Debug.Print ParseFirstNumber(VBA.CStr(myItem))
  21. Next
  22. End Sub
  23. ' Pass ByVal so we don't alter the original string
  24. Public Function ApplyTransforms(ByVal ipString As String) As String
  25. ipString = VBA.LCase(ipString)
  26. ipString = VBA.Replace(ipString, "-", " ")
  27. ipString = VBA.Replace(ipString, VBA.Chr(34), " ")
  28. ipString = VBA.Replace(ipString, "in ", " ")
  29. ipString = VBA.Replace(ipString, "inch ", " ")
  30. ApplyTransforms = ipString
  31. End Function
  32. 'The try function indicates success by the returned boolean value, the result of the success is returned Byref in parameter opNumber
  33. Public Function TryGetFirstNumber(ByRef ipString As Variant, ByRef opNumber As Long, Optional ipLength As Long = 2) As Boolean
  34. Dim myArray As Variant
  35. myArray = VBA.Split(ApplyTransforms(ipString))
  36. Dim myItem As Variant
  37. For Each myItem In myArray
  38. If VBA.IsNumeric(myItem) Then
  39. If VBA.Len(myItem) <= ipLength Then
  40. opNumber = VBA.CLng(myItem)
  41. TryGetFirstNumber = True
  42. Exit Function
  43. End If
  44. End If
  45. Next
  46. TryGetFirstNumber = False
  47. End Function
  48. Public Function ParseFirstNumber(ByRef ipString As String) As String
  49. Dim myIndex As Long
  50. myIndex = 1
  51. Dim myLen As Long
  52. myLen = VBA.Len(ipString)
  53. Dim myNumber As String
  54. myNumber = vbNullString
  55. Do While myIndex <= myLen
  56. If VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0 Then
  57. Exit Do
  58. End If
  59. myIndex = myIndex + 1
  60. Loop
  61. If myIndex > myLen Then
  62. ParseFirstNumber = myNumber
  63. Exit Function
  64. End If
  65. Do While VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0
  66. myNumber = myNumber & VBA.Mid$(ipString, myIndex, 1)
  67. myIndex = myIndex + 1
  68. If myIndex > myLen Then
  69. ParseFirstNumber = myNumber
  70. Exit Function
  71. End If
  72. Loop
  73. ParseFirstNumber = myNumber
  74. End Function

答案4

得分: 2

请,(还要)测试下一种方法。它使用标准的VBA和一个数组来进行处理。处理后的数组内容将在代码末尾删除,因此即使对于大范围,它应该非常快。它假定要处理的范围从“A1”开始,工作表的第一行包含标题:

但是上面的代码只会处理您在问题中显示的模式字符串。例如,如果有更多不同目的的双引号字符(在前面有一个数字),代码只会处理找到的第一个。如果搜索的字符串在字符串中是第一个等等,可能会出现问题。可以根据需要进行调整以处理更多情况,但是我们无法猜测未显示的情况...

英文:

Please, (also) test the next way. It uses standard VBA and an array to be processed. The processed array content will be dropped at the end of the code, so, it should be very fast even for large ranges. It assumes that the range to be processed starts from "A1", headers existing on the sheet first row:

  1. Sub extractInchesNoFromAllRange()
  2. Dim sh As Worksheet, lastR As Long, arr, i As Long
  3. Dim dblQ As Long, sQ As Long, strIn As Long, No As String
  4. Set sh = ActiveSheet
  5. lastR = sh.Range("A" & sh.rows.count).End(xlUp).row
  6. arr = sh.Range("A2:B" & lastR).Value2
  7. For i = 1 To UBound(arr)
  8. dblQ = InStr(arr(i, 1), """") 'check if a double quote caracters exists and return its postition if it does
  9. sQ = InStr(arr(i, 1), "''") 'check if two simple quote caracters exists and return its postition if it does
  10. strIn = InStr(arr(i, 1), "in") 'the same as above for "in" string
  11. No = "" 'reinitialize the variable to keep the extracted number (as string...)
  12. If dblQ > 0 Or sQ > 0 Then 'if doble quote exists:
  13. If IsNumeric(Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)) Then 'if a number exists before the quote ch
  14. No = Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1) 'extract first digit
  15. arr(i, 2) = extractNo(IIf(dblQ > 0, dblQ, sQ) - 2, CStr(arr(i, 1)), No, True) 'call the function which make extraction by (backward) iteration
  16. End If
  17. ElseIf strIn > 0 Then 'if "in" exists:
  18. If Mid(arr(i, 1), strIn + 2, 1) = " " Or Mid(arr(i, 1), strIn + 2, 2) = "ch" Or strIn + 1 = Len(arr(i, 1)) Then
  19. If Mid(arr(i, 1), strIn - 1, 1) = " " Then
  20. arr(i, 2) = extractNo(strIn - 2, CStr(arr(i, 1)), No, True)
  21. Else
  22. arr(i, 2) = extractNo(strIn - 1, CStr(arr(i, 1)), No, True)
  23. End If
  24. End If
  25. Else
  26. arr(i, 2) = extractNo(0, CStr(arr(i, 1)), "")
  27. End If
  28. Next i
  29. 'drop the processed arran content back in its range:
  30. sh.Range("A2").Resize(UBound(arr), UBound(arr, 2)).Value2 = arr
  31. End Sub
  32. Function extractNo(pos As Long, str As String, No As String, Optional boolChar = False) As Variant
  33. Dim i As Long, boolNo As Boolean
  34. On Error GoTo WrongPatt
  35. If boolChar Then 'if one of the searched characters has been found:
  36. For i = pos To 1 Step -1
  37. If IsNumeric(Mid(str, i, 1)) Or Mid(str, i, 1) = "." Then
  38. No = CStr(Mid(str, i, 1)) & No
  39. Else
  40. extractNo = CDbl(No): Exit For
  41. End If
  42. Next i
  43. Else 'if no searched string has been found:
  44. For i = 1 To Len(str)
  45. If IsNumeric(Mid(str, i, 1)) Then
  46. boolNo = True
  47. No = No & Mid(str, i, 1)
  48. Else
  49. If boolNo Then Exit For
  50. End If
  51. Next i
  52. If Len(No) <= 2 And No <> "" Then
  53. extractNo = CLng(No)
  54. Else
  55. extractNo = ""
  56. End If
  57. End If
  58. Exit Function
  59. WrongPatt:
  60. extractNo = "Wrong pttern"
  61. End Function

But the above code will process only the pattern string you show in your question. If, for instance, there will be more double quotes characters, with a different purpose **before the one having a number in front of it), the code will process only the first found. It may have problems if the searched strings are the first in the string and so on... It can be adapted to deal with more conditions, but we here are not the mind readers to cover such not shown cases...

huangapple
  • 本文由 发表于 2023年4月11日 16:42:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/75983960.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定