Case-insensitive Regular Expression - VBA

Question

Background:

Just now, I was answering a question and playing around with RegEx within VBA. The goal is to create a list of the names that occur within a string. RegEx was the go-to solution since we want to keep VBA from stumbling over punctuation marks and over substrings that look similar, e.g. Jack and Jacky.


Sample Data:

Let me give a simple sample. Imagine we have a string like:

Dim str As String: str = "Jack's turn, Becky's or Frank?"

We want to know which names in a certain array are mentioned within the string, for example:

Dim arr As Variant: arr = Array("Jack", "Frank")

Sample Code:

To avoid iterating over the array, I went with the following code:

Sub Test()

Dim str As String: str = "Jack's turn, Becky's or Frank?"
Dim arr As Variant: arr = Array("Jack", "Frank", "Beth")
Dim regex As Object: Set regex = CreateObject("VBScript.RegExp")
Dim hits As Object, hit As Object

' Build one alternation from the array: \b(Jack|Frank|Beth)\b
regex.Pattern = "\b(" & Join(arr, "|") & ")\b"
regex.Global = True

Set hits = regex.Execute(str)
For Each hit In hits
    Debug.Print hit
Next hit

End Sub

Problem:

Whereas the above neatly returns the two hits, it does not match case-insensitively. For example, changing the following line will return only Jack:

Dim str As String: str = "Jack's turn, Becky's or frank?"

I thought I could counter that by turning off case-sensitivity using (?i):

regex.Pattern = "(?i)\b(" & Join(arr, "|") & ")\b"

This works perfectly in most languages. VBA, however, seems to have a problem with it and raises Error 5017 upon execution.


Question:

Does anybody know why? Is this not supported within VBA, or is my syntax wrong? If it is not supported, what is the alternative for getting case-insensitive hits while retaining the possibility to Join the array of names?

Bonus Question:

Ultimately I would like to Join the hits together with a delimiter, for example like:

Debug.Print Join(regex.Execute(str), ", ")

However, I realized that Execute returns a collection, which needs to be iterated first, and I would like to avoid that.

Answer 1

Score: 14

Set the IgnoreCase property on the RegExp object, i.e.

regex.IgnoreCase = True
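Applied to the code from the question, the change might look like the sketch below. The procedure name TestIgnoreCase is only illustrative, and the explicit declarations of hits and hit are an addition so the code also compiles under Option Explicit.

```vba
Sub TestIgnoreCase()

    Dim str As String: str = "Jack's turn, Becky's or frank?"
    Dim arr As Variant: arr = Array("Jack", "Frank", "Beth")
    Dim regex As Object: Set regex = CreateObject("VBScript.RegExp")
    Dim hits As Object, hit As Object

    ' No inline (?i) modifier here: case handling is set on the object itself.
    regex.Pattern = "\b(" & Join(arr, "|") & ")\b"
    regex.Global = True
    regex.IgnoreCase = True

    Set hits = regex.Execute(str)
    For Each hit In hits
        ' Value holds the text as it appears in the input string.
        Debug.Print hit.Value
    Next hit

End Sub
```

With the sample string this should list both Jack and frank, the latter in the casing found in the input rather than the casing used in the array.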

RegExp object

> The RegExp object has three properties that affect how regular expressions are applied:
>
> IgnoreCase: I trust this is self-explanatory
>
> Global: Determines whether or not to find all possible matches in the input string. If Global is set to false, only the first match will be found or replaced, as applicable.
>
> MultiLine: Determines whether matches can span across line breaks.

See also: https://learn.microsoft.com/en-us/previous-versions//1400241x%28v%3dvs.85%29

And from https://www.regular-expressions.info/vbscript.html, regarding VBScript’s regular expression support (we are using Microsoft VBScript Regular Expressions 5.5):

> No mode modifiers to set matching options within the regular expression.
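As for the bonus question: Join expects a one-dimensional array, while Execute returns a MatchCollection object, so as far as I can tell some loop is hard to avoid. One option is to hide it in a small helper. The sketch below assumes a hypothetical function named JoinMatches (it is not part of VBScript.RegExp) that copies each match value into a String array and then calls Join.

```vba
' Hypothetical helper, not part of VBScript.RegExp: copies every match value
' into a String array so the result can be handed to Join.
Function JoinMatches(ByVal hits As Object, ByVal delimiter As String) As String

    Dim parts() As String
    Dim i As Long

    ' Nothing matched: leave the return value as an empty string.
    If hits.Count = 0 Then Exit Function

    ReDim parts(0 To hits.Count - 1)
    For i = 0 To hits.Count - 1
        parts(i) = hits.Item(i).Value   ' MatchCollection is zero-based
    Next i

    JoinMatches = Join(parts, delimiter)

End Function

Sub TestJoinMatches()

    Dim str As String: str = "Jack's turn, Becky's or frank?"
    Dim arr As Variant: arr = Array("Jack", "Frank", "Beth")
    Dim regex As Object: Set regex = CreateObject("VBScript.RegExp")

    regex.Pattern = "\b(" & Join(arr, "|") & ")\b"
    regex.Global = True
    regex.IgnoreCase = True

    ' The loop still exists, but only inside the helper.
    Debug.Print JoinMatches(regex.Execute(str), ", ")

End Sub
```

This keeps the calling code close to the one-liner from the question, at the cost of one extra function in the module.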

huangapple
Posted on January 3, 2020, 20:54:29