Regex string replacement ends too soon in PowerShell

Question

I am using PowerShell (5.1) and am trying to encode a string before I pass it in the URL to a REST API. The API gives certain characters a special meaning when they appear between a property and its value (e.g. a colon (:) means "equal"). If such a character (a colon, for example) appears in the value portion, it should be replaced with its URL-safe encoding (%3A).

The following PowerShell works when the string is filter=username:"user1@domain.com": the @ symbol is replaced with %40. However, if the string is filter=username:"user1@domain.com"|"user2@domain.com", then only the first @ is replaced.

The PowerShell snippet looks like this:

$Filter = 'filter=username:"user1@domain.com"'
$pattern1 = '[^a-zA-Z\d\s]' # Match any non-alpha numeric or white space character.
$pattern2 = '(?:>:|<:|:|>|<|!:|:|~|!~|@)(?:")(.*?)(?:")' # Allow us to replace characters in the filter. We will leave some of the characters alone, since they are used by the API in certain spots. For example, ":" means equal between the property name and value but should be replaced in the value portion of the pair.
$regex = [Regex]::new($pattern2)
$regex.Matches($Filter) | ForEach-Object {
    $Filter = $Filter -replace ([regex]::Escape($_.Groups[1].value)), ([uri]::EscapeDataString($_.Groups[1].value))
}

What am I missing here?

Answer 1

Score: 1

I'm unclear on the exact requirements, but perhaps the following does what you want:

# Sample string to transform.
$filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'

[regex]::Replace(
  $filter,
  '(?<=[:|><~@]").*?(?=")',
  {
    param($m)
    [Uri]::EscapeDataString($m.Value)
  }
)

This relies on all property values being enclosed in "..." (with no escaped embedded ") and being preceded by one of the following characters: : | < > ~ @

Result (only the property values inside "..." substrings were URI-escaped):

filter=username:"user1%40domain.com"|"user2%40domain.com"

For an explanation of the regex and the ability to experiment with it, see [this regex101.com page](https://regex101.com/r/1PziwI/2).
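For reuse, the call above can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the function name `ConvertTo-EscapedFilter` and the second sample string are hypothetical, and it assumes the same quoting rules described above. It also illustrates the case from the question where a colon inside a value is encoded while the colon between property and value is left alone.

```powershell
# Hypothetical helper (the name is an assumption, not from the answer):
# URI-escapes only the text inside "..." that follows one of : | < > ~ @
function ConvertTo-EscapedFilter {
    param([Parameter(Mandatory)][string] $Filter)

    [regex]::Replace(
        $Filter,
        '(?<=[:|><~@]").*?(?=")',
        { param($m) [uri]::EscapeDataString($m.Value) }
    )
}

# Two values separated by a pipe, and a made-up value containing a colon.
$twoValues  = ConvertTo-EscapedFilter 'filter=username:"user1@domain.com"|"user2@domain.com"'
$colonValue = ConvertTo-EscapedFilter 'filter=username:"a:b@domain.com"'
$twoValues
$colonValue
```

With these inputs, the second call should yield filter=username:"a%3Ab%40domain.com": the separator colon before the opening quote is outside the match and stays as-is.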


As for what you tried:

The only thing missing from your regex is a \| segment so as to also match a verbatim | before a property value's opening " (inserted before |@):

(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")

See [this regex101.com page](https://regex101.com/r/OvjHmF/1).
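To see the effect of the missing alternative, the two patterns can be compared on the two-address string. The sketch below is an illustration only: the variable names `$original`, `$corrected`, and `$fixed` are made up here, and the loop is the asker's own approach with the corrected pattern swapped in.

```powershell
$filter    = 'filter=username:"user1@domain.com"|"user2@domain.com"'
$original  = '(?:>:|<:|:|>|<|!:|:|~|!~|@)(?:")(.*?)(?:")'
$corrected = '(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")'

# Without \| the value that follows the pipe is never matched.
$originalCount  = [regex]::Matches($filter, $original).Count
$correctedCount = [regex]::Matches($filter, $corrected).Count

# The asker's loop, run with the corrected pattern.
$fixed = $filter
foreach ($m in [regex]::Matches($filter, $corrected)) {
    $value = $m.Groups[1].Value
    $fixed = $fixed -replace ([regex]::Escape($value)), ([uri]::EscapeDataString($value))
}

"$originalCount match vs. $correctedCount matches"
$fixed
```

Note that -replace in this loop substitutes every occurrence of the captured text anywhere in the string, not just inside the quotes, which is one reason the single [regex]::Replace call in the solution is the more targeted option.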

However, note that your regex can be simplified, by using character sets ([...]) and removing redundant parts:

(?:[:|><~@])(?:")(.*?)(?:")

This simplified form is used in the solution above, albeit with lookaround assertions, so that no capture group is necessary.
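For comparison, here is a sketch of how a capture-group form could be used directly with [regex]::Replace. It is one workable arrangement assumed for illustration, not code from the answer: because the delimiters are part of the match, they are captured and put back around the escaped value, which is the extra work the lookaround version avoids.

```powershell
$filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'

$viaGroups = [regex]::Replace(
    $filter,
    '([:|><~@]")(.*?)(")',
    {
        param($m)
        # Re-emit the prefix and the closing quote unchanged; escape only group 2.
        $m.Groups[1].Value + [uri]::EscapeDataString($m.Groups[2].Value) + $m.Groups[3].Value
    }
)
$viaGroups
```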

huangapple
  • Posted on June 5, 2023 at 23:30:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407935.html