Regex string replacement ends too soon in PowerShell
Question
I am using PowerShell (5.1) and am trying to encode a string before I pass it in the URL to a REST API. The API uses certain characters (e.g. a colon (:) means "equal") when they appear between a property and its value. If a colon (for example) appears in the value portion, then it should be replaced with the URL-safe encoding (%3A).
The following PowerShell works when the string is filter=username:"user1@domain.com". The @ symbol is replaced with %40. However, if the string is filter=username:"user1@domain.com"|"user2@domain.com", then only the first @ is replaced.
The PowerShell snippet looks like this:
$Filter = 'filter=username:"user1@domain.com"'
$pattern1 = '[^a-zA-Z\d\s]' # Match any non-alpha numeric or white space character.
$pattern2 = '(?:>:|<:|:|>|<|!:|:|~|!~|@)(?:")(.*?)(?:")' # Allow us to replace characters in the filter. We will leave some of the characters alone, since they are used by the API in certain spots. For example, ":" means equal between the property name and value but should be replaced in the value portion of the pair.
$regex = [Regex]::new($pattern2)
$regex.Matches($Filter) | ForEach-Object {
    $Filter = $Filter -replace ([regex]::Escape($_.Groups[1].Value)), ([uri]::EscapeDataString($_.Groups[1].Value))
}
What am I missing here?
Answer 1
Score: 1
I'm unclear on the exact requirements, but perhaps the following does what you want:
# Sample string to transform.
$filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'
[regex]::Replace(
$filter,
'(?<=[:|><~@]").*?(?=")',
{
param($m)
[Uri]::EscapeDataString($m.Value)
}
)
This relies on all property values being enclosed in `"..."` (with no escaped embedded `"`) and being preceded by any one of the following characters: `: | < > ~ @`
Result (only the property values inside "..." substrings were URI-escaped):
filter=username:"user1%40domain.com"|"user2%40domain.com"
For an explanation of the regex and the ability to experiment with it, see [this regex101.com page](https://regex101.com/r/1PziwI/2).
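The question also mentions colons inside a value, so here is a minimal sketch that wraps the same [regex]::Replace call in a helper and feeds it such a value. The function name ConvertTo-EscapedFilter and the sample filter string are hypothetical, chosen only for illustration; the regex is the one from the answer.

```powershell
# Hypothetical helper (name is an assumption, not part of the original answer).
# It URI-escapes only the text inside "..." that follows one of : | < > ~ @
function ConvertTo-EscapedFilter {
    param(
        [Parameter(Mandatory)]
        [string] $Filter
    )

    [regex]::Replace(
        $Filter,
        '(?<=[:|><~@]").*?(?=")',
        {
            param($m)
            # $m is the current Match; its Value is the bare property value.
            [Uri]::EscapeDataString($m.Value)
        }
    )
}

# Sample input (an assumption): one value contains ":", the other contains "@".
$escaped = ConvertTo-EscapedFilter -Filter 'filter=name:"a:b"|"c@d"'
$escaped
```

With this input, the colon between the property name and the value is left alone, while the colon and the @ inside the quoted values are escaped, giving filter=name:"a%3Ab"|"c%40d".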
As for what you tried:
The only thing missing from your regex is a \| segment so as to also match a verbatim | before a property value's opening " (inserted before |@):
(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")
See [this regex101.com page](https://regex101.com/r/OvjHmF/1).
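As a rough check, the corrected pattern can be dropped into the loop from the question. This is only a sketch of that original approach with the \| alternative added; it still relies on -replace, which substitutes every occurrence of the captured text anywhere in the string, so it assumes a value never also appears outside the quotes and that the escaped text contains no $ characters.

```powershell
# Input with two quoted values separated by "|".
$Filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'

# Original pattern from the question, plus \| so that a literal "|" may also
# precede the opening double quote of a value.
$pattern2 = '(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")'

[regex]::Matches($Filter, $pattern2) | ForEach-Object {
    # Group 1 holds the text between the double quotes.
    $value = $_.Groups[1].Value
    $Filter = $Filter -replace ([regex]::Escape($value)), ([uri]::EscapeDataString($value))
}

$Filter
```

Without the \| alternative, the second value is never matched, because nothing in the original alternation accepts a | in front of the quote; that is why only the first @ was replaced.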
However, note that your regex can be simplified by using a character set ([...]) and removing redundant parts:
[:|><~@]"(.*?)"
This simplified form is used in the solution above, albeit with lookaround assertions, so that no capture group is necessary.
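To make the difference between the two forms concrete, the sketch below runs a capture-group pattern, written here as [:|><~@]"(.*?)", and the lookaround pattern from the solution against the same sample string. The variable names are illustrative only. With the capture group, the value is in Groups[1]; with the lookarounds, the delimiters are not consumed, so the whole match is the value.

```powershell
$filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'

# Capture-group form: the match includes the delimiter and the quotes,
# so the value has to be read from group 1.
$viaGroup = @(
    [regex]::Matches($filter, '[:|><~@]"(.*?)"') |
        ForEach-Object { $_.Groups[1].Value }
)

# Lookaround form: the delimiter and quotes are asserted but not consumed,
# so the match itself is the value.
$viaLookaround = @(
    [regex]::Matches($filter, '(?<=[:|><~@]").*?(?=")') |
        ForEach-Object { $_.Value }
)

$viaGroup -join ', '
$viaLookaround -join ', '
```

Both lists contain the same two addresses. The lookaround form is convenient with [regex]::Replace because the replacement then overwrites only the value and leaves the surrounding delimiter and quotes in place.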