Regex string replacement ends too soon in PowerShell
Question
I am using PowerShell (5.1) and am trying to encode a string before I pass it in the URL to a REST API. The API uses certain characters with a special meaning (e.g. a colon (`:`) means "equal") when they appear between a property and its value. If a colon (for example) appears in the value portion, then it should be replaced with its URL-safe encoding (`%3A`).
The following PowerShell works when the string is `filter=username:"user1@domain.com"`: the `@` symbol is replaced with `%40`. However, if the string is `filter=username:"user1@domain.com"|"user2@domain.com"`, then only the first `@` is replaced.
The PowerShell snippet looks like this:
$Filter = 'filter=username:"user1@domain.com"'
$pattern1 = '[^a-zA-Z\d\s]' # Match any non-alpha numeric or white space character.
# Allow us to replace characters in the filter. We will leave some of the characters alone,
# since they are used by the API in certain spots. For example, ":" means equal between the
# property name and value, but should be replaced in the value portion of the pair.
$pattern2 = '(?:>:|<:|:|>|<|!:|:|~|!~|@)(?:")(.*?)(?:")'
$regex = [Regex]::new($pattern2)
$regex.Matches($Filter) | ForEach-Object {
    $Filter = $Filter -replace ([regex]::Escape($_.Groups[1].value)), ([uri]::EscapeDataString($_.Groups[1].value))
}
What am I missing here?
Answer 1
Score: 1
I'm unclear on the exact requirements, but perhaps the following does what you want:
# Sample string to transform.
$filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'
[regex]::Replace(
    $filter,
    '(?<=[:|><~@]").*?(?=")',
    {
        param($m)
        [Uri]::EscapeDataString($m.Value)
    }
)
This relies on all property values being enclosed in `"..."` (with no escaped embedded `"`) and being preceded by any one of the following characters: `:` `|` `<` `>` `~` `@`
Result (only the property values inside the `"..."` substrings were URI-escaped):
filter=username:"user1%40domain.com"|"user2%40domain.com"
For an explanation of the regex and the ability to experiment with it, see [this regex101.com page](https://regex101.com/r/1PziwI/2).
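To make the call reusable, it can be wrapped in a small helper. The following is only a minimal sketch and not part of the original answer: the function name `ConvertTo-EscapedFilter` and the sample filter `name:"a:b"` are made up for illustration, and it assumes the same conventions as above (values enclosed in `"..."`, no embedded `"`). It illustrates the colon case from the question, where the separator `:` is left alone and the `:` inside the value is escaped.

```powershell
# Hypothetical helper (the name is chosen for illustration only).
function ConvertTo-EscapedFilter {
    param(
        [Parameter(Mandatory)]
        [string] $Filter
    )
    # Match only the text between a separator-plus-quote and the closing quote,
    # then URI-escape just that text.
    [regex]::Replace(
        $Filter,
        '(?<=[:|><~@]").*?(?=")',
        { param($m) [Uri]::EscapeDataString($m.Value) }
    )
}

# The separator ':' is kept; the ':' inside the quoted value is escaped.
$escaped = ConvertTo-EscapedFilter -Filter 'filter=name:"a:b"'
$escaped  # filter=name:"a%3Ab"
```

Because the lookbehind and lookahead consume nothing, the quotes and separators never become part of the match, so only the value itself is passed to `[Uri]::EscapeDataString()`.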
As for what you tried:
The only thing missing from your regex is a `\|` alternative (inserted before `|@`), so that a verbatim `|` before a property value's opening `"` is also matched:
(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")
See [this regex101.com page](https://regex101.com/r/OvjHmF/1).
However, note that your regex can be simplified by using a character set (`[...]`) and removing redundant parts:
(?:[:|><~@])(?:")(.*?)(?:")
This simplified form is used in the solution above, albeit with lookaround assertions so that no capture group is necessary.
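For comparison, here is the loop from the question with only the `\|` alternative added. It is a sketch meant to show that the original approach then handles both values; the variable names follow the question. Note that `-replace` substitutes every occurrence of a captured value anywhere in the string, so a value that also appears outside the quotes would be rewritten there too, which the single `[regex]::Replace()` call in the solution above avoids.

```powershell
$Filter = 'filter=username:"user1@domain.com"|"user2@domain.com"'
# Original pattern plus '\|' so that a value preceded by a literal '|' is matched too.
$pattern2 = '(?:>:|<:|:|>|<|!:|:|~|!~|\||@)(?:")(.*?)(?:")'

foreach ($m in [regex]::Matches($Filter, $pattern2)) {
    $value = $m.Groups[1].Value
    # Escape the value so it is treated as a literal regex, then swap in its URI-escaped form.
    $Filter = $Filter -replace ([regex]::Escape($value)), ([uri]::EscapeDataString($value))
}

$Filter  # filter=username:"user1%40domain.com"|"user2%40domain.com"
```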