Regex to Glob and vice-versa conversion
Question
We have a requirement to convert regular expressions to CloudFront-supported globs and vice versa. Any suggestions on how we can achieve that, and, first of all, is it even possible? I'm especially unsure about regex to glob: as I understand it, regex is a kind of superset, so it might not be possible to convert every regex to a corresponding glob.
Answer 1
Score: 2
To convert from a glob, you would need to write a parser that splits the pattern into an abstract syntax tree (AST). For example, the glob `*-{[0-9],draft}.docx` might parse to `[Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"]`.
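A minimal parser sketch in Python, assuming a small glob dialect: `*`, `?`, simple single-range brackets `[x-y]`, and `{a,b,...}` alternatives. The node classes `Anything`, `AnyChar`, `Range` and `OneOf` are hypothetical; they only mirror the names used above and come from no library.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical node types, named after the ones used in the text above.
@dataclass
class Anything:          # '*'  : zero or more characters
    pass


@dataclass
class AnyChar:           # '?'  : exactly one character
    pass


@dataclass
class Range:             # '[x-y]' : one character between x and y
    start: str
    end: str


@dataclass
class OneOf:             # '{a,b,...}' : one of several alternatives
    options: list


Node = Union[str, Anything, AnyChar, Range, OneOf]


def parse(pattern: str) -> List[Node]:
    """Parse a small glob dialect into a flat list of nodes."""
    nodes, _ = _parse_seq(pattern, 0, stop="")
    return nodes


def _parse_seq(p: str, i: int, stop: str) -> Tuple[List[Node], int]:
    """Parse nodes from p[i:] until the end or a character in `stop`."""
    nodes: List[Node] = []
    literal = ""
    while i < len(p) and p[i] not in stop:
        c = p[i]
        if c in "*?[{" and literal:      # flush pending literal text
            nodes.append(literal)
            literal = ""
        if c == "*":
            nodes.append(Anything())
            i += 1
        elif c == "?":
            nodes.append(AnyChar())
            i += 1
        elif c == "[":
            # Only the simple form [x-y] is handled in this sketch.
            if i + 4 < len(p) and p[i + 2] == "-" and p[i + 4] == "]":
                nodes.append(Range(p[i + 1], p[i + 3]))
                i += 5
            else:
                raise ValueError(f"unsupported bracket expression at {i}")
        elif c == "{":
            options = []
            i += 1
            while True:
                seq, i = _parse_seq(p, i, stop=",}")
                # Unwrap one-node alternatives so the output matches the text.
                options.append(seq[0] if len(seq) == 1 else seq)
                if i >= len(p):
                    raise ValueError("unterminated '{'")
                i += 1                   # skip the ',' or '}'
                if p[i - 1] == "}":
                    break
            nodes.append(OneOf(options))
        else:
            literal += c
            i += 1
    if literal:
        nodes.append(literal)
    return nodes, i


print(parse("*-{[0-9],draft}.docx"))
# [Anything(), '-', OneOf(options=[Range(start='0', end='9'), 'draft']), '.docx']
```

Alternatives are parsed recursively, so `{x,y*}` yields a `OneOf` whose second option is itself a sequence (`["y", Anything()]`). Anything outside this dialect, such as `[abc]`, raises `ValueError` rather than being guessed at.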
Then you would walk the AST and output the equivalent regular expression for each node. For example, you might use rules like these:

- `Anything() -> .*`
- `Range(x, y) -> [x-y]`
- `OneOf(x, y) -> (x|y)`

This results in the regular expression `.*-([0-9]|draft).docx`.
That's not perfect, because you also have to remember to escape any special characters: `.` is a special character in regular expressions, so you should escape it, finally yielding `.*-([0-9]|draft)\.docx`.
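The AST walk, including the escaping step, might look like the following Python sketch. It builds the example AST by hand, using hypothetical stand-ins for the node classes named above, and relies on the standard library's `re.escape` for literal text, which takes care of the `.` problem automatically.

```python
import re
from dataclasses import dataclass


# Hypothetical AST node types (same names as in the text above).
@dataclass
class Anything:
    pass


@dataclass
class Range:
    start: str
    end: str


@dataclass
class OneOf:
    options: list


def to_regex(node) -> str:
    """Walk the AST and emit an equivalent regular expression."""
    if isinstance(node, list):                  # a sequence of nodes
        return "".join(to_regex(n) for n in node)
    if isinstance(node, str):                   # literal text: escape '.', etc.
        return re.escape(node)
    if isinstance(node, Anything):              # Anything() -> .*
        return ".*"
    if isinstance(node, Range):                 # Range(x, y) -> [x-y]
        return f"[{re.escape(node.start)}-{re.escape(node.end)}]"
    if isinstance(node, OneOf):                 # OneOf(x, y) -> (x|y)
        return "(" + "|".join(to_regex(o) for o in node.options) + ")"
    raise TypeError(f"unsupported node: {node!r}")


# The AST for the example glob *-{[0-9],draft}.docx, built by hand.
ast = [Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"]
pattern = to_regex(ast)
print(pattern)                                  # .*\-([0-9]|draft)\.docx
```

`re.escape` also escapes the literal `-`. That is harmless outside a character class, so the result matches the same strings as the hand-escaped `.*-([0-9]|draft)\.docx`.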
Strictly speaking, not all regular expressions can be translated to glob patterns. The Kleene star operation does not exist in standard globbing; the simple regular expression `a*` (i.e., any number of `a` characters) cannot be translated to a glob pattern.
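Going from regex to glob therefore only works for a subset, and a converter has to reject the rest. Here is a hedged Python sketch that assumes the target glob dialect understands only `*` (zero or more characters) and `?` (exactly one character) and has no escape syntax. As far as I know this roughly matches CloudFront cache-behavior path patterns, but treat that as an assumption to verify. The function name `regex_to_glob` is hypothetical.

```python
def regex_to_glob(rx: str) -> str:
    """Translate a restricted regex to a '*'/'?' glob, or raise ValueError.

    Assumption: the target glob dialect supports only '*' (zero or more
    characters) and '?' (exactly one character) and has no escape syntax.
    """
    # Globs match the whole string, so drop a leading '^' and a trailing '$'
    # (unless the '$' is written as the literal '\$').
    if rx.startswith("^"):
        rx = rx[1:]
    if rx.endswith("$") and not rx.endswith("\\$"):
        rx = rx[:-1]

    out = []
    i = 0
    while i < len(rx):
        c = rx[i]
        nxt = rx[i + 1:i + 2]
        if c == ".":
            if nxt == "*":          # .*  -> *
                out.append("*")
                i += 2
            elif nxt == "+":        # .+  -> ?*  (one character, then any)
                out.append("?*")
                i += 2
            else:                   # .   -> ?
                out.append("?")
                i += 1
        elif c == "\\":
            if not nxt or nxt.isalnum():
                # \d, \w, \b, ... (or a trailing backslash) have no glob form
                raise ValueError(f"unsupported escape at {i}: {rx[i:i + 2]!r}")
            if nxt in "*?":
                # A literal * or ? would turn into a wildcard in the glob.
                raise ValueError(f"cannot express literal {nxt!r} in the glob")
            out.append(nxt)         # \. -> .   \- -> -
            i += 2
        elif c in "*+?{":
            # A quantifier on anything other than '.', e.g. the Kleene star in a*
            raise ValueError(f"quantifier {c!r} at {i} has no glob equivalent")
        elif c in "()[]|^$":
            raise ValueError(f"regex construct {c!r} at {i} is not supported")
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(regex_to_glob(r"^.*-draft\.docx$"))   # *-draft.docx
try:
    regex_to_glob("a*")
except ValueError as e:
    print(e)                                # quantifier '*' at 1 has no glob equivalent
```

The design errs on the side of refusing: character classes and alternation are rejected here even though a richer glob dialect could express some of them, because the assumed target cannot.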
I'm not sure what kinds of glob CloudFront supports (its documentation returned no hits for the term "glob"), but here is some documentation on commonly supported shell glob wildcards.
Here is a summary of some equivalent sequences:
Glob wildcard | Regular expression | Meaning |
---|---|---|
`?` | `.` | Any single character |
`*` | `.*` | Zero or more characters |
`[a-z]` | `[a-z]` | Any character from the range |
`[!a-m]` | `[^a-m]` | A character not in the range |
`[abc]` | `[abc]` | One of the given characters |
`{cat,dog,bat}` | `(cat\|dog\|bat)` | One of the given alternatives |
`{*.tar,*.gz}` | `(.*\.tar\|.*\.gz)` | One of the given alternatives |