Regex to Glob and vice-versa conversion

Question

We have a requirement to convert regular expressions to CloudFront-supported globs and vice versa. Any suggestions on how to achieve this, and is it possible at all? I'm especially unsure about the regex-to-glob direction: as I understand it, regex is something of a superset of glob, so it might not be possible to convert every regex to a corresponding glob.

Answer 1

Score: 2

To convert from a glob to a regex, you would need to write a parser that splits the pattern into an abstract syntax tree. For example, the glob *-{[0-9],draft}.docx might parse to [Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"].

Then you would walk the AST and output the equivalent regular expression for each node. For example, the rules you might use for this could be:

Anything()  -> .*
Range(x, y) -> [x-y]
OneOf(x, y) -> (x|y)

resulting in the regular expression .*-([0-9]|draft).docx.

That's not perfect, because you also have to remember to escape any special characters; . is a special character in regular expressions, so you should escape it, yielding finally .*-([0-9]|draft)\.docx.
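As a rough illustration of the steps above, here is a minimal Python sketch. It is not a definitive implementation: it fuses the parse and emit steps into one recursive pass instead of building an explicit AST, the names glob_to_regex and _translate are made up for this example, and it assumes shell-style globs with brace alternation (CloudFront may support less). Literal characters go through re.escape, which takes care of the escaping problem just described.

```python
import re


def glob_to_regex(glob: str) -> str:
    """Translate a shell-style glob (with {a,b} alternation) into a regex string."""
    regex, _ = _translate(glob, 0, in_braces=False)
    return regex


def _translate(glob: str, pos: int, in_braces: bool):
    """Translate from pos to end of input, or to the next ',' / '}' inside braces."""
    out = []
    while pos < len(glob):
        c = glob[pos]
        if in_braces and c in ",}":
            break  # the caller decides what the separator means
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[":
            # Search from pos + 2 so a ']' right after '[' counts as a literal.
            end = glob.find("]", pos + 2)
            if end == -1:
                out.append(re.escape(c))  # unterminated class: literal '['
            else:
                body = glob[pos + 1:end]
                if body[0] == "!":
                    body = "^" + body[1:]  # glob negation -> regex negation
                out.append("[" + body.replace("\\", "\\\\") + "]")
                pos = end
        elif c == "{":
            alternatives = []
            while True:
                alt, pos = _translate(glob, pos + 1, in_braces=True)
                alternatives.append(alt)
                if pos >= len(glob):
                    raise ValueError("unterminated '{' in glob")
                if glob[pos] == "}":
                    break
            out.append("(?:" + "|".join(alternatives) + ")")
        else:
            out.append(re.escape(c))  # escapes '.', '(', '+', ...
        pos += 1
    return "".join(out), pos


pattern = re.compile(glob_to_regex("*-{[0-9],draft}.docx"))
print(bool(pattern.fullmatch("report-7.docx")))      # True
print(bool(pattern.fullmatch("report-final.docx")))  # False
```

The generated regex is meant to be used with fullmatch (or wrapped in ^...$), since a glob always has to match the whole name. Corner cases such as [!]] are not handled by this sketch.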

Strictly speaking, not every regular expression can be translated to a glob pattern. The Kleene star operation does not exist in standard globbing: the simple regular expression a* (i.e., any number of a characters) cannot be translated to a glob pattern, because the glob * always means "any characters" and never "repeat the previous item".
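For the regex-to-glob direction, about the best you can do is translate a restricted subset and reject everything else. The sketch below is only an illustration under that assumption: regex_to_glob is a hypothetical helper, it treats the regex as anchored at both ends (as a glob is), and it supports only literals, ., .*, escaped punctuation, and simple character classes.

```python
def regex_to_glob(regex: str) -> str:
    """Translate a small regex subset to a glob; raise ValueError otherwise."""
    out = []
    i = 0
    while i < len(regex):
        c = regex[i]
        if regex.startswith(".*", i):
            out.append("*")  # '.*' is the one repetition a glob can express
            i += 2
            continue
        if c == ".":
            out.append("?")
        elif c == "\\":
            if i + 1 >= len(regex) or regex[i + 1].isalnum():
                raise ValueError("escapes like \\d or \\w have no glob form")
            i += 1
            lit = regex[i]
            # Escaped punctuation is a literal; protect glob metacharacters.
            out.append(f"[{lit}]" if lit in "*?[" else lit)
        elif c == "[":
            end = regex.find("]", i + 1)
            body = regex[i + 1:end] if end != -1 else ""
            if not body or body == "^" or body[0] == "!" or "\\" in body:
                raise ValueError("unsupported character class")
            if body[0] == "^":
                body = "!" + body[1:]  # regex negation -> glob negation
            out.append(f"[{body}]")
            i = end
        elif c in "*+?{}()|^$":
            raise ValueError(f"no glob equivalent for {c!r} at index {i}")
        else:
            out.append(c)
        i += 1
    return "".join(out)


print(regex_to_glob(r".*-[0-9]\.docx"))  # *-[0-9].docx
try:
    regex_to_glob("a*")
except ValueError as err:
    print("rejected:", err)
```

Rejecting loudly is a deliberate choice here: silently mapping a* to the glob a* would produce a pattern that matches "abc", which the regex does not.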

I'm not sure what types of globs CloudFront supports (the documentation returned no hits for the term "glob"), but here is some documentation on commonly-supported shell glob pattern wildcards.

Here is a summary of some equivalent sequences:

Glob Wildcard    Regular Expression    Meaning
?                .                     Any single character
*                .*                    Zero or more characters
[a-z]            [a-z]                 Any character from the range
[!a-m]           [^a-m]                A character not in the range
[abc]            [abc]                 One of the given characters
{cat,dog,bat}    (cat|dog|bat)         One of the given alternatives
{*.tar,*.gz}     (.*\.tar|.*\.gz)      One of the given alternatives, which may contain wildcards
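
If you happen to be working in Python, the basic rows of this table can be sanity-checked against the standard library's fnmatch module, which implements *, ?, [seq] and [!seq] (it has no brace alternation, so the {…} rows are left out). The sample names and the agree helper below are just illustrative choices.

```python
import fnmatch
import re

# (glob, hand-written regex, sample names to compare on)
rows = [
    ("?", r".", ["a", "ab", ""]),
    ("*", r".*", ["", "abc"]),
    ("[a-z]", r"[a-z]", ["q", "Q", "qq"]),
    ("[!a-m]", r"[^a-m]", ["z", "c"]),
]


def agree(glob: str, regex: str, names: list) -> bool:
    """True if the glob and the regex accept exactly the same sample names."""
    return all(
        fnmatch.fnmatchcase(name, glob) == bool(re.fullmatch(regex, name))
        for name in names
    )


for glob, regex, names in rows:
    print(glob, regex, agree(glob, regex, names))

# fnmatch can also produce a regex for you:
print(fnmatch.translate("*.docx"))
```

The exact string returned by fnmatch.translate varies between Python versions, so compare matching behaviour and not the regex text.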

Posted on 2022-12-05 11:17:10