Regex for string representation of a method call

Question

I have a string that follows a specific pattern, like so:
operator(field,value)

I'd like to use a regex to extract all three parts: operator, field, and value. I'm struggling to come up with the syntax to capture them. The value can be alphanumeric as well, for example:

"contains(name, Joe)"
or "lt(quantity, 2.5)"

Answer 1

Score: 0

I don't know Go, but I do know regexes, so I'll do what I can here.

You probably want one group each for the "operator", "field", and "value". I'm going to assume for now that each of these is any combination of alphabetic, numeric, or underscore characters, at least one character long. Regex has a shortcut for that: \w matches a single alphanumeric or underscore character, and the + quantifier means "one or more". So \w+ matches one or more such characters in a row. If you need a more complex definition of what these parts can look like, I'll let you specify that in your question.

You say that you want to support "operator(field,value)". I'll start without whitespace anywhere, because it's simpler and you can easily remove all whitespace yourself before running the regex. We'll add some whitespace support to the regex later if you want it, but it makes the pattern harder to read.

To do this, we want three groups, "1(2,3)", where 1 is the operator name, 2 is the field name, and 3 is the value. Each of these, as given above, will be \w+ in our regex. We also want to match the open and close parentheses and the comma, but we'll throw them away because they're really just delimiters. The literal parentheses need to be escaped, since unescaped parentheses have a special meaning in a regex (they define groups). The result looks like:

(\w+)\((\w+),(\w+)\)
\ 1 /  \ 2 / \ 3 /

The second line shows where each group is defined.
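
Since the answer itself gives no Go code, here is a minimal sketch of how this first pattern might be used with Go's standard regexp package. The names noSpaceCall and parseNoSpace are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// noSpaceCall is the whitespace-free pattern from the text:
// group 1 = operator, group 2 = field, group 3 = value.
var noSpaceCall = regexp.MustCompile(`(\w+)\((\w+),(\w+)\)`)

// parseNoSpace is a hypothetical helper: it returns the three captured
// groups, or ok == false when the input does not match.
func parseNoSpace(s string) (op, field, value string, ok bool) {
	m := noSpaceCall.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	op, field, value, ok := parseNoSpace("contains(name,Joe)")
	fmt.Println(op, field, value, ok)

	// A space after the comma is not accepted by this version.
	_, _, _, ok = parseNoSpace("contains(name, Joe)")
	fmt.Println(ok)
}
```

FindStringSubmatch returns the full match at index 0 and the groups at indexes 1 to 3, which is why the helper reads m[1], m[2], and m[3].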

If you want to support some whitespace, you'll need to add \s* wherever whitespace may appear. This gets hairy, but you can do it like this:

(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)
\ 1 /        \ 2 /       \ 3 /
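
As a rough Go sketch of the whitespace-tolerant pattern (spacedCall and parseSpaced are hypothetical names), the following shows that optional spaces are now accepted, while a value such as 2.5 is still rejected because \w does not match a period:

```go
package main

import (
	"fmt"
	"regexp"
)

// spacedCall allows optional whitespace around the parentheses and the comma.
var spacedCall = regexp.MustCompile(`(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)`)

// parseSpaced is a hypothetical helper returning the three captured groups.
func parseSpaced(s string) (op, field, value string, ok bool) {
	m := spacedCall.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, in := range []string{
		"contains(name, Joe)",
		"lt ( quantity , 2 )",
		"lt(quantity, 2.5)", // not matched: "." is not a \w character
	} {
		op, field, value, ok := parseSpaced(in)
		fmt.Printf("%q -> %q %q %q matched=%v\n", in, op, field, value, ok)
	}
}
```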

You give an example of wanting to support floating point values, and I presume other kinds of values too. You can accomplish this using the "or" pipe, |. For example, group 3, instead of just being \w+, could be defined as

[a-zA-Z_]\w*|\d+\.?|\d*\.\d+

This pattern accepts an identifier (letters, digits, and underscores, where the first character must be a letter or underscore), OR an integer, OR a floating-point number (defined here as a digit string with a period at the beginning, middle, or end). Note that \d+\.? also covers a trailing period such as 2. and \d*\.\d+ covers both .5 and 2.5. Clearly, this can go on and on to support more complex values, but you get the idea.
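
To see which values the alternation accepts on its own, it can be tested in isolation. The sketch below wraps it in ^(?: ... )$ so the whole string must match; the anchors, the non-capturing group, and the names valuePattern and isValue are additions made for this illustration, not part of the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// valuePattern is the group 3 alternation, anchored so that the entire
// input must be an identifier, an integer, or a floating-point number.
var valuePattern = regexp.MustCompile(`^(?:[a-zA-Z_]\w*|\d+\.?|\d*\.\d+)$`)

// isValue is a hypothetical helper reporting whether s is an accepted value.
func isValue(s string) bool {
	return valuePattern.MatchString(s)
}

func main() {
	for _, s := range []string{"Joe", "_x1", "42", "42.", ".5", "2.5", "2.5.1", "9lives", "."} {
		fmt.Printf("%-8q %v\n", s, isValue(s))
	}
}
```

The first six samples are accepted; 2.5.1, 9lives, and a lone period are rejected.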

So the final regex might look like:

(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)

Sorry for not giving any Go help. I hope someone else can edit my answer and fill in that major gap.
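
One possible way to fill that gap, offered as a sketch rather than a definitive implementation: the program below compiles the final pattern with Go's regexp package. It uses \w* in the identifier branch, as in the group 3 definition above, so that single-character values such as x are accepted. The ^ and $ anchors with surrounding \s*, the Call type, and the ParseCall function are assumptions added for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// callPattern is the final regex, anchored so that nothing else may
// surround the call. Group 1 = operator, 2 = field, 3 = value.
var callPattern = regexp.MustCompile(
	`^\s*(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)\s*$`)

// Call is a hypothetical type holding the three extracted parts.
type Call struct {
	Operator, Field, Value string
}

// ParseCall returns the parsed call, or false if s does not match.
func ParseCall(s string) (Call, bool) {
	m := callPattern.FindStringSubmatch(s)
	if m == nil {
		return Call{}, false
	}
	return Call{Operator: m[1], Field: m[2], Value: m[3]}, true
}

func main() {
	for _, in := range []string{
		"contains(name, Joe)",
		"lt(quantity, 2.5)",
		"eq(flag,x)",
		"lt(quantity, 2.5.1)", // rejected: not a valid number
	} {
		c, ok := ParseCall(in)
		fmt.Printf("%q -> %+v matched=%v\n", in, c, ok)
	}
}
```

The value is returned as a string; converting it to a number (for example with strconv.ParseFloat) would be a separate step that depends on the operator.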

Answer 2

Score: 0

Use something like this to capture the groups. You may want to limit the accepted characters with []. Note the backtick (`) raw string literal and the \ escaping of the literal ( and ) within the regexp:

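func main() {
	// tests is assumed to be a []string holding the sample inputs shown in the output below.
	re := regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`)
	for _, t := range tests {
		// FindStringSubmatch returns the full match followed by groups 1 to 3.
		fmt.Println("result", re.FindStringSubmatch(t))
	}
}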

https://play.golang.org/p/43YLTafgQt

output:

result [contains(field, value) contains field value]
result [contains(name, Joe) contains name Joe]
result [lt(quantity, 2.5) lt quantity 2.5]
result [plus(no,44) plus no 44]

Depending on how strict you want to be, you could use [a-z]+ or similar instead of .+ to match only certain characters. If you are not worried about bogus values, this would probably be fine.
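
To make the trade-off concrete, the sketch below compares the permissive .+ pattern with a stricter variant. The strict character classes ([a-z]+ for operator and field, [A-Za-z0-9.]+ for the value), the anchors, and the helper name groups are assumptions picked for this illustration; adjust them to whatever your inputs actually allow.

```go
package main

import (
	"fmt"
	"regexp"
)

// loose is the pattern from the answer; strict is one possible tightening.
var (
	loose  = regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`)
	strict = regexp.MustCompile(`^([a-z]+)\(([a-z]+),\s?([A-Za-z0-9.]+)\)$`)
)

// groups is a hypothetical helper returning only the captured groups,
// or nil when s does not match re.
func groups(re *regexp.Regexp, s string) []string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	return m[1:]
}

func main() {
	// A bogus input with three arguments: the greedy .+ swallows the
	// first comma into the field group, while strict rejects the input.
	bogus := "lt(a,b,c)"
	fmt.Printf("loose:  %q\n", groups(loose, bogus))
	fmt.Printf("strict: %q\n", groups(strict, bogus))

	// A well-formed input is parsed the same way by both patterns.
	fmt.Printf("strict: %q\n", groups(strict, "lt(quantity, 2.5)"))
}
```

With the loose pattern, lt(a,b,c) yields the field "a,b" and the value "c", which is the kind of bogus result the stricter classes are meant to rule out.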

huangapple
  • Posted on 2017-08-24 05:22:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45849468.html