Regex to identify expressions between parentheses in Java
Question
I have the following expression as a string:
(((status==SUBMITTED) && (submit_date>2020-01-03)) &&(dueDate<(proof_date+1)))
I want to identify all the inner expressions in the string above, like this:
(status==SUBMITTED)
(submit_date>2020-01-03)
(dueDate<(proof_date+1))
((status==SUBMITTED) && (submit_date>2020-01-03))
((status==SUBMITTED) && (submit_date>2020-01-03)) &&(dueDate<(proof_date+1))
Below is my code written in Java:
String expression = "((status==SUBMITTED)&&(submit_date>2020-01-03)&&(dueDate<(proof_date+1)))";
//Matcher m = Pattern.compile("\\((.*?)\\)").matcher(expression);
//Matcher m = Pattern.compile("\\([^()]*\\)").matcher(expression);
Matcher m = Pattern.compile("\\(([^()]*|\\([^()]*|\\))*\\)").matcher(expression);
while (m.find()) {
    System.out.println("Group Result: " + m.group(0));
}
However, I am not getting the combinations; it only shows the values below:
Group Result: ((status==SUBMITTED)
Group Result: (submit_date>2020-01-03)
Group Result: (dueDate<(proof_date+1)
How can I get all of the combinations above? Is there a regex that can identify them properly?
Answer 1
Score: 2
Regular expressions simply are not powerful enough to solve this problem in general. Think of it like trying to fight a five-alarm inferno with a garden hose.
The best you could hope for with regular expressions is a pattern that finds parenthesized expressions up to a specific, fixed level of nesting. For example, recognizing ((...)...(...)) should be possible, or (((...)...(...))..((...)..(...))). But even something as deceptively simple as (((...)..(...))...(...)), where the groups sit at different depths, would be difficult to write and maintain, and no single pattern covers every depth.
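To make the fixed-depth limitation concrete, here is a minimal sketch of a pattern that allows at most one further level of nesting inside a group. The class name `FixedDepthRegex`, the constant `DEPTH_TWO`, and the helper `findAll` are made up for this illustration; only `java.util.regex` from the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedDepthRegex {
    // Hypothetical pattern: "(" followed by any mix of non-paren characters
    // and flat "(...)" groups, then ")". It cannot see a third level.
    static final Pattern DEPTH_TWO =
            Pattern.compile("\\((?:[^()]|\\([^()]*\\))*\\)");

    // Collects every non-overlapping match, scanning left to right.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = DEPTH_TWO.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("(a(b)c)"));   // prints [(a(b)c)]
        System.out.println(findAll("(a(b(c)))")); // prints [(b(c))]
    }
}
```

With three levels the outermost group is silently skipped, and because `find()` never returns overlapping matches, the inner `(c)` is not reported separately either. Supporting another level means nesting the pattern inside itself one more time.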
To handle arbitrary expressions, you need a more powerful technique than regular expressions. The next step up from regular expressions in the formal hierarchy of grammars is context-free grammars (CFGs); they're powerful enough to handle any number of types of nested structures, properly nested within each other, and nested to arbitrary depths.
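For a sense of what the CFG route looks like, here is a rough sketch of a hand-written recursive-descent recognizer for two bracket types. The grammar in the comment and the class name `BalancedGrammar` are assumptions chosen for illustration, not something taken from the question.

```java
class BalancedGrammar {
    // Hypothetical grammar, for illustration only:
    //   S -> '(' S ')' S | '[' S ']' S | other S | empty
    private final String input;
    private int pos = 0;

    private BalancedGrammar(String input) {
        this.input = input;
    }

    // True if every bracket in the input is closed by the right partner.
    static boolean isBalanced(String input) {
        BalancedGrammar parser = new BalancedGrammar(input);
        return parser.parseS() && parser.pos == input.length();
    }

    // Consumes one S. Stops in front of a closer so the caller can check it.
    private boolean parseS() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '(' || c == '[') {
                char closer = (c == '(') ? ')' : ']';
                pos++;                          // consume the opener
                if (!parseS()) {
                    return false;               // nested S was malformed
                }
                if (pos >= input.length() || input.charAt(pos) != closer) {
                    return false;               // missing or wrong closer
                }
                pos++;                          // consume the closer
            } else if (c == ')' || c == ']') {
                return true;                    // caller decides if it matches
            } else {
                pos++;                          // any other character
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a[b](c))")); // prints true
        System.out.println(isBalanced("(a[b)]"));    // prints false
    }
}
```

Each nested bracket costs one level of Java recursion, so the call stack is doing the bookkeeping here; very deep nesting could overflow it.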
However, for just parenthesized expressions, you don't actually need to unleash the full power of a CFG parser. All you need is a stack, to keep track of left-parens for which you haven't yet seen a matching right-paren:
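import java.util.ArrayDeque;
import java.util.Deque;

String expression = "((status==SUBMITTED)&&(submit_date>2020-01-03)&&(dueDate<(proof_date+1)))";
// Indices of '(' characters that have not been matched by a ')' yet.
// ArrayDeque is used as the stack; the legacy java.util.Stack works the same way here.
Deque<Integer> lpar = new ArrayDeque<>();
for (int i = 0; i < expression.length(); ++i) {
    char c = expression.charAt(i);
    if (c == '(') {
        lpar.push(i);
    } else if (c == ')') {
        if (lpar.isEmpty()) {
            // A ')' with no pending '(' means the input is malformed.
            System.out.println("Unbalanced )");
            break;
        }
        // The most recent unmatched '(' pairs with this ')'.
        int start = lpar.pop();
        System.out.println(expression.substring(start, i + 1));
    }
}
if (!lpar.isEmpty()) {
    // At least one '(' was never closed.
    System.out.println("Missing )");
}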
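If you would rather collect the groups than print them, the same loop can be wrapped in a method. This is only a sketch: the class name `ParenGroups`, the method `groups`, and the choice to throw `IllegalArgumentException` on malformed input are my own assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ParenGroups {
    // Returns every balanced "(...)" substring, in the order the groups close.
    static List<String> groups(String expression) {
        List<String> result = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indices of unmatched '('
        for (int i = 0; i < expression.length(); i++) {
            char c = expression.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ) at index " + i);
                }
                result.add(expression.substring(open.pop(), i + 1));
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Missing )");
        }
        return result;
    }

    public static void main(String[] args) {
        String expression =
                "(((status==SUBMITTED) && (submit_date>2020-01-03)) &&(dueDate<(proof_date+1)))";
        groups(expression).forEach(System.out::println);
    }
}
```

For the string from the question this yields six substrings, innermost groups first: the two comparisons, their conjunction, `(proof_date+1)`, `(dueDate<(proof_date+1))`, and finally the whole expression. Sort or filter the list afterwards if you need a different order.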