Catastrophic Backtracking在Java 9+中的正则表达式示例

huangapple go评论54阅读模式
英文:

Catastrophic Backtracking regular expression example in Java 9+

问题

有人是否有一个在Java 11中运行的正则表达式catastrophic backtracking示例吗?大多数常见示例(比如"(a+a+)+b")已经自Java 9起得到修复。最好的例子是一个没有回溯引用的例子,不知道在JDK 9+中是否可能。

在我们的应用程序中,我们有一个控制这种回溯的逻辑,并且为了测试该逻辑,我们使用了表达式"(x+x+)+y"。升级到JDK 11后,它不再引起需要的行为。

英文:

Does anyone have an example of a catastrophic backtracking in a regular expression, which works in Java 11? Most of the usual examples (like "(a+a+)+b") are fixed since java 9. The best would be one without back-references, no idea if it's possible in JDK 9+.

In our application we have a logic to control such backtracking, and to test that logic we used the expression "(x+x+)+y". After upgrading to JDK 11 it no longer causes the need behavior.

答案1

得分: 0

以下是翻译好的内容:

我正在使用 Java 14,回溯问题仍然存在。以下操作大约需要6秒钟才会失败。

String regex = "(a+a+a+a+a+a+)+b";
String str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac";
System.out.println(str.matches(regex));

可以通过使用独立组 (?>(X)) 来避免这个问题。一旦组在第一次匹配时成功,引擎会检查接下来的内容是否匹配 b。如果不匹配,无需重复,因为回溯不会改变这个结果。所以它会立即失败(即使针对更大的表达式和/或测试字符串)。

String regex1 = "(?>(a+a+a+a+a+a+)+)b";
System.out.println(str.matches(regex1));
英文:

I am running java 14 and the backtracking issue still exists. It takes the following about 6 seconds to fail.

String regex = "(a+a+a+a+a+a+)+b";
String str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac";
System.out.println(str.matches(regex));

This can be prevented by using independent group(s) (?>X). Once the group matches the first time, the engine checks to see if the following matches the b. If it doesn't, no need to repeat as backtracking will not change that outcome. So it fails instantly (even for larger expressions and/or test strings).

String regex1 = "(?>(a+a+a+a+a+a+)+)b";
System.out.println(str.matches(regex1));

</details>



huangapple
  • 本文由 发表于 2020年10月9日 20:45:44
  • 转载请务必保留本文链接:https://go.coder-hub.com/64280370.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定