May 17, 2023, 16:51:06

Optimizing CPU Usage in Java Regex Matching

Question

I encountered a performance issue related to regex matching in my Java project, which resulted in high CPU usage. Despite my attempts at regex optimization, I'm still experiencing performance problems, particularly when multiple threads concurrently invoke the code.

For instance, when calling a method that performs regex matching concurrently 100 times, the CPU usage spikes to 90% for a brief period.

Here's a simplified example of my code:

String BUY_PATTERN = ".*\\b(purchase)\\b.*";

private static boolean isMatchPattern(String pattern, String text) {
     return text.matches(BUY_PATTERN);
}

I would like to reduce the CPU usage during regex matching. Can you provide suggestions for more efficient regex patterns that achieve the same functionality?

Additionally, I came across an article (link provided) discussing the impact of backtracking on performance, but I find it challenging to rewrite the regex to minimize backtracking.

Thank you for your assistance!

Answer 1

Score: 1

There are a few changes you can make.

You can greatly reduce the CPU consumption by creating re-usable objects.

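import java.util.regex.Pattern;

public class Example {
    // Compiled once and shared; Pattern is immutable and safe to use from multiple threads.
    private static final Pattern PATTERN = Pattern.compile("\\bpurchase\\b");

    private static boolean isMatchPattern(String text) {
        // A Matcher is not thread-safe, so create one per call rather than storing it in a field.
        return PATTERN.matcher(text).find();
    }
}

Behind the scenes, each call to String.matches compiles a new Pattern and creates a new Matcher.

To avoid this, compile the Pattern once and keep it in a field of your class, then use that field from within your isMatchPattern method.
Creating the Matcher is cheap by comparison. Since a Matcher holds per-match state and is not thread-safe, create it inside the method instead of sharing it between threads.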

Furthermore, for the regular expression pattern, there is no need to capture the text "purchase", so you can remove the parentheses.

Additionally, Matcher.find looks for the pattern anywhere within the text, whereas String.matches requires the entire string to match.
So, once you use find, you don't need the leading and trailing .*, as they are redundant.
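To make the difference between the two calls concrete, here is a minimal sketch. The class name PurchaseMatcher, the method names, and the sample strings are made up for illustration; only the pattern comes from the question. It assumes a single compiled Pattern shared through a static final field, with a fresh Matcher per call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions for this sketch.
final class PurchaseMatcher {
    // Compiled once; Pattern instances are immutable and thread-safe.
    private static final Pattern BUY = Pattern.compile("\\bpurchase\\b");

    // Original approach: recompiles the regex on every call and must match the whole string.
    static boolean viaMatches(String text) {
        return text.matches(".*\\b(purchase)\\b.*");
    }

    // Suggested approach: find() searches anywhere in the text, so no ".*" is needed.
    static boolean viaFind(String text) {
        return BUY.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"I want to purchase a book", "purchased yesterday", "repurchase", "purchase"};
        for (String s : samples) {
            System.out.println(s + " -> " + viaMatches(s) + " / " + viaFind(s));
        }
    }
}
```

The two methods agree on single-line input. One difference worth knowing: by default . does not match line terminators, so the matches version returns false for text that contains a newline, while find still locates the word.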

As for using String.indexOf or String.contains:

If you require the word-boundary check, these are not an idiomatic fit on their own, as you'd have to make additional calls to inspect the characters on either side of the match.

If you don't require the check, then this would be the way to go.
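For reference, here is a rough sketch of what the indexOf route looks like when the boundary check is added by hand. WordSearch, containsWord, and isWordChar are hypothetical names, and isWordChar only approximates the regex word class for ASCII input, so treat this as an illustration under that assumption and not as a drop-in replacement for \b.

```java
// Hypothetical helper; the names are assumptions for this sketch.
final class WordSearch {
    // Approximates the regex \w class for ASCII input only.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Returns true if `word` occurs in `text` with no word character directly before or after it.
    static boolean containsWord(String text, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(word, from);
            if (start < 0) {
                return false; // no further occurrences
            }
            int end = start + word.length();
            boolean leftOk = start == 0 || !isWordChar(text.charAt(start - 1));
            boolean rightOk = end == text.length() || !isWordChar(text.charAt(end));
            if (leftOk && rightOk) {
                return true;
            }
            from = start + 1; // keep looking after a rejected occurrence
        }
    }
}
```

A call such as WordSearch.containsWord(text, "purchase") then stands in for the regex. The extra charAt calls are the "more than one call" cost mentioned above.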

As a last resort, you can write a for-loop over the character array yourself, which is more or less what the Matcher class does.
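A hand-written loop along those lines might look like the sketch below. CharLoopSearch and its methods are hypothetical names, the word-character test is an ASCII-only approximation, and this is a naive scan for illustration, not a description of how Matcher is implemented internally.

```java
// Hypothetical helper; the names are assumptions for this sketch.
final class CharLoopSearch {
    // ASCII-only approximation of a "word" character.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Scans `text` for `word`, accepting only occurrences not adjacent to a word character.
    static boolean containsWord(char[] text, char[] word) {
        if (word.length == 0) {
            throw new IllegalArgumentException("word must not be empty");
        }
        for (int i = 0; i + word.length <= text.length; i++) {
            // Skip positions where the preceding character is a word character.
            if (i > 0 && isWordChar(text[i - 1])) {
                continue;
            }
            // Compare the candidate window character by character.
            int j = 0;
            while (j < word.length && text[i + j] == word[j]) {
                j++;
            }
            if (j < word.length) {
                continue;
            }
            // Accept only if the match ends at the text end or before a non-word character.
            int end = i + word.length;
            if (end == text.length || !isWordChar(text[end])) {
                return true;
            }
        }
        return false;
    }
}
```

Whether this beats a precompiled Pattern is something to measure on your own data before committing to it.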
