Java: String.replaceAll() vs matcher.replaceAll() in a loop

Question

This is probably an incredibly simple question, as well as likely a duplicate (although I did try to check beforehand), but which is less expensive when used in a loop, String.replaceAll() or matcher.replaceAll()?
While I was told

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher;
String thisWord;
while (Scanner.hasNext()) {
   matcher = regexPattern.matcher(Scanner.next());
   thisWord = matcher.replaceAll("");
   ...
}

is better, because you only have to compile the regex once, I would think that the benefits of

String thisWord;
while (Scanner.hasNext()) {
   thisWord = Scanner.next().replaceAll("[^a-zA-Z0-9]", "");
   ...
}

far outweigh the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

Can someone please explain how my reasoning is false? Am I misunderstanding what Pattern.matcher() does?

Answer 1

Score: 1

In OpenJDK, String.replaceAll is defined as follows:

    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

[code link]

So at least with that implementation, it won't give better performance than compiling the pattern only once and using Matcher.replaceAll.

It's possible that there are other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
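To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The class and method names (ReplaceAllComparison, viaString, viaPattern) are made up for illustration, and the sample words are arbitrary. It only shows that both forms return the same result; it does not measure performance.

```java
import java.util.regex.Pattern;

final class ReplaceAllComparison {
    // Compiled once and reused on every call. Pattern instances are
    // immutable and safe to share.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Compiles the regex on every call, which is what String.replaceAll
    // does in the OpenJDK source quoted above.
    static String viaString(String word) {
        return word.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String viaPattern(String word) {
        return NON_ALNUM.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String[] words = {"Hello,", "wor-ld!", "42"};
        for (String word : words) {
            System.out.println(viaString(word) + " " + viaPattern(word));
        }
    }
}
```

If the difference matters in practice, it is worth measuring with a proper benchmark on the actual input rather than relying on this sketch.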


> […] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

I think you have a misunderstanding here. You really do create a new Matcher instance on each loop iteration; but that is very cheap, and not something to be concerned about performance-wise.
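A small check along these lines illustrates the point. The class and method names (MatcherPerCall, createsNewMatcher) are hypothetical; the sketch just calls Pattern.matcher() twice and compares the results.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherPerCall {
    // Returns true when two calls to pattern.matcher(...) yield distinct
    // Matcher objects that both refer back to the same Pattern.
    static boolean createsNewMatcher(Pattern pattern, String input) {
        Matcher first = pattern.matcher(input);
        Matcher second = pattern.matcher(input);
        return first != second && first.pattern() == second.pattern();
    }

    public static void main(String[] args) {
        Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
        System.out.println(createsNewMatcher(regexPattern, "wor-ld!"));
    }
}
```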


Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:

   thisWord = regexPattern.matcher(Scanner.next()).replaceAll("");

Answer 2

Score: 0

A more efficient approach is to reuse a single Matcher and reset it on each iteration. That way a new Matcher, which copies much of the same information from the Pattern, is not created on every pass through the loop.

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher = regexPattern.matcher("");
String thisWord;
while (Scanner.hasNext()) {
   matcher = matcher.reset(Scanner.next());
   thisWord = matcher.replaceAll("");
   // ...
}

There is a one-off cost to create the matcher outside the loop with regexPattern.matcher(""), but the calls to matcher.reset(xxx) will be quicker because they reuse that matcher rather than creating a new Matcher instance each time. This reduces the amount of garbage collection required.
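For reference, a self-contained version of the reset approach might look like the sketch below. The class name, the cleanWords helper, and the use of a Scanner over an in-memory string are assumptions made so the example can run on its own; the original loop reads from whatever Scanner the asker already has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResetMatcherSketch {
    // Strips non-alphanumeric characters from each whitespace-separated
    // token, reusing one Matcher for the whole loop.
    static List<String> cleanWords(String text) {
        Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
        Matcher matcher = regexPattern.matcher(""); // created once
        List<String> cleaned = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // reset(CharSequence) points the same Matcher at new input
                // and returns it, so the call can be chained.
                cleaned.add(matcher.reset(scanner.next()).replaceAll(""));
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(cleanWords("Hello, wor-ld! 42"));
    }
}
```

Note that a Matcher is not safe for use by multiple threads, so a reused instance like this should stay confined to one thread.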

Posted by huangapple on 2020-09-22 12:10:14
Source: https://go.coder-hub.com/64002967.html