Performance-wise, is it better to use split or a matching regex to extract substrings from a string?
Question
I have a string like this:
`/good/312321312/bad/3213122131`
I have to extract the two sets of digits from it.
I thought about two solutions: either using `split()` or simply writing a regex to match the digits.
Which solution would be better performance-wise?
If you have any other suggestions, please let me know.
Answer 1
Score: 2
<details>
<summary>English:</summary>
Since creating a new string implies copying all characters, the implicit `substring` operations of `split` are the most expensive aspect here. Creating an array to hold all the strings adds to the cost, but that is minuscule compared to creating the strings. Still, we can avoid both.
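```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumbers {
    // Compile once and reuse the Pattern instance for every call
    static final Pattern NUMBER = Pattern.compile("\\d+");

    public static void main(String[] args) {
        String s = "/good/312321312/bad/3213122131";
        long first = -1, second = -1;
        Matcher m = NUMBER.matcher(s);
        if (m.find()) {
            // Parse straight from the matched range, no substring needed (Java 9+)
            first = Long.parseLong(s, m.start(), m.end(), 10);
            if (m.find()) {
                second = Long.parseLong(s, m.start(), m.end(), 10);
            }
        }
        System.out.println(first + "\t" + second);
    }
}
```

or

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.LongStream;

public class ExtractAllNumbers {
    static final Pattern NUMBER = Pattern.compile("\\d+");
    public static void main(String[] args) {
        String s = "/good/312321312/bad/3213122131";
        LongStream.Builder b = LongStream.builder();
        Matcher m = NUMBER.matcher(s);
        while (m.find()) b.add(Long.parseLong(s, m.start(), m.end(), 10)); // every digit run
        long[] result = b.build().toArray();
        System.out.println(Arrays.toString(result));
    }
}
```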
When performance matters, it’s important to keep and reuse compiled `Pattern` instances instead of using convenience methods like `String.split`, which throw away the `Pattern` instance after the operation.
Obviously, this only matters if the code is executed more than once. But when the code is executed only once, its performance wouldn’t matter anyway.
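The same advice can be applied if you prefer to stay with splitting. A minimal sketch, not from the original answer: compile the separator once and call `Pattern.split` on it. The class `ReusedSplit` and the helper `components` are illustrative names. Note that for a single non-metacharacter separator such as `"/"`, OpenJDK's `String.split` already takes an internal fast path that does not compile a `Pattern`, so the gain there is likely small; reuse matters more for multi-character or real regex separators. Either way, each component is still a new `String`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ReusedSplit {
    // Compiled once and shared by every call (hypothetical helper class)
    private static final Pattern SLASH = Pattern.compile("/");

    static String[] components(String path) {
        // Like String.split: a leading "/" yields an empty first element,
        // trailing empty strings are dropped
        return SLASH.split(path);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(components("/good/312321312/bad/3213122131")));
    }
}
```

For the sample input this prints `[, good, 312321312, bad, 3213122131]`; the empty first element comes from the leading slash.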
The `Long.parseLong(CharSequence, int, int, int)` overload, which lets you skip the `substring` operation, has existed since Java 9. But even if you use `Long.parseLong(m.group())` here, you avoid creating strings for the non-numerical parts and keep the temporary strings as short as possible, which is optimizer-friendly.
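On Java 8, where the range-based `Long.parseLong` overload is not available, the `m.group()` fallback might look like the following minimal sketch. The class `GroupParse` and the `extract` helper are illustrative names, not part of the original answer; `LongStream.builder()` is available since Java 8.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.LongStream;

public class GroupParse {
    static final Pattern NUMBER = Pattern.compile("\\d+");

    // m.group() creates one short String per digit run, but no Strings
    // are created for the non-numeric parts such as "good" or "bad"
    static long[] extract(String s) {
        LongStream.Builder b = LongStream.builder();
        Matcher m = NUMBER.matcher(s);
        while (m.find()) {
            b.add(Long.parseLong(m.group()));
        }
        return b.build().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("/good/312321312/bad/3213122131")));
    }
}
```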
Answer 2
Score: 1
A split-based approach is likely to be the more efficient one. We can split the input into components and then check each one to see whether it is a `long` integer.
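```java
import java.util.ArrayList;
import java.util.List;

public class SplitExample {
    public static void main(String[] args) {
        String path = "/good/312321312/bad/3213122131";
        String[] parts = path.split("/");
        List<Long> nums = new ArrayList<>();
        for (String part : parts) {
            try {
                long num = Long.parseLong(part);
                nums.add(num);
            } catch (NumberFormatException nfe) {
                // Not a number (e.g. "", "good", "bad"): skip this component
            }
        }
        System.out.println("Found nums: " + nums);
    }
}
```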
This prints:
```
Found nums: [312321312, 3213122131]
```
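Using `NumberFormatException` to reject `""`, `"good"` and `"bad"` means an exception is constructed (including its stack trace) for every non-numeric component, which is typically more expensive than a plain character check. A variant that is not part of the original answer checks each component first; it is a sketch assuming every all-digit component fits in a `long`, and `SplitNoExceptions`/`numbers` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitNoExceptions {
    // Same split-based idea, but non-numeric parts are filtered out by a
    // character check instead of a thrown exception.
    // Assumes every all-digit component fits in a long.
    static List<Long> numbers(String path) {
        List<Long> nums = new ArrayList<>();
        for (String part : path.split("/")) {
            // isEmpty() guard needed: allMatch on an empty stream returns true
            if (!part.isEmpty() && part.chars().allMatch(c -> c >= '0' && c <= '9')) {
                nums.add(Long.parseLong(part));
            }
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println("Found nums: " + numbers("/good/312321312/bad/3213122131"));
    }
}
```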
Any solution that uses only basic string functions may outperform one that has to invoke the regex engine.
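Taking "basic string functions only" to its extreme, a hand-written scanner needs neither `split` nor a regex and creates no intermediate strings at all. A minimal sketch, assuming the numbers are runs of ASCII digits (what `\d` matches by default in Java) that fit in a `long`; there is no overflow check. `ManualScan` and `extract` are hypothetical names. Whether it actually beats the regex version should be measured with a proper benchmark harness such as JMH rather than assumed.

```java
import java.util.Arrays;

public class ManualScan {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Single pass over the string: each run of digits is accumulated
    // directly into a long, so no substrings are ever created.
    static long[] extract(String s) {
        long[] buf = new long[4];
        int count = 0;
        int i = 0, n = s.length();
        while (i < n) {
            if (isAsciiDigit(s.charAt(i))) {
                long value = 0;
                while (i < n && isAsciiDigit(s.charAt(i))) {
                    value = value * 10 + (s.charAt(i) - '0');
                    i++;
                }
                if (count == buf.length) {
                    buf = Arrays.copyOf(buf, count * 2); // grow when full
                }
                buf[count++] = value;
            } else {
                i++; // skip separators and letters
            }
        }
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("/good/312321312/bad/3213122131")));
    }
}
```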