Faster way to get the numerical difference between 2 byte[] arrays?
Question
I'm working on a program where I have 2 byte arrays and need to calculate the difference between them. For example, if the first array was {1, 2, 3}, and the second array {2, 3, 4}, the difference would be 3.
My current method to do this looks like this:
public long calculateDifference(byte[] a, byte[] b) {
    long difference = 0;
    for (int i = 0; i < a.length; i++) {
        difference += Math.abs(a[i] - b[i]);
    }
    return difference;
}
However, the program will need to be able to process byte arrays that have up to around 5,000,000 elements, so using the current method would be too slow.
Because I have 16 threads, I've seen parallel streams as an option. But because there's no ByteStream, using the reduce and collect operations wouldn't be possible without unboxing and boxing.
Another option would be to use IntStream.range(0, byteArrayLength) to create a parallel stream and then access each index using the int. However, to do this a LongAdder or AtomicLong would be necessary, both of which are much slower in my benchmarks. (LongAdder seems to use an array internally and then sum it up at the end.)
Is there a more efficient way to achieve this? I don't mind adding external dependencies. Thanks!
Answer 1
Score: 2
One thing you can try is to divide the data into two or more regions that are each processed in a separate thread. For arrays of a billion items it may make enough of a difference to be worth it, but for as few as 5 million, probably not.
What follows is a very crude proof-of-concept you can use to evaluate if the idea has any merit at all.
Make a method that does the computation for a region:
public long calculateDifference(byte[] a, byte[] b, int start, int end) {
    long difference = 0;
    for (int i = start; i < end; i++) {
        difference += Math.abs(a[i] - b[i]);
    }
    return difference;
}
And call this method from several threads, and combine the results:
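import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService threadPool = Executors.newFixedThreadPool(2);

public long calculateDifference(byte[] a, byte[] b) throws Exception {
    // Each half is summed by the region overload defined above.
    Future<Long> diff1 = threadPool.submit(() -> calculateDifference(a, b, 0, a.length / 2));
    Future<Long> diff2 = threadPool.submit(() -> calculateDifference(a, b, a.length / 2, a.length));
    return diff1.get() + diff2.get();
}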
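If you want to try more than two regions, the same idea can be generalized roughly as in the sketch below. The class name ByteArrayDifference, the method names calculateDifferenceChunked and calculateDifferenceStream, and the chunk count are placeholders chosen for illustration, not part of the code above. The second method is a parallel IntStream variant that lets the stream do the summing, so no LongAdder or AtomicLong is involved. Whether either version beats the plain loop for 5 million elements is something you would have to benchmark yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

// Hypothetical helper class; all names here are illustrative.
class ByteArrayDifference {

    // Sequential sum of |a[i] - b[i]| over the half-open range [start, end).
    static long calculateDifference(byte[] a, byte[] b, int start, int end) {
        long difference = 0;
        for (int i = start; i < end; i++) {
            difference += Math.abs(a[i] - b[i]);
        }
        return difference;
    }

    // Splits the index range into `chunks` regions, sums each region on the
    // given pool, and adds up the partial results.
    static long calculateDifferenceChunked(byte[] a, byte[] b, int chunks,
                                           ExecutorService pool) throws Exception {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            // long arithmetic keeps the boundary computation from overflowing int
            final int start = (int) ((long) a.length * c / chunks);
            final int end = (int) ((long) a.length * (c + 1) / chunks);
            tasks.add(() -> calculateDifference(a, b, start, end));
        }
        long total = 0;
        for (Future<Long> partial : pool.invokeAll(tasks)) {
            total += partial.get();
        }
        return total;
    }

    // Parallel stream variant: mapToLong(...).sum() is a reduction, so no
    // shared accumulator is needed.
    static long calculateDifferenceStream(byte[] a, byte[] b) {
        return IntStream.range(0, a.length)
                .parallel()
                .mapToLong(i -> Math.abs(a[i] - b[i]))
                .sum();
    }

    public static void main(String[] args) throws Exception {
        byte[] a = {1, 2, 3};
        byte[] b = {2, 3, 4};
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(calculateDifferenceChunked(a, b, 4, pool)); // prints 3
            System.out.println(calculateDifferenceStream(a, b));           // prints 3
        } finally {
            pool.shutdown();
        }
    }
}
```

Like the original loop, both methods assume that a and b have the same length. Reusing one pool across calls, rather than creating it per call, avoids paying the thread start-up cost every time.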