英文:
Why is summing a sorted array faster than an unsorted one?
问题
I have an ArrayList with numbers from 10 MM to 20 MM
List<Long> l = new ArrayList<>();
for (long i = 10_000_000; i < 20_000_000; i++) {
l.add(i);
}
Why is the sum() function faster on a sorted array than on an unsorted?
public class Main {
public static void main(String[] args) {
List<Long> l = new ArrayList<>();
for (long i = 10_000_000; i < 20_000_000; i++) {
l add(i);
}
System.out.println(sum(l));
Collections.shuffle(l);
System.out.println(sum(l));
Collections.sort(l);
System.out.println(sum(l));
}
static long sum(List<Long> l) {
long started = System.nanoTime();
long result = 0;
for (long v : l) {
result += v;
}
System.out.println(" duration: " + (System.nanoTime() - started) / 1_000_000 + "ms");
return result;
}
}
I execute the sum() function three times
- First time, on a sorted array
- Second time on unsorted
- And the third time again on sorted
On my computer, the execution time was
-
85ms
-
250ms
-
97ms
Why is the sum() function faster on a sorted array than on an unsorted?
英文:
I have an ArrayList with numbers from 10 MM to 20 MM
List<Long> l = new ArrayList<>();
for (long i = 10_000_000; i < 20_000_000; i++) {
l.add(i);
}
Why is the sum() function faster on a sorted array than on an unsorted?
public class Main {
public static void main(String[] args) {
List<Long> l = new ArrayList<>();
for (long i = 10_000_000; i < 20_000_000; i++) {
l.add(i);
}
System.out.println(sum(l));
Collections.shuffle(l);
System.out.println(sum(l));
Collections.sort(l);
System.out.println(sum(l));
}
static long sum(List<Long> l) {
long started = System.nanoTime();
long result = 0;
for (long v : l) {
result += v;
}
System.out.println(" duration: " + (System.nanoTime() - started) / 1_000_000 + "ms");
return result;
}
}
I execute the sum() function three times
- First time, on a sorted array
- Second time on unsorted
- And the third time again on sorted
On my computer, the execution time was
-
85ms
-
250ms
-
97ms
Why is the sum() function faster on a sorted array than on an unsorted?
答案1
得分: 5
不仅仅是排序的问题,而是内存访问的顺序。
我将简要描述一些实验证明这一点:
-
尝试使用
long[]
而不是ArrayList<Long>
。然后,没有单独分配的Long
对象,只有一个包含数据的大数组。这使得sum
的内存访问顺序与数组的内容无关。我没有执行这个实验,因为编写代码需要更多工作。 -
尝试使用随机顺序的值初始化列表。我尝试过(还将列表稍微缩短以节省时间),得到了以下结果:
持续时间:31毫秒 -6262510617230624864 持续时间:77毫秒 -6262510617230624864 持续时间:160毫秒 -6262510617230624864
即最快的顺序与创建
Long
对象的顺序相同,与它们的值无关。 -
在洗牌后按顺序重新创建
Long
对象,例如:for (int i = 0; i < l.size(); i++) l.set(i, new Long(l.get(i)));
这会再次使洗牌列表的
sum
操作变得快速(随后对列表进行排序会改变内存访问模式并再次使sum
变慢)。
实际上,并没有保证连续分配的Long
对象将按照相同的顺序分配到内存中,但当分配了足够多的对象并且同时没有发生太多干扰时,这是一个相当常见的情况。
英文:
It's not about being sorted per-se, it's about the order in which memory is accessed.
I will briefly describe a few experiments to confirm that:
-
Try using
long[]
instead ofArrayList<Long>
. Then there are no individually allocatedLong
objects, just one big array that directly contains the data. That makes the memory access order ofsum
independent of the contents of the array. I did not perform this experiment since it would take a bit more work to write the code. -
Try initializing the list with randomly-ordered values. I tried that (I also made the list a bit shorter to save time) and got this result:
duration: 31ms -6262510617230624864 duration: 77ms -6262510617230624864 duration: 160ms -6262510617230624864
ie the fastest order is the same as the order in which the
Long
objects were created, independently of their values. -
Recreate the
Long
objects in-order after shuffling, for example:for (int i = 0; i < l.size(); i++) l.set(i, new Long(l.get(i)));
That makes the
sum
of the shuffled list fast again (subsequently sorting the list un-orders the memory access pattern and makessum
slow again).
There is actually no guarantee that Long
objects that are allocated one after the other will be allocated in the same order in memory, but that is a pretty common thing to happen, when a sufficiently large number of them is allocated and not much disturbance happens meanwhile.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论