英文:
Quick Sort significantly slower
问题
I understand that you'd like a translation of the code and an explanation for why Quick Sort (QS) is taking more time to sort a random array compared to other sorting algorithms. Let's start with the translation of the code:
// 生成介于low和high(含)之间的随机枢纽索引
int random_pivot(int low, int high) {
srand(static_cast<unsigned int>(time(nullptr)));
return low + rand() % (high - low + 1);
}
// 对数组进行分区并返回分区索引
int partition(int* arr, int low, int high) {
int pivotIndex = random_pivot(low, high);
int pivot = arr[pivotIndex];
std::swap(arr[pivotIndex], arr[high]);
int i = low - 1; // 较小元素的索引
for (int j = low; j <= high - 1; j++) {
// 如果当前元素小于等于枢纽
if (arr[j] <= pivot) {
i++; // 增加较小元素的索引
std::swap(arr[i], arr[j]); // 交换当前元素与较小元素
}
}
std::swap(arr[i + 1], arr[high]); // 交换枢纽与分区索引处的元素
return i + 1; // 返回分区索引
}
// 使用QuickSort算法对数组进行排序
void quick_sort_helper(int* arr, int low, int high) {
if (low < high) {
int partition_index = partition(arr, low, high); // 分区数组并获取分区索引
quick_sort_helper(arr, low, partition_index - 1); // 递归排序左子数组
quick_sort_helper(arr, partition_index + 1, high); // 递归排序右子数组
}
}
// 使用QuickSort算法对数组进行排序
void quick_sort(int* arr, int size) {
quick_sort_helper(arr, 0, size - 1);
}
// 测量不同大小数组上排序算法的执行时间
void measure_sort(void (*sorting_function)(int*, int)) {
int sizes[] = {10, 100, 1000, 10000, 100000}; // 数组的大小
int const MAX = 100000;
int const SMALL = 10;
for (auto i = 0; i < 5; i++) {
int* arr = new int[sizes[i]];
for(auto j = 0; j < sizes[i]; j++) {
arr[j] = rand() % MAX; // 用随机数填充数组
}
if (sizes[i] == SMALL) { // 在排序前打印原始数组
std::cout << "\n[Original]: ";
print_arr(arr, sizes[i]);
}
// 测量执行时间
auto start = std::chrono::high_resolution_clock::now();
sorting_function(arr, sizes[i]);
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
if(sizes[i] == SMALL) {
std::string const SPACE = " "; // 用于对齐输出的宽度
std::cout << std::setw(4) << "[Sorted]:" << SPACE;
print_arr(arr, sizes[i]);
std::cout << std::endl << std::endl;
}
int const SIZE_W = 9;
int const TIME_W = 8;
int const W = 6;
std::cout << std::left << std::setw(SIZE_W) << "[size]: " << std::setw(W+1) << sizes[i] << std::left <<std::setw(TIME_W) << "[time]: " << std::setw(W) << duration << " [ms]" << std::endl;
// 清理动态分配的内存
delete[] arr;
}
}
Now, let's discuss why Quick Sort might be taking more time than the other sorting algorithms, Heap Sort and Merge Sort.
Quick Sort is known for its average-case time complexity of O(n log n), which makes it very efficient on average. However, its performance can degrade in certain cases, such as when the pivot selection results in unbalanced partitions. In your code, you are using a random pivot, which can sometimes lead to unbalanced partitions.
Heap Sort and Merge Sort, on the other hand, have a consistent O(n log n) time complexity, and they do not depend on pivot selection like Quick Sort does. This makes them more predictable in terms of performance.
To further optimize Quick Sort, you might consider using a more sophisticated pivot selection strategy, such as the "median-of-three" pivot, which chooses the pivot as the median of three randomly selected elements. This can help mitigate the risk of unbalanced partitions and improve the average-case performance of Quick Sort.
In summary, Quick Sort's performance can vary depending on the pivot selection strategy, and in some cases, it may exhibit higher time complexity than Heap Sort and Merge Sort, which have more predictable performance characteristics.
英文:
I am working on my lab assignment which is all about sorting algorithms, Heap Sort
, Merge Sort
, and Quick Sort
. I finished the assigment pretty much, but after taking time measurements for each algorithm I got suprising results.
[***** [Merge Sort] *****]
[Original]: [54599, 62697, 92032, 19179, 17296, 27068, 99563, 9829, 89929, 57140]
[Sorted]: [9829, 17296, 19179, 27068, 54599, 57140, 62697, 89929, 92032, 99563]
[size]: 10 [time]: 2 [ms]
[size]: 100 [time]: 15 [ms]
[size]: 1000 [time]: 170 [ms]
[size]: 10000 [time]: 2122 [ms]
[size]: 100000 [time]: 22946 [ms]
[***** [Quick Sort] *****]
[Original]: [10017, 37607, 51285, 83517, 7500, 81469, 40379, 19721, 48524, 74062]
[Sorted]: [7500, 10017, 19721, 37607, 40379, 48524, 51285, 74062, 81469, 83517]
[size]: 10 [time]: 24 [ms]
[size]: 100 [time]: 95 [ms]
[size]: 1000 [time]: 1001 [ms]
[size]: 10000 [time]: 9697 [ms]
[size]: 100000 [time]: 107627 [ms]
[***** [Heap Sort] *****]
[Original]: [62697, 92032, 19179, 17296, 27068, 99563, 9829, 89929, 57140, 33429]
[Sorted]: [9829, 17296, 19179, 27068, 33429, 57140, 62697, 89929, 92032, 99563]
[size]: 10 [time]: 1 [ms]
[size]: 100 [time]: 14 [ms]
[size]: 1000 [time]: 239 [ms]
[size]: 10000 [time]: 3088 [ms]
[size]: 100000 [time]: 39615 [ms]
I know that all of these algorithms are supposed to run in O(nlogn)
and are considered the "fastest" sorting algorithms out there, but the time measurements for Quick Sort
are very different from Heap Sort
and Merge Sort
.
I am using a random pivot, since I read that QS
is very in-efficient if all of the elements are sorted or if all of them are the same.
Here is my code for QS:
/**
* @brief Generates a random pivot index between low and high (inclusive)
* @param low Starting index of the array
* @param high Ending index of the array
* @return Random pivot index
*/
int random_pivot(int low, int high) {
srand(static_cast<unsigned int>(time(nullptr)));
return low + rand() % (high - low + 1);
}
/**
* @brief Partitions the array and returns the partition index
* @param arr The array to be partitioned
* @param low Starting index of the partition
* @param high Ending index of the partition
* @return Partition index
*/
int partition(int* arr, int low, int high) {
int pivotIndex = random_pivot(low, high);
int pivot = arr[pivotIndex];
std::swap(arr[pivotIndex], arr[high]);
int i = low - 1; // Index of the smaller element
for (int j = low; j <= high - 1; j++) {
// If current element is smaller than or equal to the pivot
if (arr[j] <= pivot) {
i++; // Increment index of smaller element
std::swap(arr[i], arr[j]); // Swap current element with the smaller element
}
}
std::swap(arr[i + 1], arr[high]); // Swap the pivot with the element at the partition index
return i + 1; // Return the partition index
}
/**
* @brief Sorts an array using the QuickSort algorithm
* @param arr The array to be sorted
* @param low Starting index of the array
* @param high Ending index of the array
*/
void quick_sort_helper(int* arr, int low, int high) {
if (low < high) {
int partition_index = partition(arr, low, high); // partition the array and get the partition index
quick_sort_helper(arr, low, partition_index - 1); // recursively sort the left subarray
quick_sort_helper(arr, partition_index + 1, high); // recursively sort the right subarray
}
}
/**
* @brief Sorts an array using the QuickSort algorithm
* @param arr The array to be sorted
* @param size The size of the array
*/
void quick_sort(int* arr, int size) {
quick_sort_helper(arr, 0, size - 1);
}
Code block for taking time measurements:
/**
* @brief Measures the execution time of a sorting algorithm on arrays of different sizes.
* @param sorting_function The sorting function to be measured.
*/
void measure_sort(void (*sorting_function)(int*, int)) {
int sizes[] = {10, 100, 1000, 10000, 100000}; // sizes of the array
int const MAX = 100000;
int const SMALL = 10;
for (auto i = 0; i < 5; i++) {
int* arr = new int[sizes[i]];
for(auto j = 0; j < sizes[i]; j++) { //fill array with random numbers
arr[j] = rand() % MAX;
}
if (sizes[i] == SMALL) { //print og array before sorting
std::cout << "\n[Original]: "; // << std::setw(2);
print_arr(arr, sizes[i]);
}
// Measure execution time
auto start = std::chrono::high_resolution_clock::now();
sorting_function(arr, sizes[i]);
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
if(sizes[i] == SMALL) {
std::string const SPACE = " "; //width const to align output
std::cout << std::setw(4) << "[Sorted]:" << SPACE;
print_arr(arr, sizes[i]);
std::cout << std::endl << std::endl;
}
int const SIZE_W = 9;
int const TIME_W = 8;
int const W = 6;
std::cout << std::left << std::setw(SIZE_W) << "[size]: " << std::setw(W+1) << sizes[i] << std::left <<std::setw(TIME_W) << "[time]: " << std::setw(W) << duration << " [ms]" << std::endl;
// Clean up dynamically allocated memory
delete[] arr;
}
}
Could someone explain to me why QS
is taking a lot more time to sort a random array than the other algorithms?
I reviewed this question and this and this but I still don't understand what's going on.
答案1
得分: 3
调用rand
和time
来获取每个随机枢纽几乎肯定会影响您的quick_sort
实现的性能。相对于您执行的其他操作,这些调用是昂贵的。
您是正确的,选择一个不恰当的枢纽对于有序数据来说可能是灾难性的。然而,我建议您使用“中值三数法”(中文链接)来选择枢纽,而不是随机枢纽(还可以参考这个答案)。
中值三数法将枢纽选择为分区的第一个、中间和最后一个元素的中值。这对于有序数据可以完美运行,并且甚至在随机数据上通过更均匀地分区数据来提高性能。
更新
您还可以查看我对另一个与排序相关的问题的答案。相关的代码可以在GitHub上找到。
英文:
Calling srand
and time
for every random pivot is almost certainly impacting the performance of your quick_sort
implementation. Those calls are expensive relative to the other operations you are performing.
You are correct that a poor choice of pivot can be disastrous for sorted data. However, instead of a random pivot I would suggest the median of three strategy for selecting the pivot (also see this answer).
The median of three selects the pivot as the median of the first, middle and last element of the partition. This works flawlessly with sorted data and even improves performance on random data by more evenly partitioning the data on average.
Update
You may also want to look at my answer to another sort related question. The associated code can be found on GitHub.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论