快速排序明显较慢

huangapple go评论64阅读模式
英文:

Quick Sort significantly slower

问题

I understand that you'd like a translation of the code and an explanation for why Quick Sort (QS) is taking more time to sort a random array compared to other sorting algorithms. Let's start with the translation of the code:

// 生成介于low和high(含)之间的随机枢纽索引
int random_pivot(int low, int high) {
    srand(static_cast<unsigned int>(time(nullptr)));
    return low + rand() % (high - low + 1);
}

// 对数组进行分区并返回分区索引
int partition(int* arr, int low, int high) {
    int pivotIndex = random_pivot(low, high);
    int pivot = arr[pivotIndex];
    std::swap(arr[pivotIndex], arr[high]);

    int i = low - 1; // 较小元素的索引

    for (int j = low; j <= high - 1; j++) {
        // 如果当前元素小于等于枢纽
        if (arr[j] <= pivot) {
            i++; // 增加较小元素的索引
            std::swap(arr[i], arr[j]); // 交换当前元素与较小元素
        }
    }

    std::swap(arr[i + 1], arr[high]); // 交换枢纽与分区索引处的元素
    return i + 1; // 返回分区索引
}

// 使用QuickSort算法对数组进行排序
void quick_sort_helper(int* arr, int low, int high) {
    if (low < high) {
        int partition_index = partition(arr, low, high); // 分区数组并获取分区索引
        quick_sort_helper(arr, low, partition_index - 1); // 递归排序左子数组
        quick_sort_helper(arr, partition_index + 1, high); // 递归排序右子数组
    }
}

// 使用QuickSort算法对数组进行排序
void quick_sort(int* arr, int size) {
    quick_sort_helper(arr, 0, size - 1);
}

// 测量不同大小数组上排序算法的执行时间
void measure_sort(void (*sorting_function)(int*, int)) {
    int sizes[] = {10, 100, 1000, 10000, 100000}; // 数组的大小
    int const MAX = 100000;
    int const SMALL = 10;

    for (auto i = 0; i < 5; i++) {
        int* arr = new int[sizes[i]];
        for(auto j = 0; j < sizes[i]; j++) {
            arr[j] = rand() % MAX; // 用随机数填充数组
        }

        if (sizes[i] == SMALL) { // 在排序前打印原始数组
            std::cout << "\n[Original]: ";
            print_arr(arr, sizes[i]);
        }

        // 测量执行时间
        auto start = std::chrono::high_resolution_clock::now();
        sorting_function(arr, sizes[i]);
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();

        if(sizes[i] == SMALL) {
            std::string const SPACE = "   "; // 用于对齐输出的宽度
            std::cout << std::setw(4) << "[Sorted]:" << SPACE;
            print_arr(arr, sizes[i]);
            std::cout << std::endl << std::endl;
        }

        int const SIZE_W = 9;
        int const TIME_W = 8;
        int const W = 6;
        std::cout << std::left << std::setw(SIZE_W) << "[size]: " << std::setw(W+1) << sizes[i] << std::left <<std::setw(TIME_W) << "[time]: " << std::setw(W) << duration << " [ms]" << std::endl;

        // 清理动态分配的内存
        delete[] arr;
    }
}

Now, let's discuss why Quick Sort might be taking more time than the other sorting algorithms, Heap Sort and Merge Sort.

Quick Sort is known for its average-case time complexity of O(n log n), which makes it very efficient on average. However, its performance can degrade in certain cases, such as when the pivot selection results in unbalanced partitions. In your code, you are using a random pivot, which can sometimes lead to unbalanced partitions.

Heap Sort and Merge Sort, on the other hand, have a consistent O(n log n) time complexity, and they do not depend on pivot selection like Quick Sort does. This makes them more predictable in terms of performance.

To further optimize Quick Sort, you might consider using a more sophisticated pivot selection strategy, such as the "median-of-three" pivot, which chooses the pivot as the median of three randomly selected elements. This can help mitigate the risk of unbalanced partitions and improve the average-case performance of Quick Sort.

In summary, Quick Sort's performance can vary depending on the pivot selection strategy, and in some cases, it may exhibit higher time complexity than Heap Sort and Merge Sort, which have more predictable performance characteristics.

英文:

I am working on my lab assignment which is all about sorting algorithms, Heap Sort, Merge Sort, and Quick Sort. I finished the assigment pretty much, but after taking time measurements for each algorithm I got suprising results.

[***** [Merge Sort] *****]
[Original]: [54599, 62697, 92032, 19179, 17296, 27068, 99563, 9829, 89929, 57140]
[Sorted]:   [9829, 17296, 19179, 27068, 54599, 57140, 62697, 89929, 92032, 99563]
[size]:  10     [time]: 2      [ms]
[size]:  100    [time]: 15     [ms]
[size]:  1000   [time]: 170    [ms]
[size]:  10000  [time]: 2122   [ms]
[size]:  100000 [time]: 22946  [ms]
[***** [Quick Sort] *****]
[Original]: [10017, 37607, 51285, 83517, 7500, 81469, 40379, 19721, 48524, 74062]
[Sorted]:   [7500, 10017, 19721, 37607, 40379, 48524, 51285, 74062, 81469, 83517]
[size]:  10     [time]: 24     [ms]
[size]:  100    [time]: 95     [ms]
[size]:  1000   [time]: 1001   [ms]
[size]:  10000  [time]: 9697   [ms]
[size]:  100000 [time]: 107627 [ms]
[***** [Heap Sort] *****]
[Original]: [62697, 92032, 19179, 17296, 27068, 99563, 9829, 89929, 57140, 33429]
[Sorted]:   [9829, 17296, 19179, 27068, 33429, 57140, 62697, 89929, 92032, 99563]
[size]:  10     [time]: 1      [ms]
[size]:  100    [time]: 14     [ms]
[size]:  1000   [time]: 239    [ms]
[size]:  10000  [time]: 3088   [ms]
[size]:  100000 [time]: 39615  [ms]

I know that all of these algorithms are supposed to run in O(nlogn) and are considered the "fastest" sorting algorithms out there, but the time measurements for Quick Sort are very different from Heap Sort and Merge Sort.

I am using a random pivot, since I read that QS is very in-efficient if all of the elements are sorted or if all of them are the same.

Here is my code for QS:

/**
 * @brief Generates a random pivot index between low and high (inclusive)
 * @param low Starting index of the array
 * @param high Ending index of the array
 * @return Random pivot index
 */
int random_pivot(int low, int high) {
    srand(static_cast&lt;unsigned int&gt;(time(nullptr)));
    return low + rand() % (high - low + 1);
}

/**
 * @brief Partitions the array and returns the partition index
 * @param arr The array to be partitioned
 * @param low Starting index of the partition
 * @param high Ending index of the partition
 * @return Partition index
 */
int partition(int* arr, int low, int high) {
    int pivotIndex = random_pivot(low, high);
    int pivot = arr[pivotIndex];
    std::swap(arr[pivotIndex], arr[high]);

    int i = low - 1; // Index of the smaller element

    for (int j = low; j &lt;= high - 1; j++) {
        // If current element is smaller than or equal to the pivot
        if (arr[j] &lt;= pivot) {
            i++; // Increment index of smaller element
            std::swap(arr[i], arr[j]); // Swap current element with the smaller element
        }
    }

    std::swap(arr[i + 1], arr[high]); // Swap the pivot with the element at the partition index
    return i + 1; // Return the partition index
}

/**
 * @brief Sorts an array using the QuickSort algorithm
 * @param arr The array to be sorted
 * @param low Starting index of the array
 * @param high Ending index of the array
 */
void quick_sort_helper(int* arr, int low, int high) {
    if (low &lt; high) {
        int partition_index = partition(arr, low, high); // partition the array and get the partition index
        quick_sort_helper(arr, low, partition_index - 1); // recursively sort the left subarray
        quick_sort_helper(arr, partition_index + 1, high); // recursively sort the right subarray
    }
}

/**
 * @brief Sorts an array using the QuickSort algorithm
 * @param arr The array to be sorted
 * @param size The size of the array
 */
void quick_sort(int* arr, int size) {
    quick_sort_helper(arr, 0, size - 1);
}

Code block for taking time measurements:

/**
 * @brief Measures the execution time of a sorting algorithm on arrays of different sizes.
 * @param sorting_function The sorting function to be measured.
 */
void measure_sort(void (*sorting_function)(int*, int)) {
  int sizes[] = {10, 100, 1000, 10000, 100000}; // sizes of the array
  int const MAX = 100000;
  int const SMALL = 10;

  for (auto i = 0; i &lt; 5; i++) {
      int* arr = new int[sizes[i]];
      for(auto j = 0; j &lt; sizes[i]; j++) { //fill array with random numbers
        arr[j] = rand() % MAX;
      }

      if (sizes[i] == SMALL) { //print og array before sorting
        std::cout &lt;&lt; &quot;\n[Original]: &quot;; // &lt;&lt; std::setw(2);
        print_arr(arr, sizes[i]);
      }

      // Measure execution time
      auto start = std::chrono::high_resolution_clock::now();
      sorting_function(arr, sizes[i]);
      auto end = std::chrono::high_resolution_clock::now();
      auto duration = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(end - start).count();

      if(sizes[i] == SMALL) {
        std::string const SPACE = &quot;   &quot;; //width const to align output
        std::cout &lt;&lt; std::setw(4) &lt;&lt; &quot;[Sorted]:&quot; &lt;&lt; SPACE;
        print_arr(arr, sizes[i]);
        std::cout &lt;&lt; std::endl &lt;&lt; std::endl;
      }

      int const SIZE_W = 9;
      int const TIME_W = 8;
      int const W = 6;
      std::cout &lt;&lt; std::left &lt;&lt; std::setw(SIZE_W) &lt;&lt; &quot;[size]: &quot; &lt;&lt; std::setw(W+1) &lt;&lt; sizes[i] &lt;&lt; std::left &lt;&lt;std::setw(TIME_W) &lt;&lt; &quot;[time]: &quot; &lt;&lt; std::setw(W) &lt;&lt; duration &lt;&lt; &quot; [ms]&quot; &lt;&lt; std::endl;

      // Clean up dynamically allocated memory
      delete[] arr;
  }
}

Could someone explain to me why QS is taking a lot more time to sort a random array than the other algorithms?

I reviewed this question and this and this but I still don't understand what's going on.

答案1

得分: 3

调用randtime来获取每个随机枢纽几乎肯定会影响您的quick_sort实现的性能。相对于您执行的其他操作,这些调用是昂贵的。

您是正确的,选择一个不恰当的枢纽对于有序数据来说可能是灾难性的。然而,我建议您使用“中值三数法”(中文链接)来选择枢纽,而不是随机枢纽(还可以参考这个答案)。

中值三数法将枢纽选择为分区的第一个、中间和最后一个元素的中值。这对于有序数据可以完美运行,并且甚至在随机数据上通过更均匀地分区数据来提高性能。

更新

您还可以查看我对另一个与排序相关的问题的答案。相关的代码可以在GitHub上找到。

英文:

Calling srand and time for every random pivot is almost certainly impacting the performance of your quick_sort implementation. Those calls are expensive relative to the other operations you are performing.

You are correct that a poor choice of pivot can be disastrous for sorted data. However, instead of a random pivot I would suggest the median of three strategy for selecting the pivot (also see this answer).

The median of three selects the pivot as the median of the first, middle and last element of the partition. This works flawlessly with sorted data and even improves performance on random data by more evenly partitioning the data on average.

Update

You may also want to look at my answer to another sort related question. The associated code can be found on GitHub.

huangapple
  • 本文由 发表于 2023年5月22日 05:03:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76301909.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定