英文:
Golang custom sort is faster than native sort
问题
我刚刚在使用Go语言进行排序时偶然发现了一个在stackoverflow上的qsort函数。它似乎比Go语言中的原生排序函数运行速度快大约两倍。我已经尝试了不同的输入大小,并测试了它的有效性。
有人可以解释一下为什么会出现这种情况吗?
以下是你可以在你的电脑上测试的代码:
package main
import (
"fmt"
"math/rand"
"sort"
"time"
)
func qsort(a []int) []int {
if len(a) < 2 {
return a
}
left, right := 0, len(a)-1
// 选择一个基准点
pivotIndex := rand.Int() % len(a)
// 将基准点移到数组末尾
a[pivotIndex], a[right] = a[right], a[pivotIndex]
// 将小于基准点的元素放在左边
for i := range a {
if a[i] < a[right] {
a[i], a[left] = a[left], a[i]
left++
}
}
// 将基准点放在最后一个较小元素之后
a[left], a[right] = a[right], a[left]
// 递归处理左右两边的子数组
qsort(a[:left])
qsort(a[left+1:])
return a
}
func main() {
// 创建一个包含随机整数的数组
rand.Seed(30)
size := 1000000
array1 := make([]int, size)
start := time.Now()
for i, _ := range array1 {
array1[i] = rand.Int()
}
fmt.Println("创建包含", size, "个元素的数组...")
fmt.Println("--- ", time.Since(start), " ---")
// 创建未排序数组的副本
array2 := make([]int, size)
copy(array2, array1)
// 使用原生函数进行排序
start = time.Now()
sort.Ints(array1)
fmt.Println("使用原生排序函数进行排序...")
fmt.Println("--- ", time.Since(start), " ---")
// 使用自定义的qsort函数进行排序
start = time.Now()
qsort(array2)
fmt.Println("使用自定义qsort函数进行排序...")
fmt.Println("--- ", time.Since(start), " ---")
}
希望对你有帮助!
英文:
I was just playing around with sorting in golang and I found a qsort function on stackoverflow. It seems to run about twice as fast as the native sort function in golang. I've tried it with different input sizes and tested that it works.
Could anyone explain why this happens?
Here is the code you can test it on your pc:
package main
import (
"fmt"
"math/rand"
"sort"
"time"
)
func qsort(a []int) []int {
if len(a) < 2 {
return a
}
left, right := 0, len(a)-1
// Pick a pivot
pivotIndex := rand.Int() % len(a)
// Move the pivot to the right
a[pivotIndex], a[right] = a[right], a[pivotIndex]
// Pile elements smaller than the pivot on the left
for i := range a {
if a[i] < a[right] {
a[i], a[left] = a[left], a[i]
left++
}
}
// Place the pivot after the last smaller element
a[left], a[right] = a[right], a[left]
// Go down the rabbit hole
qsort(a[:left])
qsort(a[left+1:])
return a
}
func main() {
// Create an array with random integers
rand.Seed(30)
size := 1000000
array1 := make([]int, size)
start := time.Now()
for i, _ := range array1 {
array1[i] = rand.Int()
}
fmt.Println("Creating array with ", size, " elements...")
fmt.Println("--- ", time.Since(start), " ---")
// Create a copy of the unsorted array
array2 := make([]int, size)
copy(array2, array1)
// Short using native function
start = time.Now()
sort.Ints(array1)
fmt.Println("Sorting with the native sort...")
fmt.Println("--- ", time.Since(start), " ---")
// Sort using custom qsort
start = time.Now()
qsort(array2)
fmt.Println("Sorting with custom qsort...")
fmt.Println("--- ", time.Since(start), " ---")
}
答案1
得分: 14
差异似乎主要是因为你的快速排序使用了内置函数。它使用了切片和len
函数。请记住,sort.Sort
接受一个sort.Interface
。所以每次调用len
函数时,它都会调用slice.Len
,每次执行array[i],array[j] = array[j],array[i]
时,它都必须调用Swap(i,j)
。
我编写了一个类似的版本,可以在任意的qsort.Interface
上工作:
func Qsort(a Interface, prng *rand.Rand) Interface {
if a.Len() < 2 {
return a
}
left, right := 0, a.Len()-1
// 选择一个枢轴
pivotIndex := prng.Int() % a.Len()
// 将枢轴移到右边
a.Swap(pivotIndex, right)
// 将小于枢轴的元素堆叠到左边
for i := 0; i < a.Len(); i++ {
if a.Less(i, right) {
a.Swap(i, left)
left++
}
}
// 将枢轴放在最后一个较小元素之后
a.Swap(left, right)
// 进入递归
leftSide, rightSide := a.Partition(left)
Qsort(leftSide, prng)
Qsort(rightSide, prng)
return a
}
然后我使用了Go的基准测试功能(在可能的情况下,你应该始终使用它进行基准测试)。
为了参考和透明度,qsort.Interface
被定义为:
type Interface interface {
sort.Interface
// Partition返回slice[:i]和slice[i+1:]
// 这些应该引用原始内存
// 因为这是原地排序
Partition(i int) (left Interface, right Interface)
}
qsort
的实际IntSlice
实现如下:
type IntSlice []int
func (is IntSlice) Less(i, j int) bool {
return is[i] < is[j]
}
func (is IntSlice) Swap(i, j int) {
is[i], is[j] = is[j], is[i]
}
func (is IntSlice) Len() int {
return len(is)
}
func (is IntSlice) Partition(i int) (left Interface, right Interface) {
return IntSlice(is[:i]), IntSlice(is[i+1:])
}
最后,这是qsort_test.go
文件的内容:
package qsort_test
import (
"math/rand"
"qsort"
"sort"
"testing"
"time"
)
const size int = 1000000
var list = make([]int, size)
var prng = rand.New(rand.NewSource(int64(time.Now().Nanosecond())))
func BenchmarkQsort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
qsort.Qsort(qsort.IntSlice(list), prng)
}
}
func BenchmarkNativeQsort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
qsort.NativeQsort(list, prng)
}
}
func BenchmarkSort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
sort.Sort(sort.IntSlice(list))
}
}
结果如下(格式由我调整):
PASS
BenchmarkQsort 5 513629360 ns/op
BenchmarkNativeQsort 10 160609180 ns/op
BenchmarkSort 5 292416760 ns/op
如你所见,标准库的排序在随机数据的平均情况下远远优于你的qsort。NativeQsort
指的是你在实际问题中发布的qsort
函数,它优于两者。唯一改变的是我将内置函数替换为qsort.Interface
。因此,泛型很可能是一个较慢的原因。
编辑:由于排序的开销很大,样本数量不多,所以以下是使用-benchtime 10s
得到的结果,以获得稍微更具代表性的结果。
BenchmarkQsort 50 524389994 ns/op
BenchmarkNativeQsort 100 161199217 ns/op
BenchmarkSort 50 302037284 ns/op
英文:
The difference seems to largely be due to the fact that your Quicksort uses builtins. It slices and uses len
. Keep in mind that sort.Sort
takes in a sort.Interface
. So every time you call len
it calls slice.Len
and every time you do array[i],array[j] = array[j],array[i]
it has to call Swap(i,j)
.
I wrote a comparable version that works on an arbitrary qsort.Interface
:
func Qsort(a Interface, prng *rand.Rand) Interface {
if a.Len() < 2 {
return a
}
left, right := 0, a.Len()-1
// Pick a pivot
pivotIndex := prng.Int() % a.Len()
// Move the pivot to the right
a.Swap(pivotIndex, right)
// Pile elements smaller than the pivot on the left
for i := 0; i < a.Len(); i++ {
if a.Less(i, right) {
a.Swap(i, left)
left++
}
}
// Place the pivot after the last smaller element
a.Swap(left, right)
// Go down the rabbit hole
leftSide, rightSide := a.Partition(left)
Qsort(leftSide, prng)
Qsort(rightSide, prng)
return a
}
Then I used Go's benchmark functionality (which you should always use for Benchmarks where possible).
For reference and transparency, qsort.Interface
is defined as:
type Interface interface {
sort.Interface
// Partition returns slice[:i] and slice[i+1:]
// These should references the original memory
// since this does an in-place sort
Partition(i int) (left Interface, right Interface)
}
The actual IntSlice
implementation for qsort
is:
type IntSlice []int
func (is IntSlice) Less(i, j int) bool {
return is[i] < is[j]
}
func (is IntSlice) Swap(i, j int) {
is[i], is[j] = is[j], is[i]
}
func (is IntSlice) Len() int {
return len(is)
}
func (is IntSlice) Partition(i int) (left Interface, right Interface) {
return IntSlice(is[:i]), IntSlice(is[i+1:])
}
Finally, here's the qsort_test.go
file:
package qsort_test
import (
"math/rand"
"qsort"
"sort"
"testing"
"time"
)
const size int = 1000000
var list = make([]int, size)
var prng = rand.New(rand.NewSource(int64(time.Now().Nanosecond())))
func BenchmarkQsort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
qsort.Qsort(qsort.IntSlice(list), prng)
}
}
func BenchmarkNativeQsort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
qsort.NativeQsort(list, prng)
}
}
func BenchmarkSort(b *testing.B) {
for n := 0; n < b.N; n++ {
b.StopTimer()
for i := range list {
list[i] = prng.Int()
}
b.StartTimer()
sort.Sort(sort.IntSlice(list))
}
}
The results (formatting mine):
PASS
BenchmarkQsort 5 513629360 ns/op
BenchmarkNativeQsort 10 160609180 ns/op
BenchmarkSort 5 292416760 ns/op
As you can see, the standard library's sort massively outperforms your qsort on average with random data. NativeQsort
refers to the qsort
functions you posted in your actual question, and it outperforms both. The only thing that's changed between that and Qsort
is that I swapped the builtin functions for qsort.Interface
. It follows, then, that genericity is likely the reason one is slower than the other.
Edit: There aren't many samples because of how expensive sorting is, so here are the results with -benchtime 10s
just for slightly more representative results.
BenchmarkQsort 50 524389994 ns/op
BenchmarkNativeQsort 100 161199217 ns/op
BenchmarkSort 50 302037284 ns/op
答案2
得分: 1
似乎它的运行速度大约是golang中本地排序函数的两倍。
请注意,Go 1.19(2022年第四季度)将改进切片的本地排序功能。
参见:
排序:使用pdqsort
- 在所有基准测试中,pdqsort的速度都不会明显慢于之前的算法。
- 在常见模式中,pdqsort通常更快(例如,在已排序的切片中快10倍)。
pdqsort
的描述在Pattern-defeating Quicksort (pdf)中,由**Orson R. L. Peters**提供。
(摘录)
Pattern-defeating quicksort通常是小到中等输入大小或数据类型大小的最佳算法选择。
它和其他快速排序变体在数据集太大无法放入缓存时表现不佳,而is4o算法则在这方面表现出色。
然而,后者在较小的大小上性能较差,未来的研究可能会将这两种算法的优点结合起来。
这个CL的灵感来自C++实现和Rust实现。
英文:
> It seems to run about twice as fast as the native sort function in golang
Note that the native sort for slice will evolve with Go 1.19 (Q4 2022).
See:
> ## sort: use pdqsort
>
>- Across all benchmarks, pdqsort is never significantly slower than the previous algorithm.
>- In common patterns, pdqsort is often faster (i.e. 10x faster in sorted slices).
>
> The pdqsort
is described at Pattern-defeating Quicksort (pdf) by Orson R. L. Peters.
>
> (extract)
> Pattern-defeating
quicksort is often the best choice of algorithm overall for small to medium input sizes or data type sizes.
It and other quicksort variants suffer from datasets that
are too large to fit in cache, where is4o shines.
The latter algorithm however suffers from bad performance on smaller sizes, future research could perhaps combine the best of these two algorithms
>
> This CL is inspired by both C++ implementation and Rust implementation.
> - C++ implementation
> - Rust implementation
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论