Golang的sort.Sort函数输出是随机的,并且是错误的。

huangapple go评论87阅读模式
英文:

golang sort.Sort random output and is wrong

问题

我有一个应用于结构体的自定义排序函数。完整的代码在这里的play.golang.org上

type Stmt struct {
    Name  string
    After []string
}

func sortStmts(stmts []Stmt) []Stmt {
    sort.Sort(ByAfter(stmts))
    return stmts
}

type ByAfter []Stmt

func (a ByAfter) Len() int      { return len(a) }
func (a ByAfter) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
func (a ByAfter) Less(i, j int) bool {
    isLess := true

    //fmt.Printf("%s.%+v is being compared with %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)

    for _, v := range a[i].After {
        if a[j].Name == v {
            isLess = false
            break
        }
    }

    if isLess {
        //fmt.Printf("%s.%+v is Before %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)
    } else {
        //fmt.Printf("%s.%+v is After %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)
    }

    return isLess
}

我的目的是自动创建一组正确排序的SQL创建语句,以便依赖表首先出现。

因此,如果存在Stmt{Name: "user_role", After: []string{"user", "role"}},那么在排序后的列表中,user_role应该在userrole之后。

这似乎在一开始运行良好,直到我们开始添加更多的值。只有在我进去检查后,我才意识到我可能只是第一次偶然得到了幸运,实际上并没有任何一致性。

在排序函数中是否有什么错误导致结果是随机的?我特别想知道为什么"role"项没有在"user_role"项之前,尽管我已经指定"user_role"在"role"之后。

英文:

I have a custom Sort function applied to a struct. The full code is here on play.golang.org.

type Stmt struct {
	Name  string
	After []string
}

func sortStmts(stmts []Stmt) []Stmt {
	sort.Sort(ByAfter(stmts))
	return stmts
}

type ByAfter []Stmt

func (a ByAfter) Len() int      { return len(a) }
func (a ByAfter) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
func (a ByAfter) Less(i, j int) bool {
	isLess := true

	//fmt.Printf("%s.%+v is being compared with %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)

	for _, v := range a[i].After {
		if a[j].Name == v {
			isLess = false
			break
		}
	}

	if isLess {
		//fmt.Printf("%s.%+v is Before %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)
	} else {
		//fmt.Printf("%s.%+v is After %s.%+v\n", a[i].Name, a[i].After, a[j].Name, a[j].After)
	}

	return isLess
}

My purpose is to automatically create a set of sql create statements ordered properly so that dependent tables come first.

So if Stmt{Name: "user_role", After: []string{"user", "role"} } is present, then in the ordered list user_role should be after both user and role.

This seemed to work fine for a bit until we started adding more values. Only then did I go in and check and realize that I might have just got accidentally lucky the first time, but there really isn't any consistency.

Is there something I'm doing wrong in the sort function that the results are random. I'm particularly interested in seeing why the "role" item is not coming before the "user_role" item even though I've designated it user_role as coming after role.

答案1

得分: 7

你的“Less”函数不是传递的。也就是说,如果A < B且B < C,则必须满足A < C。

你不能使用常规的排序函数来指定一个偏序,并获得排序后的输出。相反,你需要实现一个拓扑排序。

以下是一个简单的实现示例(除了我删除了重复的“passwords”条目)。

package main

import "fmt"

type Stmt struct {
    Name  string
    After []string
}

func topSort(ss []Stmt) []string {
    after := map[string][]string{} // things that must come after
    counts := map[string]int{}     // number unsatified preconditions
    zc := map[string]struct{}{}    // things with zero count
    for _, s := range ss {
        for _, a := range s.After {
            after[a] = append(after[a], s.Name)
        }
        counts[s.Name] = len(s.After)
        if len(s.After) == 0 {
            zc[s.Name] = struct{}{}
        }
    }

    r := []string{}
    for len(zc) > 0 {
        for n := range zc {
            r = append(r, n)
            for _, a := range after[n] {
                counts[a]--
                if counts[a] == 0 {
                    zc[a] = struct{}{}
                }
            }
            delete(zc, n)
        }
    }
    return r
}

func main() {
    stmts := []Stmt{
        {Name: "app", After: []string{"app_user"}},
        {Name: "billingplan", After: []string{}},
        {Name: "campaign", After: []string{"app_user"}},
        {Name: "campaign_app", After: []string{"campaign", "app"}},
        {Name: "campaign_ip", After: []string{"campaign", "ip"}},
        {Name: "campaign_operator", After: []string{"campaign", "operator"}},
        {Name: "campaign_sponsor", After: []string{"campaign", "sponsor"}},
        {Name: "campaign_subscriberfilter", After: []string{"campaign", "subscriber_filters"}},
        {Name: "campaign_url", After: []string{"campaign", "url"}},
        {Name: "contentpartner", After: []string{"app_user"}},
        {Name: "filter_criteria", After: []string{"campaign", "subscriber_filters"}},
        {Name: "ip", After: []string{"app_user"}},
        {Name: "mobile_registered", After: []string{"campaign", "app"}},
        {Name: "operator", After: []string{}},
        {Name: "passwords", After: []string{"app_user"}},
        {Name: "publish_package", After: []string{}},
        {Name: "role", After: []string{}},
        {Name: "sponsor", After: []string{"app_user"}},
        {Name: "subscriber_dbs", After: []string{}},
        {Name: "subscriber_filters", After: []string{"subscriber_dbs"}},
        {Name: "timezone", After: []string{}},
        {Name: "url", After: []string{"app_user"}},
        {Name: "app_user", After: []string{}},
        {Name: "user_role", After: []string{"app_user", "role"}},
    }
    r := topSort(stmts)
    for _, s := range r {
        fmt.Println(s)
    }
}

希望对你有帮助!

英文:

Your "Less" function is not transitive. That is, if A < B and B < C, then it must also hold that A < C.

You can't specify a partial order with a regular sort function and get sorted output. Instead, you need to implement a topological sort.

Here's a simple implementation on your data (except that I removed the duplicate "passwords" entry).

package main
import &quot;fmt&quot;
type Stmt struct {
Name  string
After []string
}
func topSort(ss []Stmt) []string {
after := map[string][]string{} // things that must come after
counts := map[string]int{}     // number unsatified preconditions
zc := map[string]struct{}{}    // things with zero count
for _, s := range ss {
for _, a := range s.After {
after[a] = append(after[a], s.Name)
}
counts[s.Name] = len(s.After)
if len(s.After) == 0 {
zc[s.Name] = struct{}{}
}
}
r := []string{}
for len(zc) &gt; 0 {
for n := range zc {
r = append(r, n)
for _, a := range after[n] {
counts[a]--
if counts[a] == 0 {
zc[a] = struct{}{}
}
}
delete(zc, n)
}
}
return r
}
func main() {
stmts := []Stmt{
{Name: &quot;app&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;billingplan&quot;, After: []string{}},
{Name: &quot;campaign&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;campaign_app&quot;, After: []string{&quot;campaign&quot;, &quot;app&quot;}},
{Name: &quot;campaign_ip&quot;, After: []string{&quot;campaign&quot;, &quot;ip&quot;}},
{Name: &quot;campaign_operator&quot;, After: []string{&quot;campaign&quot;, &quot;operator&quot;}},
{Name: &quot;campaign_sponsor&quot;, After: []string{&quot;campaign&quot;, &quot;sponsor&quot;}},
{Name: &quot;campaign_subscriberfilter&quot;, After: []string{&quot;campaign&quot;, &quot;subscriber_filters&quot;}},
{Name: &quot;campaign_url&quot;, After: []string{&quot;campaign&quot;, &quot;url&quot;}},
{Name: &quot;contentpartner&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;filter_criteria&quot;, After: []string{&quot;campaign&quot;, &quot;subscriber_filters&quot;}},
{Name: &quot;ip&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;mobile_registered&quot;, After: []string{&quot;campaign&quot;, &quot;app&quot;}},
{Name: &quot;operator&quot;, After: []string{}},
{Name: &quot;passwords&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;publish_package&quot;, After: []string{}},
{Name: &quot;role&quot;, After: []string{}},
{Name: &quot;sponsor&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;subscriber_dbs&quot;, After: []string{}},
{Name: &quot;subscriber_filters&quot;, After: []string{&quot;subscriber_dbs&quot;}},
{Name: &quot;timezone&quot;, After: []string{}},
{Name: &quot;url&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;app_user&quot;, After: []string{}},
{Name: &quot;user_role&quot;, After: []string{&quot;app_user&quot;, &quot;role&quot;}},
}
r := topSort(stmts)
for _, s := range r {
fmt.Println(s)
}
}

答案2

得分: 1

根据匿名用户提到的,你需要进行拓扑排序。Tarjan的强连通分量算法具有返回逆拓扑排序顺序的特性。这意味着它可以用作拓扑排序算法。

以下是Tarjan算法的实现(可在此处运行,最初由我在golang-nuts列表上发布),基于维基百科上的伪代码(更一般地实现在此处,但基本上使用相同的底层代码):

package main

import (
	"fmt"
	"log"
)

type Stmt struct {
	Name  string
	After []string
}

func main() {
	stmts := []Stmt{
		{Name: "app", After: []string{"app_user"}},
		{Name: "billingplan", After: []string{}},
		{Name: "campaign", After: []string{"app_user"}},
		{Name: "campaign_app", After: []string{"campaign", "app"}},
		{Name: "campaign_ip", After: []string{"campaign", "ip"}},
		{Name: "campaign_operator", After: []string{"campaign", "operator"}},
		{Name: "campaign_sponsor", After: []string{"campaign", "sponsor"}},
		{Name: "campaign_subscriberfilter", After: []string{"campaign", "subscriber_filters"}},
		{Name: "campaign_url", After: []string{"campaign", "url"}},
		{Name: "contentpartner", After: []string{"app_user"}},
		{Name: "filter_criteria", After: []string{"campaign", "subscriber_filters"}},
		{Name: "ip", After: []string{"app_user"}},
		{Name: "mobile_registered", After: []string{"campaign", "app"}},
		{Name: "operator", After: []string{}},
		{Name: "passwords", After: []string{"app_user"}},
		{Name: "publish_package", After: []string{}},
		{Name: "role", After: []string{}},
		{Name: "passwords", After: []string{"app_user"}},
		{Name: "sponsor", After: []string{"app_user"}},
		{Name: "subscriber_dbs", After: []string{}},
		{Name: "subscriber_filters", After: []string{"subscriber_dbs"}},
		{Name: "timezone", After: []string{}},
		{Name: "url", After: []string{"app_user"}},
		{Name: "app_user", After: []string{}},
		{Name: "user_role", After: []string{"app_user", "role"}},
	}

	g := make(graph)
	for _, s := range stmts {
		g[s.Name] = after(s.After)
	}

	sorted, err := topoSort(g)
	if err != nil {
		log.Fatalf("could not sort: %v", err)
	}
	for _, s := range sorted {
		fmt.Println(s)
	}
}

func topoSort(g graph) ([]string, error) {
	sccs := tarjanSCC(g)
	sorted := make([]string, len(sccs))
	for i, s := range sccs {
		if len(s) != 1 {
			return nil, fmt.Errorf("found directed cycle: %q", s)
		}
		sorted[i] = s[0]
	}
	return sorted, nil
}

// graph is an edge list representation of a directed graph.
type graph map[string]set

// set is an string set.
type set map[string]struct{}

func after(i []string) set {
	if len(i) == 0 {
		return nil
	}
	s := make(set)
	for _, v := range i {
		s[v] = struct{}{}
	}
	return s
}

// tarjanSCC returns a the strongly connected components of the
// directed graph g.
func tarjanSCC(g graph) [][]string {
	t := tarjan{
		g: g,

		indexTable: make(map[string]int, len(g)),
		lowLink:    make(map[string]int, len(g)),
		onStack:    make(map[string]bool, len(g)),
	}
	for v := range t.g {
		if t.indexTable[v] == 0 {
			t.strongconnect(v)
		}
	}
	return t.sccs
}

// tarjan implements Tarjan's strongly connected component finding
// algorithm. The implementation is from the pseudocode at
//
// http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
//
type tarjan struct {
	g graph

	index      int
	indexTable map[string]int
	lowLink    map[string]int
	onStack    map[string]bool

	stack []string

	sccs [][]string
}

// strongconnect is the strongconnect function described in the
// wikipedia article.
func (t *tarjan) strongconnect(v string) {
	// Set the depth index for v to the smallest unused index.
	t.index++
	t.indexTable[v] = t.index
	t.lowLink[v] = t.index
	t.stack = append(t.stack, v)
	t.onStack[v] = true

	// Consider successors of v.
	for w := range t.g[v] {
		if t.indexTable[w] == 0 {
			// Successor w has not yet been visited; recur on it.
			t.strongconnect(w)
			t.lowLink[v] = min(t.lowLink[v], t.lowLink[w])
		} else if t.onStack[w] {
			// Successor w is in stack s and hence in the current SCC.
			t.lowLink[v] = min(t.lowLink[v], t.indexTable[w])
		}
	}

	// If v is a root node, pop the stack and generate an SCC.
	if t.lowLink[v] == t.indexTable[v] {
		// Start a new strongly connected component.
		var (
			scc []string
			w   string
		)
		for {
			w, t.stack = t.stack[len(t.stack)-1], t.stack[:len(t.stack)-1]
			t.onStack[w] = false
			// Add w to current strongly connected component.
			scc = append(scc, w)
			if w == v {
				break
			}
		}
		// Output the current strongly connected component.
		t.sccs = append(t.sccs, scc)
	}
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

请注意,重复运行此代码将不会产生相同的严格输出顺序,因为一些路径在相互之间没有明确的可排序性(这在playground中不会显示出来,因为结果被缓存了-你可以通过将调用tarjanSCC包装起来来看到这一点)。

虽然直接实现拓扑排序可能更容易,但通过使用Tarjan的强连通分量算法,我们能够找到排序失败的原因,例如这里(与相同的数据这里进行对比)。

英文:

As mentioned by Anonymous, you need a topological sort for this. Tarjan's strongly connected components algorithm has the property that SCCs are returned in reverse topological sort order. This means it can be used as a topological sort algorithm.

Here is an implementation of Tarjan's algorithm (runnable here and originally posted by me to the golang-nuts list) based on the pseudocode at wikipedia (more generally implemented here but using essentially the same underlying code):

package main
import (
&quot;fmt&quot;
&quot;log&quot;
)
type Stmt struct {
Name  string
After []string
}
func main() {
stmts := []Stmt{
{Name: &quot;app&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;billingplan&quot;, After: []string{}},
{Name: &quot;campaign&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;campaign_app&quot;, After: []string{&quot;campaign&quot;, &quot;app&quot;}},
{Name: &quot;campaign_ip&quot;, After: []string{&quot;campaign&quot;, &quot;ip&quot;}},
{Name: &quot;campaign_operator&quot;, After: []string{&quot;campaign&quot;, &quot;operator&quot;}},
{Name: &quot;campaign_sponsor&quot;, After: []string{&quot;campaign&quot;, &quot;sponsor&quot;}},
{Name: &quot;campaign_subscriberfilter&quot;, After: []string{&quot;campaign&quot;, &quot;subscriber_filters&quot;}},
{Name: &quot;campaign_url&quot;, After: []string{&quot;campaign&quot;, &quot;url&quot;}},
{Name: &quot;contentpartner&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;filter_criteria&quot;, After: []string{&quot;campaign&quot;, &quot;subscriber_filters&quot;}},
{Name: &quot;ip&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;mobile_registered&quot;, After: []string{&quot;campaign&quot;, &quot;app&quot;}},
{Name: &quot;operator&quot;, After: []string{}},
{Name: &quot;passwords&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;publish_package&quot;, After: []string{}},
{Name: &quot;role&quot;, After: []string{}},
{Name: &quot;passwords&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;sponsor&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;subscriber_dbs&quot;, After: []string{}},
{Name: &quot;subscriber_filters&quot;, After: []string{&quot;subscriber_dbs&quot;}},
{Name: &quot;timezone&quot;, After: []string{}},
{Name: &quot;url&quot;, After: []string{&quot;app_user&quot;}},
{Name: &quot;app_user&quot;, After: []string{}},
{Name: &quot;user_role&quot;, After: []string{&quot;app_user&quot;, &quot;role&quot;}},
}
g := make(graph)
for _, s := range stmts {
g[s.Name] = after(s.After)
}
sorted, err := topoSort(g)
if err != nil {
log.Fatalf(&quot;could not sort: %v&quot;, err)
}
for _, s := range sorted {
fmt.Println(s)
}
}
func topoSort(g graph) ([]string, error) {
sccs := tarjanSCC(g)
sorted := make([]string, len(sccs))
for i, s := range sccs {
if len(s) != 1 {
return nil, fmt.Errorf(&quot;found directed cycle: %q&quot;, s)
}
sorted[i] = s[0]
}
return sorted, nil
}
// graph is an edge list representation of a directed graph.
type graph map[string]set
// set is an string set.
type set map[string]struct{}
func after(i []string) set {
if len(i) == 0 {
return nil
}
s := make(set)
for _, v := range i {
s[v] = struct{}{}
}
return s
}
// tarjanSCC returns a the strongly connected components of the
// directed graph g.
func tarjanSCC(g graph) [][]string {
t := tarjan{
g: g,
indexTable: make(map[string]int, len(g)),
lowLink:    make(map[string]int, len(g)),
onStack:    make(map[string]bool, len(g)),
}
for v := range t.g {
if t.indexTable[v] == 0 {
t.strongconnect(v)
}
}
return t.sccs
}
// tarjan implements Tarjan&#39;s strongly connected component finding
// algorithm. The implementation is from the pseudocode at
//
// http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
//
type tarjan struct {
g graph
index      int
indexTable map[string]int
lowLink    map[string]int
onStack    map[string]bool
stack []string
sccs [][]string
}
// strongconnect is the strongconnect function described in the
// wikipedia article.
func (t *tarjan) strongconnect(v string) {
// Set the depth index for v to the smallest unused index.
t.index++
t.indexTable[v] = t.index
t.lowLink[v] = t.index
t.stack = append(t.stack, v)
t.onStack[v] = true
// Consider successors of v.
for w := range t.g[v] {
if t.indexTable[w] == 0 {
// Successor w has not yet been visited; recur on it.
t.strongconnect(w)
t.lowLink[v] = min(t.lowLink[v], t.lowLink[w])
} else if t.onStack[w] {
// Successor w is in stack s and hence in the current SCC.
t.lowLink[v] = min(t.lowLink[v], t.indexTable[w])
}
}
// If v is a root node, pop the stack and generate an SCC.
if t.lowLink[v] == t.indexTable[v] {
// Start a new strongly connected component.
var (
scc []string
w   string
)
for {
w, t.stack = t.stack[len(t.stack)-1], t.stack[:len(t.stack)-1]
t.onStack[w] = false
// Add w to current strongly connected component.
scc = append(scc, w)
if w == v {
break
}
}
// Output the current strongly connected component.
t.sccs = append(t.sccs, scc)
}
}
func min(a, b int) int {
if a &lt; b {
return a
}
return b
}

Note that running this code repeatedly will not result in the same strict ordering of output since a number of the paths are not definitively orderable relative to each other (this won't show up in the playground since the results are cached - you can see this though by wrapping the call to tarjanSCC).

Although it may be easier to directly implement a topological sort, by using Tarjan's SCC algorithm, we are able to find the cause of a sort failure, for example here (cf with the same data here).

答案3

得分: 0

这个问题在这里得到了回答:https://groups.google.com/forum/#!topic/golang-nuts/C_7JY1f3cSc。这是一个更详细和具体的答案,我能够立即使用。我请求那个人在这里更新,但他/她没有...所以我自己放上去了。


由于我有一个tarjan。这是该数据的拓扑排序:

http://play.golang.org/p/SHagFMvuhl

在2015-03-12的星期四,Sathish VJ写道:
> 在我的情况下,我确实检查了依赖关系中的传递性。
>
> 我注意到一个具体的事情是,当我打印出调试语句时,有些比较从未发生过。例如,user_role从现在开始从未与role进行比较。尽管在某个时候,当元素较少时,它是这样的。

sort.Sort不会进行所有的比较。那样会导致O(n^2)的时间复杂度。

英文:

This question was answered here: https://groups.google.com/forum/#!topic/golang-nuts/C_7JY1f3cSc. It was a more detailed and specific answer that I was able to immediately use. Requested the person to update it here but he/she hasn't ... so putting it myself.


Since I had a tarjan lying around. Here is the topological sort for that
data:

http://play.golang.org/p/SHagFMvuhl

On Thu, 2015-03-12 at 22:47 -0700, Sathish VJ wrote:
> In my case, I did check and there is that transitive nature in the
> dependencies.
>
> One specific thing I'm noting when I print out the debug statements is
> that some comparisons are never happening. For example, user_role is
> never getting compared to role as of now. Though at one point when
> there were lesser elements, it was.

sort.Sort will not make all comparisons. That would result in O(n^2)
time.

huangapple
  • 本文由 发表于 2015年3月13日 13:28:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/29025460.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定