March 9, 2017 16:22:33 · go

Simple solution for golang tour webcrawler exercise

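Question

I'm new to Go. I've seen several solutions to this exercise, but they strike me as complicated...

My solution keeps everything simple, but it fails with a deadlock error. I can't work out how to close the channels properly and stop the loop inside main. Is there a simple way to do this?

Solution on Golang playground

Thanks to anyone who can help!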

package main

import (
	"fmt"
	"sync"
)

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

type SafeCache struct {
	cache map[string]bool
	mux   sync.Mutex
}

func (c *SafeCache) Set(s string) {
	c.mux.Lock()
	c.cache[s] = true
	c.mux.Unlock()
}

func (c *SafeCache) Get(s string) bool {
	c.mux.Lock()
	defer c.mux.Unlock()
	return c.cache[s]
}

var (
	sc    = SafeCache{cache: make(map[string]bool)}
	errs  = make(chan error)
	ress  = make(chan string)
)

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	if depth <= 0 {
		return
	}

	var (
		body string
		err  error
		urls []string
	)

	if ok := sc.Get(url); !ok {
		sc.Set(url)
		body, urls, err = fetcher.Fetch(url)
	} else {
		err = fmt.Errorf("Already fetched: %s", url)
	}

	if err != nil {
		errs <- err
		return
	}

	ress <- fmt.Sprintf("found: %s %q\n", url, body)
	for _, u := range urls {
		go Crawl(u, depth-1, fetcher)
	}
	return
}

func main() {
	go Crawl("http://golang.org/", 4, fetcher)
	for {
		select {
		case res, ok := <-ress:
			fmt.Println(res)
			if !ok {
				break
			}
		case err, ok := <-errs:
			fmt.Println(err)
			if !ok {
				break
			}
		}
	}
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}


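Answer 1

Score: 2

You can solve this with sync.WaitGroup.

Start listening on your channels in separate goroutines, and let a WaitGroup keep track of how many goroutines are still running.

wg.Add(1) announces that a new goroutine is about to start.

wg.Done() signals that a goroutine has finished.

wg.Wait() blocks the calling goroutine until every goroutine registered with Add has called Done.

Together, these three methods let you coordinate your goroutines.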
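The code behind the playground link is not reproduced in this post, so the following is only a minimal sketch of the WaitGroup approach, not necessarily the arrangement the answer used. It assumes a single output channel instead of the question's separate ress and errs channels, and the names visited, firstVisit, crawl, CrawlAll and mapFetcher are hypothetical helpers introduced for illustration. The receiving loop ends because a helper goroutine closes the channel once wg.Wait() returns.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Fetcher matches the interface from the question.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

// visited is a mutex-guarded set of URLs (hypothetical stand-in for SafeCache).
type visited struct {
	mu   sync.Mutex
	seen map[string]bool
}

// firstVisit marks url as seen and reports whether this was the first visit.
// Checking and setting under one lock keeps two goroutines from both fetching it.
func (v *visited) firstVisit(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[url] {
		return false
	}
	v.seen[url] = true
	return true
}

// crawl sends one line per fetched URL to out; wg counts every running crawl.
func crawl(url string, depth int, f Fetcher, v *visited, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	if depth <= 0 || !v.firstVisit(url) {
		return
	}
	body, urls, err := f.Fetch(url)
	if err != nil {
		out <- err.Error()
		return
	}
	out <- fmt.Sprintf("found: %s %q", url, body)
	for _, u := range urls {
		wg.Add(1) // Add before starting the goroutine, not inside it.
		go crawl(u, depth-1, f, v, out, wg)
	}
}

// CrawlAll runs the crawl and returns every result line, sorted so the
// output does not depend on goroutine scheduling.
func CrawlAll(start string, depth int, f Fetcher) []string {
	v := &visited{seen: make(map[string]bool)}
	out := make(chan string)
	var wg sync.WaitGroup

	wg.Add(1)
	go crawl(start, depth, f, v, out, &wg)

	// Close out once every crawl goroutine has called Done.
	go func() {
		wg.Wait()
		close(out)
	}()

	var lines []string
	for line := range out { // the loop ends when out is closed
		lines = append(lines, line)
	}
	sort.Strings(lines)
	return lines
}

// mapFetcher is a tiny canned Fetcher used only in this sketch.
type mapFetcher map[string][]string

func (m mapFetcher) Fetch(url string) (string, []string, error) {
	if urls, ok := m[url]; ok {
		return "page " + url, urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

func main() {
	f := mapFetcher{"a": {"b", "c"}, "b": {"a"}}
	for _, line := range CrawlAll("a", 3, f) {
		fmt.Println(line)
	}
}
```

Each parent calls wg.Add(1) for a child before its own deferred wg.Done() runs, so the counter cannot reach zero while work is still pending. With the question's two channels, the same idea should apply: close both after wg.Wait() and leave the receive loop once both are drained.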

Go playground link

PS. You might be interested in sync.RWMutex for your SafeCache.
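As a rough illustration of the PS, here is the question's SafeCache with sync.RWMutex swapped in for sync.Mutex. This is a sketch of one possible change, not code from the answer: Get takes only a read lock, so concurrent readers do not block one another, while Set still takes the exclusive lock.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCache is the question's cache with sync.RWMutex in place of sync.Mutex.
type SafeCache struct {
	mux   sync.RWMutex
	cache map[string]bool
}

// Get takes a read lock; any number of readers may hold it at once.
func (c *SafeCache) Get(s string) bool {
	c.mux.RLock()
	defer c.mux.RUnlock()
	return c.cache[s]
}

// Set takes the exclusive write lock.
func (c *SafeCache) Set(s string) {
	c.mux.Lock()
	defer c.mux.Unlock()
	c.cache[s] = true
}

func main() {
	sc := SafeCache{cache: make(map[string]bool)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			sc.Set(fmt.Sprintf("url-%d", n))
		}(i)
	}
	wg.Wait()
	fmt.Println(sc.Get("url-2"), sc.Get("missing")) // prints: true false
}
```

Note that a separate Get followed by Set, as in the question's Crawl, still lets two goroutines both see "not cached" before either one writes, whichever mutex type is used. Folding the check and the write into a single method under the write lock closes that gap.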

