June 30, 2017 22:30:26 · go


How to understand and practice in concurrency with Go?

Question


I am learning Go, and one of its most powerful features is concurrency. I wrote PHP scripts before, and they executed line by line; that's why it is difficult for me to understand channels and goroutines.

Are there any websites or other resources (books, articles, etc.) where I can see a task that can be processed concurrently, so I can practice concurrency in Go? It would be great if at the end I could see the solution with comments explaining why we do it this way and why this solution is better than others.

Just for example, here is a task that confuses me, and I don't know how to approach it: I need to make a kind of parser that receives a starting point (e.g. http://example.com), starts navigating the whole website (example.com/about, example.com/best-hotels/, etc.), and takes some text from each page (e.g. by selector, like h1.title and p.description). Then, after the whole website has been crawled, I receive a slice of parsed content.
I know how to make requests and how to get information using a selector, but I don't know how to organize communication between all the goroutines.

Thank you for any information and links. I hope this helps others with the same problem in the future.

Answer 1

Score: 1


So there are lots of resources online about concurrency patterns in Go; those three I got from a quick Google search. But if you have something specific in mind, I think I can address that too.

Looks like you want to crawl a website and get information from its many pages concurrently, depositing that "information" into a common location (i.e. a slice). The way to go here is to use a chan, or channel, which is a thread-safe data structure (multiple goroutines can send to and receive from it without extra locking) for passing data from one thread/goroutine to another.

And of course the go keyword in Go is how to spawn a goroutine.
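A minimal sketch of `go` in action, assuming a made-up `squareAll` helper as the "work": every call prefixed with `go` runs concurrently, and the caller has to wait for those goroutines explicitly, here with a `sync.WaitGroup`. Without the `Wait`, the function could return before any goroutine had written its result.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll is a hypothetical example: it squares every number,
// starting one goroutine per element.
func squareAll(nums []int) []int {
	results := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1) // register the goroutine before starting it
		go func(i, n int) {
			defer wg.Done() // mark this goroutine as finished when it returns
			// each goroutine writes only to its own index, so no lock is needed
			results[i] = n * n
		}(i, n)
	}
	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4, 5})) // prints [1 4 9 16 25]
}
```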

So, for example, in the main goroutine (func main()):

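```go
// get a listOfWebpages
dataChannel := make(chan string)
for _, webpage := range listOfWebpages {
    go fetchDataFromWebpage(webpage, dataChannel)
}
// the dataChannel will be concurrently filled with the data you send to it.
// Nothing ever closes it, so `for x := range dataChannel` would block forever
// after the last page; each goroutine sends exactly one value, so receive
// exactly len(listOfWebpages) values instead.
for range listOfWebpages {
    x := <-dataChannel
    fmt.Println(x) // print the header or whatever you scraped from the webpage
}
```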

The goroutines will be functions which scrape websites and feed the dataChannel (you mentioned you know how to scrape websites already). Something like this:

```go
func fetchDataFromWebpage(url string, c chan string) {
    data := scrapeWebsite(url) // scrapeWebsite is your own scraping code
    c <- data                  // send the data to the thread-safe channel
}
```
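The snippet above assumes the list of pages is known up front. For the crawl described in the question, where new URLs are only discovered while parsing pages, one possible sketch follows. It is not the answerer's code: it replaces HTTP and HTML parsing with a hypothetical in-memory `site` map (URL → scraped title and outgoing links), so `fetch` stands in for your own request-and-selector code. A mutex-guarded `visited` set makes sure each page is fetched once, a `sync.WaitGroup` counts pages still being processed, and a helper goroutine closes the results channel when that count reaches zero, which is what lets the `range` loop in the collector terminate.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// page is a hypothetical parsed result: scraped text plus links found on the page.
type page struct {
	title string
	links []string
}

// site is a hypothetical in-memory website used instead of real HTTP requests.
var site = map[string]page{
	"http://example.com":              {"Home", []string{"http://example.com/about", "http://example.com/best-hotels/"}},
	"http://example.com/about":        {"About", []string{"http://example.com"}},
	"http://example.com/best-hotels/": {"Best hotels", []string{"http://example.com/about"}},
}

// fetch stands in for "request the page and apply selectors like h1.title".
func fetch(url string) (page, bool) {
	p, ok := site[url]
	return p, ok
}

// crawl visits every page reachable from start and returns the scraped titles.
func crawl(start string) []string {
	var (
		mu      sync.Mutex
		visited = map[string]bool{}
		wg      sync.WaitGroup
		results = make(chan string)
	)

	var visit func(url string)
	visit = func(url string) {
		defer wg.Done()
		mu.Lock()
		if visited[url] { // another goroutine already claimed this page
			mu.Unlock()
			return
		}
		visited[url] = true
		mu.Unlock()

		p, ok := fetch(url)
		if !ok {
			return // unknown page: nothing to report
		}
		results <- p.title // hand the scraped text to the collector
		for _, link := range p.links {
			wg.Add(1) // count the child before this visit's Done runs
			go visit(link)
		}
	}

	wg.Add(1)
	go visit(start)

	// Close results once every visit has returned, so the range below ends.
	go func() {
		wg.Wait()
		close(results)
	}()

	var titles []string
	for t := range results { // only this goroutine touches titles, so no lock
		titles = append(titles, t)
	}
	sort.Strings(titles) // goroutines finish in arbitrary order
	return titles
}

func main() {
	fmt.Println(crawl("http://example.com")) // prints [About Best hotels Home]
}
```

With a real site you would probably also want to cap the number of concurrent requests and ignore links to other hosts; both are left out to keep the sketch short.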

---

If you're having trouble understanding how to use concurrency tools, such as channels, mutex locks, or WaitGroups, maybe you should start by trying to understand why concurrency can be problematic. The best illustration of that, to me, is the Dining Philosophers Problem, https://en.wikipedia.org/wiki/Dining_philosophers_problem

> Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.
>
> Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.

If practice is what you're looking for, I recommend implementing this problem so that it fails, and then trying to fix it using concurrency patterns. There are other problems like this available too! And creating the problem is one step towards understanding how to solve it!
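As one illustration of such a fix, here is a sketch of the well-known resource-ordering solution (one of several possible fixes, chosen here for brevity): forks are `sync.Mutex` values, and every philosopher locks the lower-numbered of their two forks first. If everyone instead took the left fork and then the right, all five could end up holding one fork each and waiting forever; a single global lock order breaks that cycle. The `dine` helper and the meal count are illustrative assumptions, not part of the problem statement.

```go
package main

import (
	"fmt"
	"sync"
)

// dine runs n philosophers (n >= 2 assumed), each eating `meals` times,
// and returns how many times each one ate.
func dine(n, meals int) []int {
	forks := make([]sync.Mutex, n) // fork i sits between philosopher i and i+1
	eaten := make([]int, n)
	var wg sync.WaitGroup

	for p := 0; p < n; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			left, right := p, (p+1)%n
			// Resource ordering: always lock the lower-numbered fork first.
			// The naive "left then right" order can deadlock when every
			// philosopher picks up their left fork at the same moment.
			first, second := left, right
			if second < first {
				first, second = second, first
			}
			for m := 0; m < meals; m++ {
				forks[first].Lock()
				forks[second].Lock()
				eaten[p]++ // eat: only philosopher p writes eaten[p]
				forks[second].Unlock()
				forks[first].Unlock()
				// think before the next meal
			}
		}(p)
	}
	wg.Wait()
	return eaten
}

func main() {
	fmt.Println(dine(5, 3)) // prints [3 3 3 3 3]
}
```

To see the failure first, swap the ordering for a plain `forks[left].Lock(); forks[right].Lock()`; it may run fine many times before eventually deadlocking, which is exactly what makes such bugs hard to spot.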

---

If you're having more trouble just understanding how to use channels, then aside from reading up on them, you can think of channels more simply as queues that can be safely accessed and modified from concurrent goroutines.
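To make the queue analogy concrete, a small sketch: a buffered channel holds values in FIFO order, a send blocks only while the buffer is full, and a receive blocks only while it is empty. One producer goroutine enqueues values and the caller dequeues them in the same order. The `produce`/`drain` helpers and the buffer size of 3 are arbitrary choices for illustration.

```go
package main

import "fmt"

// produce sends 1..n into a buffered channel from its own goroutine and
// closes the channel when done, so receivers know the queue is drained.
func produce(n int) <-chan int {
	queue := make(chan int, 3) // room for 3 items before a send blocks
	go func() {
		defer close(queue)
		for i := 1; i <= n; i++ {
			queue <- i // enqueue; blocks while the buffer is full
		}
	}()
	return queue
}

// drain receives until the channel is closed, preserving FIFO order.
func drain(queue <-chan int) []int {
	var out []int
	for v := range queue { // dequeue; blocks while the buffer is empty
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(drain(produce(5))) // prints [1 2 3 4 5]
}
```

Unlike a plain slice used as a queue, no mutex is needed here: the channel itself synchronizes the producer and the consumer.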
