

Reading a file concurrently





The reading part isn't concurrent but the processing is. I phrased the title this way because I'm most likely to search for this problem again using that phrase.

I'm getting a deadlock after trying to go beyond the examples, so this is a learning experience for me. My goals are these:

  1. Read a file line by line (eventually use a buffer to do groups of lines).
  2. Pass off the text to a func() that does some regex work.
  3. Send the results somewhere but avoid mutexes or shared variables. I'm sending ints (always the number 1) to a channel. It's sort of silly but if it's not causing problems I'd like to leave it like this unless you folks have a neater option.
  4. Use a worker pool to do this. I'm not sure how to tell the workers to requeue themselves.

Here is the playground link. I tried to write helpful comments; hopefully this makes sense. My design could be completely wrong, so don't hesitate to refactor.

package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
	"sync"
)

func telephoneNumbersInFile(path string) int {
	file := strings.NewReader(path)
	var telephone = regexp.MustCompile(`\(\d+\)\s\d+-\d+`)

	// do I need buffered channels here?
	jobs := make(chan string)
	results := make(chan int)

	// I think we need a wait group, not sure.
	wg := new(sync.WaitGroup)

	// start up some workers that will block and wait?
	for w := 1; w <= 3; w++ {
		wg.Add(1)
		go matchTelephoneNumbers(jobs, results, wg, telephone)
	}

	// go over a file line by line and queue up a ton of work
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		// Later I want to create a buffer of lines, not just line-by-line here ...
		jobs <- scanner.Text()
	}
	close(jobs)
	wg.Wait()

	// Add up the results from the results channel.
	// The rest of this isn't even working so ignore for now.
	counts := 0
	// for v := range results {
	//   counts += v
	// }
	return counts
}

func matchTelephoneNumbers(jobs <-chan string, results chan<- int, wg *sync.WaitGroup, telephone *regexp.Regexp) {
	// Decreasing internal counter for wait-group as soon as goroutine finishes
	defer wg.Done()
	// eventually I want to have a []string channel to work on a chunk of lines not just one line of text
	for j := range jobs {
		if telephone.MatchString(j) {
			results <- 1
		}
	}
}

func main() {
	// An artificial input source.  Normally this is a file passed on the command line.
	const input = "Foo\n(555) 123-3456\nBar\nBaz"
	numberOfTelephoneNumbers := telephoneNumbersInFile(input)
	fmt.Println(numberOfTelephoneNumbers)
}


Score: 16






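You're almost there; you just need a little more work on the goroutines' synchronisation. Your problem is that you're trying to feed the parsers and collect the results in the same routine, and with unbuffered channels that can't be done: while that routine is sending jobs or waiting, nothing receives from results, so any worker that finds a match blocks on results <- 1 and the program eventually deadlocks.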

I propose the following:

  1. Run the scanner in a separate routine, and close the input channel once everything is read.
  2. Run a separate routine that waits for the parsers to finish their job, then closes the output channel.
  3. Collect all the results in your main routine.

The relevant changes could look like this:

// Go over a file line by line and queue up a ton of work
go func() {
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		jobs <- scanner.Text()
	}
	close(jobs)
}()

// Collect all the results...
// First, make sure we close the result channel when everything was processed
go func() {
	wg.Wait()
	close(results)
}()

// Now, add up the results from the results channel until closed
counts := 0
for v := range results {
	counts += v
}

Fully working example on the playground: http://play.golang.org/p/coja1_w-fY

It's worth adding that you don't necessarily need the WaitGroup to achieve the same thing; all you need to know is when to stop receiving results. This could be done, for example, by having the scanner advertise (on a channel) how many lines were read and having the collector read only that many results (you would need to send zeros as well, though).


Score: 1








Edit: The answer by @tomasz above is the correct one. Please disregard this answer.

You need to do two things:

  1. Use buffered channels so that sending doesn't block.
  2. Close the results channel so that receiving doesn't block.

The use of buffered channels is essential because unbuffered channels need a receive for each send, which is causing the deadlock you're hitting.
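To see why buffering changes the picture: an unbuffered send can only complete when a receiver is ready, while a buffered send completes immediately as long as there is free capacity. The sketch below uses a hypothetical trySend helper built on select with a default case, so it reports whether a send would have blocked instead of actually blocking.

```go
package main

import "fmt"

// trySend reports whether a send completes right now; the default case
// makes select return immediately instead of blocking.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 2)

	fmt.Println(trySend(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 1))   // true: fits in the buffer
	fmt.Println(trySend(buffered, 2))   // true: buffer is now full
	fmt.Println(trySend(buffered, 3))   // false: buffer full, a send would block
}
```

Once the buffer is full, a buffered send blocks just like an unbuffered one, so buffering only postpones the deadlock when there are more results than capacity.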

If you fix that, you'll run into a deadlock when you try to receive the results, because results hasn't been closed.

Here's the fixed playground: http://play.golang.org/p/DtS8Matgi5

  • Posted on December 1, 2014 at 03:46:13.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/27217428.html



