Close all goroutines when an HTTP request is cancelled

Question

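I am building a web crawler. I pass a URL to a crawler function, which parses the page to collect all the links in anchor tags. I then invoke the same crawler function for each of those URLs, using a separate goroutine per URL.

But if I send a request and cancel it before I get the response, all the goroutines started for that particular request keep running.

What I want is that when I cancel the request, all the goroutines that were started because of that request stop.

Please advise.

Here is the code for my crawler function.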

func crawler(c echo.Context, urlRec string, feed chan string, urlList *[]string, wg *sync.WaitGroup) {
	defer wg.Done()
	URL, _ := url.Parse(urlRec)
	response, err := http.Get(urlRec)
	if err != nil {
		log.Print(err)
		return
	}

	body := response.Body
	defer body.Close()

	tokenizer := html.NewTokenizer(body)
	flag := true
	for flag {
		tokenType := tokenizer.Next()
		switch {
		case tokenType == html.ErrorToken:
			flag = false
			break
		case tokenType == html.StartTagToken:
			token := tokenizer.Token()

			// Check if the token is an <a> tag
			isAnchor := token.Data == "a"
			if !isAnchor {
				continue
			}

			ok, urlHref := getReference(token)
			if !ok {
				continue
			}

			// Make sure the URL begins with http
			hasProto := strings.Index(urlHref, "http") == 0
			if hasProto {
				if !urlInURLList(urlHref, urlList) {
					if strings.Contains(urlHref, URL.Host) {
						*urlList = append(*urlList, urlHref)
						// fmt.Println(urlHref)
						// c.String(http.StatusOK, urlHref+"\n")Documents
						if !checkExt(filepath.Ext(urlHref)) {
							wg.Add(1)
							go crawler(c, urlHref, feed, urlList, wg)
						}
					}
				}
			}
		}
	}
}

And here is my POST request handler.

func scrapePOST(c echo.Context) error {
	var urlList []string
	urlSession := urlFound{}
	var wg sync.WaitGroup
	urlParam := c.FormValue("url")
	feed := make(chan string, 1000)
	wg.Add(1)
	go crawler(c, urlParam, feed, &urlList, &wg)
	wg.Wait()
	var count = 0
	for _, url := range urlList {
		if filepath.Ext(url) == ".jpg" || filepath.Ext(url) == ".jpeg" || filepath.Ext(url) == ".png" {
			urlSession.Images = append(urlSession.Images, url)
		} else if filepath.Ext(url) == ".doc" || filepath.Ext(url) == ".docx" || filepath.Ext(url) == ".pdf" || filepath.Ext(url) == ".ppt" {
			urlSession.Documents = append(urlSession.Documents, url)
		} else {
			urlSession.Links = append(urlSession.Links, url)
		}
		count = count + 1
	}
	urlSession.Count = count
	// jsonResp, _ := json.Marshal(urlSession)
	// fmt.Print(urlSession)
	return c.JSON(http.StatusOK, urlSession)
}

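Answer 1

Score: 10

The echo context exposes the HTTP request, which already has a context tied to the server request. Just get that context and check it for cancellation, and/or pass it along to methods that take a context.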

ctx := c.Request().Context()
select {
case <-ctx.Done():
    return ctx.Err()
default:
    // Continue handling the request
}

// and pass it along to the db or whatever else:
rows, err := db.QueryContext(ctx, ...)

If the client aborts the connection, the request-scoped context is cancelled automatically.

If you want to add your own cancellation conditions (timeouts or whatever), you can do that too:

req := c.Request()
ctx, cancel := context.WithCancel(req.Context())
req = req.WithContext(ctx)
defer cancel()
// do stuff, which may conditionally call cancel() to cancel the context early

huangapple
  • Posted on 2017-08-06 02:46:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45525332.html