Strange len function (or string) behavior

Question

I'm using goquery to parse timetable content so I can process it later, but I've run into a problem.

I have two functions. The first takes an HTML document and searches it for a token (csrfmiddlewaretoken); the second sends a request with this token and extracts information from the response. After extracting everything it needs from the page, the second function searches the same response for a new token and stores it for future requests.

For some reason the found token is an empty string by the time execution reaches if len(foundCsrfToken) == 0 {. If I print the length of the token just before that statement, I get this:

...
64
0
...

I've removed all goroutines in case they were the cause.

func findCsrfMiddlewareToken(responseBody io.Reader) (string, error) {
	document, err := goquery.NewDocumentFromReader(responseBody)
	if err != nil {
		return "", err
	}

	var foundCsrfToken string
	document.Find("script").Each(func(_ int, scrpt *goquery.Selection) {
		scriptText := scrpt.Text()
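		if funcDefIndex := strings.Index(scriptText, "function Filter"); funcDefIndex != -1 {
			csrfTokenValueStart := strings.Index(scriptText, "csrfmiddlewaretoken: '")
			if csrfTokenValueStart == -1 {
				return // this script has no token assignment
			}
			offset := csrfTokenValueStart + len("csrfmiddlewaretoken: '")
			if offset+csrfMiddlewareTokenLength > len(scriptText) {
				return // token is truncated; avoid slicing out of range
			}
			foundCsrfToken = scriptText[offset : offset+csrfMiddlewareTokenLength]
		}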
	})
	if len(foundCsrfToken) == 0 {
		return "", errNoCsrfMiddlewareToken
	}
	return foundCsrfToken, nil
}

func (parser *TimetableParser) ParseTimetable(timetableFilterInfo internal.TimetableInfo) (internal.Timetable, error) {
	timetable := internal.Timetable{}

	requestBody := makeFormValues(timetableFilterInfo, parser.csrfMiddlewareToken).Encode()
	request, err := http.NewRequest("POST", baseUrl, strings.NewReader(requestBody))
	if err != nil {
		return timetable, err
	}
	request.Header.Add("Content-Type", "application/x-www-form-urlencoded")
	request.Header.Add("Content-Length", strconv.Itoa(len(requestBody)))
	request.Header.Add("Referer", baseUrl)

	response, err := parser.client.Do(request)
	if err != nil {
		return timetable, err
	}
	defer response.Body.Close()

	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		return timetable, err
	}

	document.Find("table#schedule").Find("tr").Each(func(rowIndex int, row *goquery.Selection) {
		subjectTimeElement := row.Closest("td")
		subjectTimeElement.NextAll().Each(func(columnIndex int, cell *goquery.Selection) {
			subjectInfo := extractSubjectInfoFromCell(cell)
			subjectInfo.Order = rowIndex
			timetable.Subjects[columnIndex][rowIndex] = subjectInfo
		})
	})

	parser.csrfMiddlewareToken, err = findCsrfMiddlewareToken(response.Body)
	if err != nil {
		log.Println("csrfMiddlewareToken: " + err.Error())
	}
	return timetable, nil
}

Go version: go1.17.1 windows/amd64

goquery version: 1.7.1

Answer 1

Score: 1

I've just realized what was wrong. An io.Reader is a stream: once its contents have been read, it is exhausted and further reads return nothing. As you can see, ParseTimetable first passes response.Body to goquery.NewDocumentFromReader, which reads it to the end, and only then passes it to findCsrfMiddlewareToken. By that point the body is already empty.
The first time findCsrfMiddlewareToken is called, it works as expected and prints the token length (64). The second call receives the already-consumed response, so it prints 0.

Possible solution: https://stackoverflow.com/questions/39791021/how-to-read-multiple-times-from-same-io-reader

huangapple
  • Posted on October 4, 2021 at 04:02:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69428459.html