Strange len function (or string) behavior
Question
I'm trying to parse timetable content with goquery so that I can work with it later, but I've run into a problem.
I have two functions. The first takes an HTML document and searches it for a token (csrfmiddlewaretoken). The second sends a request using this token and extracts information from the response. Once I have extracted everything I need from the page, I search it for the token again and store it for use in future requests.
For some reason, the found token is an empty string by the time execution reaches if len(foundCsrfToken) == 0 {. If I print the length of the token just before that statement, I get this:
...
64
0
...
I've removed all goroutines in case they were the problem.
func findCsrfMiddlewareToken(responseBody io.Reader) (string, error) {
	document, err := goquery.NewDocumentFromReader(responseBody)
	if err != nil {
		return "", err
	}
	var foundCsrfToken string
	document.Find("script").Each(func(_ int, scrpt *goquery.Selection) {
		scriptText := scrpt.Text()
		if funcDefIndex := strings.Index(scriptText, "function Filter"); funcDefIndex != -1 {
			csrfTokenValueStart := strings.Index(scriptText, "csrfmiddlewaretoken: '")
			offset := csrfTokenValueStart + len("csrfmiddlewaretoken: '")
			foundCsrfToken = scriptText[offset : offset+csrfMiddlewareTokenLength]
		}
	})
	if len(foundCsrfToken) == 0 {
		return "", errNoCsrfMiddlewareToken
	}
	return foundCsrfToken, nil
}
func (parser *TimetableParser) ParseTimetable(timetableFilterInfo internal.TimetableInfo) (internal.Timetable, error) {
	timetable := internal.Timetable{}
	requestBody := makeFormValues(timetableFilterInfo, parser.csrfMiddlewareToken).Encode()
	request, err := http.NewRequest("POST", baseUrl, strings.NewReader(requestBody))
	if err != nil {
		return timetable, err
	}
	request.Header.Add("Content-Type", "application/x-www-form-urlencoded")
	request.Header.Add("Content-Length", strconv.Itoa(len(requestBody)))
	request.Header.Add("Referer", baseUrl)
	response, err := parser.client.Do(request)
	if err != nil {
		return timetable, err
	}
	defer response.Body.Close()
	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		return timetable, err
	}
	document.Find("table#schedule").Find("tr").Each(func(rowIndex int, row *goquery.Selection) {
		subjectTimeElement := row.Closest("td")
		subjectTimeElement.NextAll().Each(func(columnIndex int, cell *goquery.Selection) {
			subjectInfo := extractSubjectInfoFromCell(cell)
			subjectInfo.Order = rowIndex
			timetable.Subjects[columnIndex][rowIndex] = subjectInfo
		})
	})
	parser.csrfMiddlewareToken, err = findCsrfMiddlewareToken(response.Body)
	if err != nil {
		log.Println("csrfMiddlewareToken: " + err.Error())
	}
	return timetable, nil
}
Go version: go1.17.1 windows/amd64
goquery version: 1.7.1
Answer 1
Score: 1
I've just realized what is wrong. An io.Reader is a stream: once it has been read to the end, further reads return no data. As you can see, ParseTimetable first reads response.Body to build the goquery document and gather all the necessary information, and only then passes the same response.Body to findCsrfMiddlewareToken. By that point the body is already empty.
The first time findCsrfMiddlewareToken is called, it works as usual and prints the token length (64). The second call receives the already consumed response, so it prints 0.
Possible solution: https://stackoverflow.com/questions/39791021/how-to-read-multiple-times-from-same-io-reader