go-colly returning empty slice
Question
I am trying to scrape a website, but it seems my slice of products is empty.
scraper.go:
package scraper

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

type Product struct {
	name      string
	fullPrice string
	url       string
}

func Scraper(site string) []Product {
	products := []Product{}
	c := colly.NewCollector()
	replacer := strings.NewReplacer("R$", "", ",", ".")
	c.OnHTML("div#column-main-content", func(e *colly.HTMLElement) {
		fullPrice := e.ChildText("span.m7nrfa-0.eJCbzj.sc-ifAKCX.ANnoQ")
		product := Product{
			name:      e.ChildText("h2"),
			fullPrice: replacer.Replace(fullPrice),
			url:       e.ChildAttr("a.sc-1fcmfeb-2.iezWpY", "href"),
		}
		fmt.Println(product)
		products = append(products, product)
	})
	fmt.Println(products)
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})
	c.OnError(func(r *colly.Response, err error) {
		fmt.Println("Request URL:", r.Request.URL, "failed with response:", r.Request, "\nError:", err)
	})
	// Uses a random User-Agent in each request
	extensions.RandomUserAgent(c)
	c.Visit(site)
	return products
}
main.go:
package main

import "github.com/Antonio-Costa00/Go-Price-Monitor/scraper"

func main() {
	scraper.Scraper("https://sp.olx.com.br/?q=iphone%27")
}
The product variable has output, but the slice is empty.
slice output:
[]
I don't know if I am doing something wrong when appending the result to the products slice.
Can someone help me check whether I am doing something wrong that causes an empty slice to be returned?
Answer 1

Score: 4
Colly only runs your OnHTML callback while c.Visit is executing (and in other goroutines if the collector is created with colly.Async(true)). Your fmt.Println(products) sits right after registering the callback, so it runs before anything has been appended and prints an empty slice. By moving the print into an OnScraped handler, which fires after the page has been processed, you should see the slice filled. Note that if you ever enable colly.Async(true), you must also call c.Wait() before returning products.
package scraper

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

type Product struct {
	name      string
	fullPrice string
	url       string
}

func Scraper(site string) []Product {
	products := []Product{}
	c := colly.NewCollector()
	replacer := strings.NewReplacer("R$", "", ",", ".")
	c.OnHTML("div#column-main-content", func(e *colly.HTMLElement) {
		fullPrice := e.ChildText("span.m7nrfa-0.eJCbzj.sc-ifAKCX.ANnoQ")
		product := Product{
			name:      e.ChildText("h2"),
			fullPrice: replacer.Replace(fullPrice),
			url:       e.ChildAttr("a.sc-1fcmfeb-2.iezWpY", "href"),
		}
		fmt.Println(product)
		products = append(products, product)
	})
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})
	c.OnError(func(r *colly.Response, err error) {
		fmt.Println("Request URL:", r.Request.URL, "failed with response:", r.Request, "\nError:", err)
	})
	c.OnScraped(func(r *colly.Response) {
		fmt.Println(products)
	})
	// Uses a random User-Agent in each request
	extensions.RandomUserAgent(c)
	c.Visit(site)
	return products
}
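The register-then-run ordering that causes the empty print can be reproduced without colly or the network. The sketch below is an assumption-level toy, not colly's real API: the hypothetical scrape function stands in for Scraper, and the loop at the end stands in for c.Visit firing the registered callbacks.

```go
package main

import "fmt"

// scrape mimics the shape of the Scraper function in the question:
// it registers a callback into a list, prints the slice too early,
// then "visits" the elements, which is when the callbacks actually run.
func scrape(elements []string) []string {
	products := []string{}
	var handlers []func(string)

	// Like c.OnHTML: this only REGISTERS the callback, it does not run it.
	handlers = append(handlers, func(name string) {
		products = append(products, name)
	})

	// Like the fmt.Println(products) in the question: the callback has
	// not run yet, so this always prints an empty slice.
	fmt.Println(products) // -> []

	// Like c.Visit: now the registered callbacks execute, once per element.
	for _, el := range elements {
		for _, h := range handlers {
			h(el)
		}
	}
	return products
}

func main() {
	got := scrape([]string{"iphone 11", "iphone 12"})
	fmt.Println(got) // -> [iphone 11 iphone 12]
}
```

The slice returned after the "visit" loop is filled, which is why returning products after c.Visit (or printing it inside OnScraped) works while printing it at registration time does not.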