go-colly returning empty slice


Question

I am trying to scrape a website, but my slice of products seems to be empty.

scraper.go:

package scraper

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

type Product struct {
	name      string
	fullPrice string
	url       string
}

func Scraper(site string) []Product {

	products := []Product{}
	c := colly.NewCollector()
	replacer := strings.NewReplacer("R$", "", ",", ".")
	c.OnHTML("div#column-main-content", func(e *colly.HTMLElement) {
		fullPrice := e.ChildText("span.m7nrfa-0.eJCbzj.sc-ifAKCX.ANnoQ")
		product := Product{
			name:      e.ChildText("h2"),
			fullPrice: replacer.Replace(fullPrice),
			url:       e.ChildAttr("a.sc-1fcmfeb-2.iezWpY", "href"),
		}
		fmt.Println(product)
		products = append(products, product)
	})
	fmt.Println(products)

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.OnError(func(r *colly.Response, err error) {
		fmt.Println("Request URL:", r.Request.URL, "failed with response:", r.Request, "\nError:", err)
	})

	// Uses a random User-Agent in each request
	extensions.RandomUserAgent(c)

	c.Visit(site)
	return products
}

main.go:

package main

import "github.com/Antonio-Costa00/Go-Price-Monitor/scraper"

func main() {
	scraper.Scraper("https://sp.olx.com.br/?q=iphone%27")
}

The product variable prints a value, but the slice is empty.

Slice output:

[]

I don't know if I am doing something wrong when appending the result to the products slice.

Can someone help me check whether I am doing something wrong that makes it return an empty slice?


Answer 1

Score: 4

The OnHTML callback only runs while c.Visit(site) is executing; the fmt.Println(products) in your function runs earlier, while the handlers are still being registered, so it prints an empty slice. Print the slice inside an OnScraped handler instead, which fires after the page has been processed, and you should see it filled. (If you later switch the collector to colly.Async(true), the callbacks also run in separate goroutines and you must call c.Wait() before returning the slice.)

package scraper

import (
        "fmt"
        "strings"

        "github.com/gocolly/colly"
        "github.com/gocolly/colly/extensions"
)

type Product struct {
        name      string
        fullPrice string
        url       string
}

func Scraper(site string) []Product {
        products := []Product{}
        c := colly.NewCollector()
        replacer := strings.NewReplacer("R$", "", ",", ".")
        c.OnHTML("div#column-main-content", func(e *colly.HTMLElement) {
                fullPrice := e.ChildText("span.m7nrfa-0.eJCbzj.sc-ifAKCX.ANnoQ")
                product := Product{
                        name:      e.ChildText("h2"),
                        fullPrice: replacer.Replace(fullPrice),
                        url:       e.ChildAttr("a.sc-1fcmfeb-2.iezWpY", "href"),
                }
                fmt.Println(product)
                products = append(products, product)
        })

        c.OnRequest(func(r *colly.Request) {
                fmt.Println("Visiting", r.URL)
        })

        c.OnError(func(r *colly.Response, err error) {
                fmt.Println("Request URL:", r.Request.URL, "failed with response:", r.Request, "\nError:", err)
        })

        c.OnScraped(func(r *colly.Response) {
                fmt.Println(products)
        })

        // Uses a random User-Agent in each request
        extensions.RandomUserAgent(c)

        c.Visit(site)
        return products
}

huangapple
  • Posted on 2022-11-12 06:55:19
  • Please keep this link when reposting: https://go.coder-hub.com/74408954.html