How can I get the name of a seller on a specific website using Go?

Question

I'm writing a web scraper in Go. Given a specific web page, I'm trying to get the name of the seller, which is shown in the top right corner (in this example, on this olx site, the seller's name is Ionut). When I run the code below, it should write the name to index.csv, but the file ends up empty. I think the problem is in the HTML parser callback, though it looks fine to me.

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/gocolly/colly"
)

func main() {
	//setting up the file where we store collected data
	fName := filepath.Join("D:\\", "go projects", "cwst go", "CWST-GO", "target folder", "index.csv")
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("Could not create file, error :%q", err)
	}
	defer file.Close()
	//writer that writes the collected data into our file
	writer := csv.NewWriter(file)
	//when main returns, flush the rows buffered in the writer to the file
	defer writer.Flush()

	//collector
	c := colly.NewCollector(
		colly.AllowedDomains("https://www.olx.ro/"),
	)

	//HTML parser
	c.OnHTML(".css-1fp4ipz", func(e *colly.HTMLElement) { //div class that contains wanted info

		writer.Write([]string{
			e.ChildText("h4"), //specific tag of the info
		})
	})

	fmt.Printf("Scraping page :  ")
	c.Visit("https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html")

	log.Printf("\n\nScraping Complete\n\n")
	log.Println(c)

}

Answer 1

Score: 3

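AllowedDomains expects bare hostnames, not URLs, so don't include the https:// scheme or the trailing / in the entry. With "https://www.olx.ro/" in the list, the host of the page you visit (www.olx.ro) never matches, the request is rejected before it is sent, and the OnHTML callback never runs. That is why index.csv stays empty.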

c := colly.NewCollector(
	colly.AllowedDomains("www.olx.ro"),
)
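
To see why the original entry never matches, here is a minimal standard-library sketch of an exact-hostname allow-list check. It is an illustration only, not colly's source: hostnameOf and isAllowed are hypothetical helpers, and the assumption is simply that the allow-list is compared against the hostname of the visited URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostnameOf returns the bare hostname of a URL (no scheme, port, or path).
// Hypothetical helper for illustration.
func hostnameOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

// isAllowed reports whether the hostname of rawURL is exactly equal to one
// of the allowed entries. This mimics an exact-match domain allow-list; it
// is an assumption about the behavior, not colly's actual implementation.
func isAllowed(rawURL string, allowed []string) bool {
	host, err := hostnameOf(rawURL)
	if err != nil {
		return false
	}
	for _, d := range allowed {
		if d == host {
			return true
		}
	}
	return false
}

func main() {
	page := "https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html"

	// The entry from the question is a URL, so it can never equal a hostname.
	fmt.Println(isAllowed(page, []string{"https://www.olx.ro/"})) // false

	// The bare hostname matches.
	fmt.Println(isAllowed(page, []string{"www.olx.ro"})) // true

	// Deriving the entry from a URL avoids the mistake altogether.
	host, _ := hostnameOf("https://www.olx.ro/")
	fmt.Println(host) // www.olx.ro
}
```

In the original program the return value of c.Visit is discarded, so the rejection goes unnoticed. Checking it, for example with if err := c.Visit(url); err != nil { log.Fatal(err) }, should surface the forbidden-domain error immediately.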

huangapple
  • Posted on August 17, 2022 at 17:54:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73386406.html