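Question

I'm writing a web scraper in Go. Given a specific web page, I'm trying to get the seller's name, which appears in the top right corner (in this example, on this olx site, the seller's name is Ionut). When I run the code below, it should write the name to index.csv, but the file is empty. I think the problem is in the HTML parser, though it looks fine to me.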

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/gocolly/colly"
)

func main() {
	// set up the file where the collected data is stored
	fName := filepath.Join("D:\\", "go projects", "cwst go", "CWST-GO", "target folder", "index.csv")
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("Could not create file, error: %q", err)
	}
	defer file.Close()
	// writer that writes the collected data into the file
	writer := csv.NewWriter(file)
	// flush any buffered records to the file when main returns
	defer writer.Flush()

	// collector
	c := colly.NewCollector(
		colly.AllowedDomains("https://www.olx.ro/"),
	)

	// HTML parser
	c.OnHTML(".css-1fp4ipz", func(e *colly.HTMLElement) { // div class that contains the wanted info

		writer.Write([]string{
			e.ChildText("h4"), // specific tag holding the info
		})
	})

	fmt.Printf("Scraping page: ")
	c.Visit("https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html")

	log.Printf("\n\nScraping Complete\n\n")
	log.Println(c)

}


Answer 1

Score: 3

You don't need to include https:// or the trailing / in the allowed domains; pass only the hostname.

c := colly.NewCollector(
    colly.AllowedDomains("www.olx.ro"),
)
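
To illustrate why the original entry never matched, here is a minimal standard-library sketch. It assumes the allow-list is compared against the hostname of each requested URL. `hostAllowed` is a hypothetical helper written for this example, not colly's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAllowed reports whether the hostname of rawURL exactly matches one of
// the allowed entries. Hypothetical helper: a simplified stand-in for a
// domain allow-list check, not taken from colly.
func hostAllowed(rawURL string, allowed []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	for _, d := range allowed {
		if u.Hostname() == d {
			return true
		}
	}
	return false
}

func main() {
	page := "https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html"

	// The entry from the question contains a scheme and a trailing slash,
	// so it can never equal the bare hostname "www.olx.ro".
	fmt.Println(hostAllowed(page, []string{"https://www.olx.ro/"})) // false

	// The entry from the answer is a bare hostname and matches.
	fmt.Println(hostAllowed(page, []string{"www.olx.ro"})) // true
}
```

Note also that `c.Visit` returns an error, which the code in the question discards. Checking it (for example with `if err := c.Visit(...); err != nil { log.Println(err) }`) would likely have shown that the request was rejected rather than leaving an empty file with no explanation.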

