How can I get the name of a seller on a specific website using golang?
Question
I'm writing a web scraper in Go. Given a specific web page, I'm trying to get the seller's name, which is shown in the top right corner (in this example, on this OLX page, the seller's name is Ionut). When I run the code below, it should write the name to index.csv, but the file is empty. I think the problem is in the HTML parser, though it looks fine to me.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/gocolly/colly"
)

func main() {
	// set up the file where the collected data is stored
	fName := filepath.Join("D:\\", "go projects", "cwst go", "CWST-GO", "target folder", "index.csv")
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("Could not create file, error :%q", err)
	}
	defer file.Close()

	// writer that writes the collected data into our file
	writer := csv.NewWriter(file)
	// when main returns, flush whatever is still buffered in the writer to the file
	defer writer.Flush()

	// collector
	c := colly.NewCollector(
		colly.AllowedDomains("https://www.olx.ro/"),
	)

	// HTML parser
	c.OnHTML(".css-1fp4ipz", func(e *colly.HTMLElement) { // div class that contains the wanted info
		writer.Write([]string{
			e.ChildText("h4"), // specific tag holding the info
		})
	})

	fmt.Printf("Scraping page : ")
	c.Visit("https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html")
	log.Printf("\n\nScraping Complete\n\n")
	log.Println(c)
}
Answer 1
Score: 3
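You don't need to add https:// or the trailing / in the allowed domains; AllowedDomains expects bare host names, not URLs. With the scheme and slash included, the entry never matches the request's host, so colly refuses the visit and the OnHTML callback never runs.
c := colly.NewCollector(
	colly.AllowedDomains("www.olx.ro"),
)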
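If you would rather derive the allowed domain from the page URL than type it by hand, something like the following should work. It is only a sketch using the standard library: hostOf and isAllowed are hypothetical helper names, and isAllowed is a simplified exact-match stand-in to illustrate the mismatch, not colly's actual implementation.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostOf returns the bare host name of rawURL (no scheme, port, or path),
// which is the form an allowed-domains list expects.
// Hypothetical helper, not part of colly.
func hostOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

// isAllowed is a simplified stand-in for an allow-list check: it reports
// whether host exactly equals one of the allowed entries.
func isAllowed(host string, allowed []string) bool {
	for _, d := range allowed {
		if d == host {
			return true
		}
	}
	return false
}

func main() {
	page := "https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html"

	host, err := hostOf(page)
	if err != nil {
		fmt.Println("could not parse URL:", err)
		return
	}

	fmt.Println(host)                                             // www.olx.ro
	fmt.Println(isAllowed(host, []string{"https://www.olx.ro/"})) // false: a URL is not a host name
	fmt.Println(isAllowed(host, []string{host}))                  // true
}
```

The value returned by hostOf is what you would pass to colly.AllowedDomains. It is also worth checking the error returned by c.Visit, since the question's code discards it and so never sees why the request was refused.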