Scraping all possible tags and putting them into one variable using Go Colly

Question

I need to scrape different tags from a list of sites, store them in a variable, and then write them to a .csv file. For example, I want every line that mentions the author of the article (div.author, p.author, etc.). The position and the tag of this line differ from site to site, so I need a conditional or a regular expression to filter these tags.
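One way to do the regular-expression filtering is to match on the class attribute instead of on a fixed tag name. The following is a minimal, stdlib-only sketch, not a definitive implementation: the `element` type and the deliberately loose `(?i)author` pattern are hypothetical and stand in for values you would read from Colly, for example inside `e.ForEach("[class]", ...)` using `el.Name`, `el.Attr("class")` and `el.Text`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// element is a hypothetical, simplified stand-in for one scraped tag.
// With Colly you would fill it from *colly.HTMLElement (Name, Attr("class"), Text).
type element struct {
	Tag   string
	Class string
	Text  string
}

// authorClassRe is a hypothetical, deliberately loose pattern: it accepts any
// class attribute containing "author" in any letter case, e.g. "author",
// "post-author" or "Author-Name". Tighten it per site if needed.
var authorClassRe = regexp.MustCompile(`(?i)author`)

// authorTexts returns the trimmed text of every element whose class matches
// the pattern, regardless of whether the tag is a div, p, span, and so on.
func authorTexts(elems []element) []string {
	var out []string
	for _, el := range elems {
		text := strings.TrimSpace(el.Text)
		if text != "" && authorClassRe.MatchString(el.Class) {
			out = append(out, text)
		}
	}
	return out
}

func main() {
	page := []element{
		{Tag: "div", Class: "headline", Text: "Some title"},
		{Tag: "p", Class: "post-author", Text: "Jane Doe"},
		{Tag: "span", Class: "Author-Name", Text: "J. Doe"},
	}
	fmt.Println(authorTexts(page)) // [Jane Doe J. Doe]
}
```

Because the pattern only looks at the class attribute, it picks up div.author, p.author and span.post-author alike; the trade-off is false positives such as a class named "authority", so a stricter pattern may be needed for some sites.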

This is my code, which finds one possible author tag and appends it to articleCollection. I tried if and for conditions, but I can't get the right variant into the author_name variable.

c.OnHTML("body", func(e *colly.HTMLElement) {
	author_name := e.DOM.Find("div.author").Text()

	if author_name == "" {
		log.Println("Author not found \n")
	}

	author := Authors{
		Author: author_name,
	}

	articleCollection = append(articleCollection, author)
})

I also tried a condition like the following to find every <p> with the author class, but it didn't compile because of a "declared and not used" error for author_name:

if author_name == "" {
	author_name := e.DOM.Find("p.author").Text()
}

Thank you.

Answer 1

Score: 0

Use

if author_name == "" {
	author_name = e.DOM.Find("p.author").Text()
}

instead of

if author_name == "" {
	author_name := e.DOM.Find("p.author").Text()
}

Using := declares a new variable. In your case it declares a second author_name that shadows the outer one and is only in scope inside that if block. Because you never use that inner variable after declaring it, the compiler reports the "declared and not used" error. Using = instead assigns the result to the existing author_name.
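The same = assignment extends to any number of candidate selectors, which avoids a growing chain of if blocks. A minimal sketch, assuming a hypothetical `firstNonEmpty` helper and a `find` function that wraps the lookup; inside the Colly callback, `find` could be `e.ChildText` (which returns the trimmed text of the matching children) or `func(s string) string { return e.DOM.Find(s).Text() }`. The selector "span.byline" is likewise only an illustrative assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty (hypothetical helper) tries each selector in order and returns
// the first non-empty, whitespace-trimmed result, or "" if none matches.
func firstNonEmpty(find func(selector string) string, selectors ...string) string {
	authorName := "" // declared once, outside the loop
	for _, sel := range selectors {
		// "=" assigns to the outer variable; ":=" here would declare a new
		// authorName scoped to the loop body and shadow the outer one.
		authorName = strings.TrimSpace(find(sel))
		if authorName != "" {
			break
		}
	}
	return authorName
}

func main() {
	// Fake page: selector -> text, standing in for e.DOM.Find(sel).Text().
	page := map[string]string{
		"p.author": "  Jane Doe  ",
	}
	find := func(sel string) string { return page[sel] }

	name := firstNonEmpty(find, "div.author", "p.author", "span.byline")
	fmt.Printf("%q\n", name) // "Jane Doe"
}
```

Inside c.OnHTML("body", ...) this could be called as author_name := firstNonEmpty(e.ChildText, "div.author", "p.author"). A grouped selector such as e.DOM.Find("div.author, p.author") also works, but it returns matches in document order rather than in the order the selectors are listed, so the loop is the simpler way to express a priority.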

huangapple
  • Posted on 2023-04-05 22:02:51
  • Permalink: https://go.coder-hub.com/75940402.html