How to scrape an attribute inside an attribute with Colly?

Question

I am trying to scrape the productId of a product, but I cannot get it. Please help.

HTML code:

    <span class="info">
    <button data-product='{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}'>

When I try

    h.ChildAttr("span.info>button", "data-product")

the result is {"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}

And when I try

    h.ChildAttr("span.info>button", "productId")

there is no result. How can I get this data with Colly?

Answer 1

Score: 0

ChildAttr looks up an attribute by name, and productId is not an attribute of the button; it is a key inside the value of data-product. The attribute value is returned as a raw string, and in this case that string is JSON, so you need to parse the JSON to get the data.

For example:

    package main

    import (
        "encoding/json"
        "log"

        "github.com/gocolly/colly"
    )

    func main() {
        c := colly.NewCollector()

        c.OnHTML(`body`, func(e *colly.HTMLElement) {
            // data-product holds a JSON document as a plain string.
            text := e.ChildAttr("span.info>button", "data-product")

            // Decode the JSON so individual fields can be looked up by key.
            var result map[string]interface{}
            err := json.Unmarshal([]byte(text), &result)
            if err != nil {
                log.Println(err)
                return
            }

            log.Println(result["productId"])
        })

        // Replace the placeholder with the page to scrape.
        if err := c.Visit("[some url]"); err != nil {
            log.Println(err)
        }
    }

Output

    2021/10/21 14:23:24 which I want to scrape
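
If only one or two fields are needed, decoding into a small struct instead of a map avoids the interface{} values and gives back a plain string. The following is a minimal sketch that uses only the standard library and assumes the attribute value has the shape shown in the question. The names productData and productIDFromAttr are hypothetical, introduced here for illustration; they are not part of Colly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// productData is a hypothetical type that mirrors only the fields of
// interest in the data-product JSON; other keys are ignored when decoding.
type productData struct {
	ProductID   string `json:"productId"`
	ProductName string `json:"productName"`
}

// productIDFromAttr is a hypothetical helper: it decodes the raw
// data-product attribute value and returns its productId field.
func productIDFromAttr(raw string) (string, error) {
	var p productData
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", fmt.Errorf("decode data-product: %w", err)
	}
	return p.ProductID, nil
}

func main() {
	// Sample value copied from the HTML in the question.
	raw := `{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}`

	id, err := productIDFromAttr(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

Inside the OnHTML callback the helper could be called as productIDFromAttr(e.ChildAttr("span.info>button", "data-product")). ChildAttr returns an empty string when nothing matches the selector, and in that case the helper reports a JSON decoding error rather than an empty ID.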

huangapple
  • Published on October 21, 2021 at 18:57:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69660694.html