How to scrape an attribute within an attribute with Colly?

Question

I am trying to scrape the productId of a product, but I can't get it. Please help.

HTML:

<span class="info">
 <button data-product='{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}'></button>
</span>

When I try

h.ChildAttr("span.info>button", "data-product")

the result is {"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}

and when I try

h.ChildAttr("span.info>button", "productId")

there is no result.
How can I get this value with Colly?

Answer 1

Score: 0

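ChildAttr returns the raw attribute value as a string. In this case that string is JSON, so you need to parse it to get the data. productId is a key inside the JSON, not an attribute of the button, which is why asking for an attribute named productId returns nothing.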

For example:

package main

import (
    "encoding/json"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    c.OnHTML(`body`, func(e *colly.HTMLElement) {
        // ChildAttr returns the raw attribute value, which here is a JSON string.
        text := e.ChildAttr("span.info>button", "data-product")

        // Decode the JSON so individual keys such as productId can be read.
        var result map[string]interface{}
        err := json.Unmarshal([]byte(text), &result)
        if err != nil {
            log.Println(err)
            return
        }
        log.Println(result["productId"])
    })

    // Replace the placeholder with the page to scrape.
    if err := c.Visit("[some url]"); err != nil {
        log.Fatal(err)
    }
}

Output

2021/10/21 14:23:24 which I want to scrape
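
As a variation on the example above, the JSON can be decoded into a small struct instead of a map, which avoids working with interface{} values. This is only a sketch: the product struct and the productID helper are hypothetical names, not part of Colly, and the raw string stands in for whatever ChildAttr returns. It also assumes productId is a JSON string, as in the sample from the question.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// product mirrors only the fields of the data-product JSON that are needed.
// The type is illustrative and not part of Colly.
type product struct {
	ProductID   string `json:"productId"`
	ProductName string `json:"productName"`
}

// productID parses the raw data-product attribute value and returns its
// productId field. It assumes productId is encoded as a JSON string.
func productID(raw string) (string, error) {
	var p product
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", fmt.Errorf("parse data-product: %w", err)
	}
	return p.ProductID, nil
}

func main() {
	// Stand-in for e.ChildAttr("span.info>button", "data-product").
	raw := `{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}`

	id, err := productID(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(id)
}
```

Inside an OnHTML callback, the helper would be called with the string returned by ChildAttr. If the selector matches nothing, ChildAttr yields an empty string and the helper returns a parse error, so the failure is visible rather than silent.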

huangapple
  • Posted on 2021-10-21 18:57:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69660694.html