How do I access a dynamic HTML element when web scraping?

Question

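I'm using go-rod for web scraping. I want to get the link inside a dynamically rendered <a> tag.
To make this <a> tag appear, I have to fill in a search box, which is an input tag with no submit button, in the following form: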

<form>
    <input> <!-- this is the search box -->
</form>

So, once I fill in the search box, the <a> tag I want to access appears.

So far, so good. This is the code I use to fill in the search box:

// Open the page
page := rod.New().MustConnect().MustPage("https://www.sofascore.com/")

// Accept the cookies alert
page.MustElement("cookiesAlertSelector...").MustClick()

// Fill in the search box
el := page.MustElement(`searcherSelector...`)
el.MustInput("Lionel Messi")

The problem comes when I try to click the <a> tag that appears after the search box is filled in.

I tried this:

divIWant := page.MustElement("aSelector...")
divIWant.MustClick()

and this:

divIWant := page.MustElement("aSelector...").MustWaitVisible()
divIWant.MustClick()

But both of them fail with the same error:

panic: {-32000 Node is detached from document }

goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.glob..func2({0x100742dc0?, 0x140002bad50?})
	/Users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@v0.112.2/lib/utils/utils.go:65 +0x24
github.com/go-rod/rod.genE.func1({0x14000281ca0?, 0x1003a98b7?, 0x4?})
	/Users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@v0.112.2/must.go:36 +0x64
github.com/go-rod/rod.(*Element).MustClick(0x14000289320)
	/Users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@v0.112.2/must.go:729 +0x9c
main.main()
	/Users/lucastomicbenitez/development/golang/evolutionaryAlgorithm/main/main.go:22 +0x9c
exit status 2

Looking for a solution, I found this GitHub issue and tried the following to get the link:

link := page.MustEval(`() => document.querySelector('aSelector...').href`)

But it fails with this error:

panic: eval js error: TypeError: Cannot read properties of null (reading 'href')

However, I'm pretty sure the selector is correct.
What am I doing wrong?

Answer 1

Score: 2

As @Hymns For Disco said in the comments, I just needed to wait a moment after filling in the search box.

el.MustInput("Lionel Messi")

time.Sleep(time.Second)

link := page.MustEval(`() => document.querySelector('aSelector...').href`)

huangapple
  • Posted on December 28, 2022, 03:05:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74933236.html