chromedp Click is not working in my Go code. Can you find what's wrong?

Question

I'm working on a scraper with chromedp.

To get what I want (the page HTML), I have to click a specific button.

So I used chromedp.Click and chromedp.OuterHTML, but I only get the HTML of the page before the click, not the HTML of the page after the click has completed.

Can you look at my code and advise me on how to fix it?


func runCrawler(URL string, lineNum string, stationNm string) {
	// settings for crawling: run Chrome with a visible window
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.Flag("headless", false))

	// create chrome instance
	contextVar, cancelFunc := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancelFunc()

	contextVar, cancelFunc = chromedp.NewContext(contextVar)
	defer cancelFunc()

	var htmlContent string

	// navigate, wait for the footer, click the button, then read the page HTML
	err := chromedp.Run(contextVar,
		chromedp.Navigate(URL),
		chromedp.WaitVisible(".end_footer_area"),
		chromedp.Click(".end_section.station_info_section > div.at_end.sofzqce > div > div.c10jv2ep.wrap_btn_schedule.schedule_time > button"),
		chromedp.OuterHTML("html", &htmlContent, chromedp.ByQuery),
	)
	fmt.Println("html", htmlContent)
	checkErr(err)
}

I've also included the page and the button I need to click.

Page URL: https://pts.map.naver.com/end-subway/ends/web/11321/home

Button area I need to click:

[image: button area on the page]

Thank you very much.

Answer 1

Score: 0

The page you want opens in a new tab (target).

In this case, we can use chromedp.WaitNewTarget to create a channel from which we receive the target ID of the new tab. Then create a new context with the chromedp.WithTargetID option so that we can connect to the new tab. From there, everything is what you are already familiar with. Note that WaitNewTarget has to be set up before the click, otherwise the event for the new tab could be missed.

package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/chromedp/cdproto/target"
	"github.com/chromedp/chromedp"
)

func main() {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.Flag("headless", false),
	)

	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()

	ctx, cancel = chromedp.NewContext(ctx)
	defer cancel()

	var htmlContent string

	ch := chromedp.WaitNewTarget(ctx, func(i *target.Info) bool {
		return strings.Contains(i.URL, "/timetable/web/")
	})

	err := chromedp.Run(ctx,
		chromedp.Navigate("https://pts.map.naver.com/end-subway/ends/web/11321/home"),
		chromedp.WaitVisible(".end_footer_area"),
		chromedp.Click(".end_section.station_info_section > div.at_end.sofzqce > div > div.c10jv2ep.wrap_btn_schedule.schedule_time > button"),
	)
	if err != nil {
		panic(err)
	}

	newCtx, cancel := chromedp.NewContext(ctx, chromedp.WithTargetID(<-ch))
	defer cancel()

	if err := chromedp.Run(newCtx,
		chromedp.WaitReady(".table_schedule", chromedp.ByQuery),
		chromedp.OuterHTML("html", &htmlContent, chromedp.ByQuery),
	); err != nil {
		panic(err)
	}
	fmt.Println("html", htmlContent)
}

huangapple
  • Posted on May 2, 2023, 16:10:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76152907.html