Can I press a button after another button using Playwright Python webscraping?

Question

I'm trying to write code that goes to this website, "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1", and clicks on the horse named "LUCKY MISSILE". That should lead to a popup window with a table of all the horse's statistics.

Then, I want the program to click on the "Show All" button on the far right, so the table doesn't just show the statistics from the last 3 seasons, but instead the statistics from all seasons.

This is where my program encounters an issue. It can't seem to find the "Show All" button. Does anyone know how to fix this?

import pandas as pd
import xlsxwriter
from bs4 import BeautifulSoup
from playwright.sync_api import Playwright, sync_playwright, expect
import xlwings as xw

def scrape_ranking(url, sheet_name):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        with page.expect_popup() as popup_info:
            page.click('text="LUCKY MISSILE"')

        page.get_by_text("Show All").click()

        popup = popup_info.value
        popup.wait_for_load_state("domcontentloaded")
        
        html = popup.content()
        browser.close()

    tables = pd.read_html(html)
    df = tables[7]
    with pd.ExcelWriter("hkjc.xlsx", engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=True)


url = ('https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1')
scrape_ranking(url, "LUCKY MISSILE")

Answer 1

Score: 1

That "button" looks like it has the text "Show All", but the text is rasterized onto an image (shudder), so there's no text node for get_by_text("Show All") to match:

<img
  src="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
  alt="Show All"
  style="width: 92px; height: 24px"
  id="hf_allr_btn_r"
  class="active"
  delsrc="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
  border="0"
/>

You could select this with

popup.get_by_alt_text("Show All").click()

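which triggers a navigation, leading to a new page. Note that the call is made on popup rather than page, since the button lives in the popup window, so popup = popup_info.value has to be assigned before the click.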
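
Putting that together with the code from the question, a minimal sketch might look like the following. The function names show_all_and_get_html and scrape_horse_html are made up for illustration, and the sketch assumes the "Show All" click navigates within the same popup tab (as described above) rather than opening yet another window. I haven't verified it against the live site, so treat it as a starting point.

```python
def show_all_and_get_html(popup):
    """Click the "Show All" image button in the popup and return the resulting HTML.

    `popup` is expected to be a Playwright Page (the popup window).
    Hypothetical helper name, not part of Playwright.
    """
    popup.wait_for_load_state("domcontentloaded")
    # The button is an <img alt="Show All">, so match on alt text, not visible text.
    # Assumption: the click navigates this same tab, so wait for that navigation.
    with popup.expect_navigation():
        popup.get_by_alt_text("Show All").click()
    return popup.content()


def scrape_horse_html(url, horse_name):
    """Open the race card, click the horse link, and return the full-history HTML."""
    # Imported here so the helper above can be exercised without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Clicking the horse name opens the statistics popup.
        with page.expect_popup() as popup_info:
            page.click(f'text="{horse_name}"')

        # Grab the popup first; every later action targets it, not `page`.
        popup = popup_info.value
        html = show_all_and_get_html(popup)
        browser.close()
    return html
```

Calling scrape_horse_html(url, "LUCKY MISSILE") with the URL from the question should then give HTML that can be passed to pd.read_html as before. The table index (tables[7] in the question) may differ on the "Show All" page, so it's worth checking.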

Moral of the story: use the browser's dev tools to inspect the element to see what it really is.
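
If dev tools aren't handy, roughly the same check can be done on saved HTML (for example, the string returned by popup.content()). The sketch below uses only the standard library; ImageAltCollector and describe_label are hypothetical names invented for this example. It reports whether a label shows up as a text node or only as an image's alt attribute, which hints at whether get_by_text or get_by_alt_text is the right locator.

```python
from html.parser import HTMLParser


class ImageAltCollector(HTMLParser):
    """Collect <img> alt attributes and non-empty text nodes from an HTML string."""

    def __init__(self):
        super().__init__()
        self.alts = []
        self.text_nodes = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_data(self, data):
        if data.strip():
            self.text_nodes.append(data.strip())


def describe_label(html, label):
    """Report whether `label` appears as text, as an image alt, or neither."""
    collector = ImageAltCollector()
    collector.feed(html)
    collector.close()
    return {
        "as_text": any(label in text for text in collector.text_nodes),
        "as_img_alt": label in collector.alts,
    }


# The button markup from the page, shortened to the relevant attributes.
snippet = (
    '<img src="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg" '
    'alt="Show All" id="hf_allr_btn_r" class="active" border="0" />'
)
print(describe_label(snippet, "Show All"))
# prints {'as_text': False, 'as_img_alt': True}
```

Note this is a crude check: it also counts text inside script or style tags as text nodes, so it is a hint rather than a replacement for inspecting the element.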

huangapple
  • Posted on April 6, 2023, 21:31:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75950132.html