April 6, 2023, 21:31:07

Can I press a button after another button using Playwright Python webscraping?

Question

I'm trying to write code that goes to this website, "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1", and clicks on the horse named "LUCKY MISSILE". That should open a popup window with a table of all the horse's statistics.

Then, I want the program to click on the "Show All" button on the far right, so the table doesn't just show the statistics from the last 3 seasons, but instead the statistics from all seasons.

This is where my program encounters an issue. It can't seem to find the "Show All" button. Does anyone know how to fix this?

import pandas as pd
import xlsxwriter
from bs4 import BeautifulSoup
from playwright.sync_api import Playwright, sync_playwright, expect
import xlwings as xw

def scrape_ranking(url, sheet_name):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        with page.expect_popup() as popup_info:
            page.click('text="LUCKY MISSILE"')
        page.get_by_text("Show All").click()
        popup = popup_info.value
        popup.wait_for_load_state("domcontentloaded")

        html = popup.content()
        browser.close()
    tables = pd.read_html(html)
    df = tables[7]
    with pd.ExcelWriter("hkjc.xlsx", engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=True)

url = ('https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1')
scrape_ranking(url, "LUCKY MISSILE")

Answer 1

Score: 1

That "button" looks like it has the text "Show All", but the text is rasterized onto an image (shudder), so there is no text node for get_by_text to match:

<img
  src="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
  alt="Show All"
  style="width: 92px; height: 24px"
  id="hf_allr_btn_r"
  class="active"
  delsrc="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
  border="0"
/>

You could select this with

popup.get_by_alt_text("Show All").click()

which triggers a navigation, leading to a new page.
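
Put together with the question's code, the flow might look like the sketch below. It is not the asker's final solution: the helper name fetch_all_seasons_html is made up for illustration, and it assumes the click really does start a full navigation inside the popup, as described above. Note also that the locator is called on popup rather than page, and only after the popup object has been retrieved, whereas the question's code called it on page.

```python
def fetch_all_seasons_html(url, horse_name):
    """Open the horse's popup, click the "Show All" image, return the HTML.

    Hypothetical helper for illustration; not part of the original post.
    """
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Clicking the horse's name opens the popup window.
        with page.expect_popup() as popup_info:
            page.get_by_text(horse_name, exact=True).click()
        popup = popup_info.value
        popup.wait_for_load_state("domcontentloaded")

        # The button lives in the popup and is an <img>, so it is located by
        # its alt text. Assumption: the click starts a full navigation; if it
        # does not, expect_navigation will time out.
        with popup.expect_navigation(wait_until="domcontentloaded"):
            popup.get_by_alt_text("Show All").click()

        html = popup.content()
        browser.close()
    return html
```

It could be called as html = fetch_all_seasons_html(url, "LUCKY MISSILE"), after which the pd.read_html and ExcelWriter steps from the question can be applied to the returned HTML. The table index may differ on the page reached after the navigation, so it is worth checking.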

Moral of the story: use the browser's dev tools to inspect the element to see what it really is.
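
The same point can be checked in code. The rough sketch below uses only Python's standard library; the class name TextAndAltCollector is hypothetical and the markup is the snippet above, trimmed to a few attributes. It separates real text nodes from img alt attributes, which shows that "Show All" exists only as an attribute.

```python
from html.parser import HTMLParser


class TextAndAltCollector(HTMLParser):
    """Collect text nodes and <img> alt attributes separately."""

    def __init__(self):
        super().__init__()
        self.text_nodes = []
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # An <img> carries its label in the alt attribute, not as text content.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt)

    def handle_data(self, data):
        # Record only non-empty text between tags.
        if data.strip():
            self.text_nodes.append(data.strip())


# The markup from the answer, trimmed to the relevant attributes.
SNIPPET = (
    '<img src="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg" '
    'alt="Show All" id="hf_allr_btn_r" class="active" border="0" />'
)

collector = TextAndAltCollector()
collector.feed(SNIPPET)
collector.close()

print("text nodes:", collector.text_nodes)  # text nodes: []
print("alt texts:", collector.alt_texts)    # alt texts: ['Show All']
```

An empty list of text nodes is why a text-based locator finds nothing here, while an alt-text locator succeeds.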
