Trouble getting the HTML after a button is clicked / WebDriverWait times out when the condition is clearly met


Question

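I am trying to scrape https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL, but I want the HTML as it is after the "Show All" button is clicked. I know the click works because I can see the table change in the browser, and if I read the button's class afterward, 'active' has been appended to the string. But when I grab the table data after that, it is still the same as before the click. I'm not sure whether this can be solved without a wait, but if not, I need help with that.

I added an explicit wait in case it just needed a second, but the wait times out no matter how long I make it. Here is my code for the wait: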

    url = 'https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL'
    driver = webdriver.Chrome('https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL')
    driver.get(url)
    gross_data = driver.find_element(By.CLASS_NAME, 'all-gross-data')
    button = gross_data.find_element(By.TAG_NAME, 'button')
    driver.execute_script('arguments[0].click();', button)
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'btn btn-red show-all-grosses active')))


And this is the code without the wait (it still returns only the table data from before the button was clicked):

    driver = webdriver.Chrome()
    driver.get(url)
    gross_data = driver.find_element(By.CLASS_NAME, 'all-gross-data')
    button = gross_data.find_element(By.TAG_NAME, 'button')
    driver.execute_script('arguments[0].click();', button)
    after_click = driver.find_element(By.CLASS_NAME, 'table-body')
    after_click = after_click.get_attribute('outerHTML')
    print(after_click)





# Answer 1
**Score**: 0

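1. Try to avoid clicking elements via JavaScript injection. If you need to click an element, just use `element.click()`.
2. Locating an element by class name will usually fail if the string contains whitespace, because `By.CLASS_NAME` expects a single class name. If the attribute is `class="This is my class"`, you can use a CSS selector with `*=`, which means "contains":

    CSS=*[class*="This"][class*="my"][class*="is"]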
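As a rough illustration of point 2, the sketch below turns a space-separated class string into a compound CSS selector and waits on that instead of passing the whole string to `By.CLASS_NAME`. The helper names `class_attr_to_css` and `wait_for_classes` are made up for this example, and it assumes the class names are plain identifiers that need no CSS escaping. It has not been run against the site in the question.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; assumes each class name is a plain
    identifier that needs no CSS escaping.
    """
    return tag + "".join(f".{name}" for name in class_attr.split())


def wait_for_classes(driver, class_attr: str, tag: str = "", timeout: float = 10):
    """Block until an element carrying every class in class_attr is present."""
    # Imported here so the selector helper above works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = class_attr_to_css(class_attr, tag)
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )


if __name__ == "__main__":
    # The class string from the question becomes one compound selector.
    print(class_attr_to_css("btn btn-red show-all-grosses active", tag="button"))
```

With the class string from the question, the wait would then be `wait_for_classes(driver, 'btn btn-red show-all-grosses active', tag='button')`. Unlike the `[class*=...]` form, a compound selector matches whole class names rather than substrings.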




# Answer 2
**Score**: 0

Try the code below. It scrolls the "Show All" button into view before clicking it, and then waits for the table to load.

```python
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL'
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 60)

# Close the ad if it appears
try:
    ad_button = wait.until(
        EC.visibility_of_element_located((By.XPATH, '//a[text()="AD - CLICK HERE TO CLOSE"]')))
    ad_button.click()
except TimeoutException:
    print("Ignoring: sometimes the ad doesn't show up")

show_all_button = driver.find_element(By.XPATH, '//button[text()="Show All"]')

# Scroll the button to the middle of the viewport before clicking it
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", show_all_button)
time.sleep(2)

show_all_button.click()
# Wait for the new table to load
time.sleep(5)

# An XPath that starts with // searches the whole document, so query from the driver
table_header = driver.find_element(By.XPATH, '//div[@class="table-header"]')
header = [cell.text for cell in table_header.find_elements(By.CLASS_NAME, "cell")]
data = [header]

table_body = driver.find_element(By.XPATH, "//div[@class='table-body']")
for row in table_body.find_elements(By.CLASS_NAME, "row"):
    data.append([cell.text for cell in row.find_elements(By.CLASS_NAME, "cell")])
driver.quit()

# Write the header and rows to a CSV file
df = pd.DataFrame(data)
print(df)
df.to_csv("sam.csv", index=False, header=False)
```

The script writes the scraped table to `sam.csv`. (Image: screenshot of the output CSV.)


huangapple
  • Posted on May 22, 2023 at 10:44:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76302763.html