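Having trouble getting HTML after a button is clicked / WebDriverWait times out even though the condition is clearly met
Question
I am trying to scrape https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL, but I want the HTML from after the "Show All" button is clicked. I know the click works because I can see the table change in the browser, and if I read the button's class afterward, 'active' has been appended to the string. But when I try to grab the data from the table after that, it is still the same. I'm not sure whether this can be solved without a wait, but if not, I need help with that.
I added an explicit wait in case it just needed a second, but the wait keeps timing out no matter how long I make it. Here is my code for the wait.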
url = 'https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL'
driver = webdriver.Chrome('https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL')
driver.get(url)
gross_data = driver.find_element(By.CLASS_NAME, 'all-gross-data')
button = gross_data.find_element(By.TAG_NAME, 'button')
driver.execute_script('arguments[0].click();', button)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'btn btn-red show-all-grosses active')))
And this is the version without the wait (it still returns only the table data from before the button was clicked):
driver = webdriver.Chrome()
driver.get(url)
gross_data = driver.find_element(By.CLASS_NAME, 'all-gross-data')
button = gross_data.find_element(By.TAG_NAME, 'button')
driver.execute_script('arguments[0].click();', button)
after_click = driver.find_element(By.CLASS_NAME, 'table-body')
after_click = after_click.get_attribute('outerHTML')
print(after_click)
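# Answer 1
**Score**: 0
1. Try to avoid clicking elements via JavaScript injection. If you need to click an element, just use element.click().
2. Locating an element by class name usually fails when the string contains whitespace, because By.CLASS_NAME expects a single class name. If the attribute is class="This is my class", you can use a CSS attribute selector with (*=), which means "contains":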
CSS=*[class*="This"][class*="my"][class*="is"]
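As an illustration of point 2, here is a minimal sketch that assumes the button's class attribute is exactly the string from the question. The helper `class_attr_to_css` is a hypothetical name, not part of Selenium. It turns a space-separated class attribute into a compound CSS selector, which requires all of the listed classes on the same element.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Build a compound CSS selector from a space-separated class attribute.

    Hypothetical helper for illustration only; not part of Selenium.
    Assumes the class names need no CSS escaping.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    # ".a.b.c" matches elements that carry all three classes at once.
    return tag + "".join("." + name for name in classes)


# Class string taken from the question's failing wait.
selector = class_attr_to_css("btn btn-red show-all-grosses active", tag="button")
print(selector)  # button.btn.btn-red.show-all-grosses.active
```

The resulting string could then be passed with `By.CSS_SELECTOR` instead of `By.CLASS_NAME`, for example `EC.presence_of_element_located((By.CSS_SELECTOR, selector))`.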
# Answer 2
**Score**: 0
Try the code below. It scrolls the "Show All" button into view before clicking it and adds a wait for the table to load.
```python
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.broadwayworld.com/grosses/A-BEAUTIFUL-NOISETHE-NEIL-DIAMOND-MUSICAL'
data = []

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 60)

# Close the ad overlay if it appears
try:
    adButton = wait.until(
        EC.visibility_of_element_located((By.XPATH, '//a[text()="AD - CLICK HERE TO CLOSE"]')))
    adButton.click()
except TimeoutException:
    print("Ignoring: sometimes the ad doesn't show up")

# Scroll the "Show All" button into view, then click it
showAllButton = driver.find_element(By.XPATH, '//button[text()="Show All"]')
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", showAllButton)
time.sleep(2)
showAllButton.click()

# Wait for the new table to load
time.sleep(5)

table = driver.find_element(By.CLASS_NAME, "table")

# Header row (note: a leading "//" searches the whole document, not just `table`)
tableHeader = table.find_element(By.XPATH, '//div[@class="table-header"]')
header = [cell.text for cell in tableHeader.find_elements(By.CLASS_NAME, "cell")]
data.append(header)

# Body rows: one list of cell texts per row
tableBody = table.find_element(By.XPATH, "//div[@class='table-body']")
for row in tableBody.find_elements(By.CLASS_NAME, "row"):
    data.append([cell.text for cell in row.find_elements(By.CLASS_NAME, "cell")])

driver.quit()

df = pd.DataFrame(data)
print(df)
df.to_csv("sam.csv", index=False, header=False)
```
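The fixed `time.sleep(5)` could in principle be swapped for an explicit wait. The sketch below is an alternative under stated assumptions, not the answer author's method. `row_count_greater_than` is a hypothetical condition class, and it assumes that clicking "Show All" adds rows matching `.table-body .row` (a selector guessed from the class names used above). `FakeDriver` exists only so the logic can be run without a browser.

```python
class row_count_greater_than:
    """Wait condition: truthy once more than `count` elements match `locator`.

    Hypothetical helper; WebDriverWait.until accepts any callable that
    takes the driver and returns a truthy value when the wait should end.
    """

    def __init__(self, locator, count):
        self.locator = locator  # e.g. (By.CSS_SELECTOR, ".table-body .row")
        self.count = count

    def __call__(self, driver):
        rows = driver.find_elements(*self.locator)
        # Return the rows (truthy) on success, False to keep polling.
        return rows if len(rows) > self.count else False


class FakeDriver:
    """Stand-in for a WebDriver so the condition can be exercised offline."""

    def __init__(self, rows):
        self.rows = rows

    def find_elements(self, by, value):
        return list(self.rows)


# "css selector" is the string value behind By.CSS_SELECTOR.
condition = row_count_greater_than(("css selector", ".table-body .row"), 2)
before = condition(FakeDriver(["r1", "r2"]))        # table not expanded yet
after = condition(FakeDriver(["r1", "r2", "r3"]))   # one extra row appeared
print(before, after)
```

With a real driver, one would count the rows before clicking and then call something like `WebDriverWait(driver, 10).until(row_count_greater_than((By.CSS_SELECTOR, ".table-body .row"), rows_before))`, which returns as soon as the extra rows appear instead of always pausing for five seconds.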