How to properly wait for multiple dynamic dropdown menus to populate with Selenium & Python?

Question

I'm trying to scrape the IFSC Results website. There are four dropdown menus there: Year, League, Events, and Category.

I'd like to scrape the most recent non-empty year (2022), along with all of the events that happened in the World Cups/Championships. This would be index 1 in each of the first two dropdowns. My issue is that I don't know how to get the options for the third dropdown menu (Events). If no League is selected, no options show up in that dropdown.

When you select 2022 for year, the second dropdown automatically updates with the last event for that year, and then the third dropdown updates similarly.

My problem:
I figured this would be related to some kind of wait issue, and I've attempted to wait many, many times, but I'm not entirely sure what event to wait for. Currently, I'm trying this:

# Dropdown menus for each choice
year_dd     = self.browser.find_element(By.XPATH, '//select[@id="years"]')
league_dd   = self.browser.find_element(By.XPATH, '//select[@id="indexes"]')
event_dd    = self.browser.find_element(By.XPATH, '//select[@id="events"]')
category_dd = self.browser.find_element(By.XPATH, '//select[@id="categories"]')
            
# Selenium Select class gets objects in dropdown and puts them in corresponding list
years_ob      = Select(year_dd).select_by_index(1)     #0 is most recent year (no events have happened yet)
leagues_ob    = Select(league_dd).select_by_index(1)   #starts at index 1 (World Cups)

wait = WebDriverWait(self.browser, 5)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "option[value='/api/v1/events/*']")))

# I have tried all of the following with similar results
# wait.until(EC.presence_of_all_elements_located((By.XPATH, "//select[@id='events']")))
# wait.until(EC.presence_of_element_located((By.XPATH, "//select[@id='events']"), (By.TAG_NAME, "option")))
# wait.until(EC.element_to_be_clickable((By.TAG_NAME, "option")))
# wait.until(EC.visibility_of_any_elements_located((By.XPATH, "//select[@id='events']")))
# wait.until(lambda x: x.find_element(By.XPATH, "//select[@id='events']/option[text()='IFSC*']"))

year_opts   = Select(year_dd).options
league_opts = Select(league_dd).options
event_opts  = Select(event_dd).options
cat_opts    = Select(category_dd).options

# Extracts the text from the objects and adds to list
years      = [x.text for x in year_opts]
leagues    = [x.text for x in league_opts[1:]]           
events     = [x.text for x in event_opts[1:]]
categories = [x.text for x in cat_opts[1:]]
            
print(years)
print(leagues)
print(events)
print(categories)

I can extract the years, and the leagues (the first two dropdown menus) just fine, but the third dropdown returns an empty list 95% of the time, and I don't know why. I think it might be connected to that disabled option tag? Can anyone help point me in the right direction?

Answer 1

Score: 1

There were a few issues.

  1. The elements you are looking for are inside an IFRAME. With Selenium, you have to switch into the IFRAME context before you can access those elements. I do this using a WebDriverWait:

    WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.jch-lazyloaded")))
    
  2. Each of the dropdowns contains values that are loaded dynamically, so we need to wait for each one to have more than one OPTION. We can do this with a simple lambda expression used in a wait (here years_select is the Select wrapping the Years dropdown):

    WebDriverWait(driver, 10).until(lambda d: len(years_select.options) > 1)

    We need to do this for each of the dropdowns to make sure the script works consistently. Without it, I'd get errors on selecting Years maybe 50% of the time.
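If you end up repeating that lambda for every dropdown, one option is to wrap it in a small reusable condition. The sketch below is only an illustration: `at_least_n_elements` is a name I made up, not part of Selenium, and the CSS selector in the usage comment assumes the `events` id from the question. It relies on nothing more than `driver.find_elements(by, value)`, so it also works with any object that exposes that method.

```python
def at_least_n_elements(locator, min_count):
    """Build a wait condition that passes once `locator` matches
    at least `min_count` elements.

    `locator` is a (by, value) tuple such as
    (By.CSS_SELECTOR, "select#events option").
    """
    def _condition(driver):
        # find_elements returns an empty list (it does not raise) when
        # nothing matches, so the wait simply polls again.
        elements = driver.find_elements(*locator)
        # WebDriverWait.until keeps polling while the result is falsy and
        # returns the first truthy result, here the list of elements.
        return elements if len(elements) >= min_count else False
    return _condition


# Usage with a real driver, after switching into the iframe (not run here):
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.wait import WebDriverWait
#   options = WebDriverWait(driver, 10).until(
#       at_least_n_elements((By.CSS_SELECTOR, "select#events option"), 2))
#   events = [o.text for o in options[1:]]
```

A side note on the selectors tried in the question: CSS has no `*` wildcard inside an attribute value, so `option[value='/api/v1/events/*']` only matches that literal string. A prefix match would be written `option[value^='/api/v1/events/']`, assuming the option values really do start with that path.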

Putting all this together, the script below runs successfully:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.ifsc-climbing.org/index.php/world-competition/last-result')

wait = WebDriverWait(driver, 10)

# The dropdowns live inside an iframe, so switch into it first
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.jch-lazyloaded")))

# Year: wait until the options have loaded, then pick index 1 (2022)
years_select = Select(wait.until(EC.visibility_of_element_located((By.ID, "years"))))
wait.until(lambda d: len(years_select.options) > 1)
years_select.select_by_index(1)

# League: wait for the options, then pick index 1 (World Cups and World Championships)
league_select = Select(driver.find_element(By.ID, "indexes"))
wait.until(lambda d: len(league_select.options) > 1)
league_select.select_by_index(1)

# Events and Category are filled in dynamically as well; wait for both
events_select = Select(driver.find_element(By.ID, "events"))
wait.until(lambda d: len(events_select.options) > 1)
category_select = Select(driver.find_element(By.ID, "categories"))
wait.until(lambda d: len(category_select.options) > 1)

# Extracts the text from the objects and adds to list
years      = [x.text for x in years_select.options]
leagues    = [x.text for x in league_select.options[1:]]           
events     = [x.text for x in events_select.options[1:]]
categories = [x.text for x in category_select.options[1:]]
            
print(years)
print(leagues)
print(events)
print(categories)

driver.quit()

and outputs:

['2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2000', '1999', '1998', '1997', '1996', '1995', '1994', '1993', '1992', '1991', '1990']
['World Cups and World Championships', 'IFSC Youth', 'IFSC Asia Adults', 'IFSC Asia Youth', 'IFSC Europe Adults', 'IFSC Europe Youth', 'Games', 'IFSC Pan America Adults', 'IFSC Africa', 'IFSC Paraclimbing', 'IFSC Oceania', 'Other events', 'Masters and Promotional Events']
['IFSC - Climbing World Cup (B) - Meiringen (SUI) 2022', 'IFSC - Climbing World Cup (B,S) - Seoul (KOR) 2022', 'IFSC - Climbing World Cup (B,S) - Salt Lake City (USA) 2022', 'IFSC - Climbing World Cup (B,S) - Salt Lake City (USA) 2022', 'IFSC - Climbing World Cup (B) - Brixen (ITA) 2022', 'IFSC - Climbing World Cup (B,L) - Innsbruck (AUT) 2022', 'IFSC - Climbing World Cup (L,S) - Villars (SUI) 2022', 'IFSC - Climbing World Cup (L,S) - Chamonix (FRA) 2022', 'IFSC - Climbing World Cup (L) - Briançon (FRA) 2022', 'IFSC - Climbing World Cup (L) - Koper (SLO) 2022', 'IFSC - Climbing World Cup (L,S) - Edinburgh (GBR) 2022', 'IFSC - Climbing World Cup (L,S) - Jakarta (INA) 2022', 'IFSC - Climbing World Cup (B&L) - Morioka, Iwate (JPN) 2022']
['BOULDER Men', 'BOULDER Women']
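
For anyone wondering why a plain `wait.until(lambda d: ...)` is enough here: an explicit wait just calls the function you give it over and over (every 0.5 seconds by default in Selenium) until it returns something truthy or the timeout runs out. The sketch below imitates that loop in pure Python so it can be run without a browser. It is a simplified illustration, not Selenium's actual code: `poll_until` and `FakeDropdown` are hypothetical names, and the real `WebDriverWait` also ignores `NoSuchElementException` while polling and raises its own `TimeoutException`.

```python
import time


def poll_until(condition, subject, timeout=10.0, interval=0.5):
    """Call condition(subject) repeatedly until it returns a truthy value.

    Simplified illustration of an explicit wait; this is not Selenium's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(subject)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


class FakeDropdown:
    """Hypothetical stand-in for a <select> whose options arrive late."""

    def __init__(self, ready_after_calls):
        self._calls = 0
        self._ready_after = ready_after_calls

    @property
    def options(self):
        # Every access re-reads the "page", like Select.options does.
        self._calls += 1
        if self._calls < self._ready_after:
            return ["placeholder"]  # assumed: only a placeholder entry so far
        return ["placeholder", "Event A", "Event B"]


dropdown = FakeDropdown(ready_after_calls=3)

# Same shape as wait.until(lambda d: len(events_select.options) > 1)
poll_until(lambda d: len(d.options) > 1, dropdown, timeout=2.0, interval=0.01)

events = dropdown.options[1:]  # skip the placeholder entry
print(events)
```

Reading `options` once right after selecting the league is like running the condition a single time with no loop around it, which is why the list came back empty most of the time.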

huangapple
  • Posted on April 6, 2023 at 22:13:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75950540.html