Finding an element on a website with Selenium for Python when its class name contains newlines
Question
I'm trying to scrape some data from LinkedIn, but I noticed that the element IDs change each time I load the page with Selenium. So I tried using the class name to find all the elements, but the class attribute contains newlines, which prevents me from scraping the website.
example of class with newlines here
I tried doing the below:
job_test = "ember-view jobs-search-results__list-item occludable-update p0 relative scaffold-layout__list-item\n \n \n "
job_list = driver.find_elements(By.CLASS_NAME, job_test)
I even tried this:
job_test = '''ember-view jobs-search-results__list-item occludable-update p0 relative scaffold-layout__list-item
'''
job_list = driver.find_elements(By.CLASS_NAME, job_test)
But it does not show me any elements when I print job_list. What do I do here?
Answer 1
Score: 2
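By.CLASS_NAME accepts only a single class name, so you can't pass several space-separated names to it. The newlines in the attribute are just whitespace between the class names, not part of any name. See: Invalid selector: Compound class names not permitted error using Selenium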
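Why the multi-line string fails can be sketched in plain Python, without a browser. The class attribute is a whitespace-separated list of names, so the value from the question splits into six separate classes. The helper below, `css_selector_from_class_attr`, is a hypothetical name introduced here for illustration; it assumes every class name is a plain CSS identifier that needs no escaping.

```python
def css_selector_from_class_attr(class_attr: str, tag: str = "") -> str:
    """Turn a raw class attribute value into a compound CSS selector.

    str.split() with no arguments splits on any run of whitespace
    (spaces, tabs, newlines) and drops empty pieces, so trailing
    newlines in the copied value disappear.
    Assumption: each class name is a plain identifier (no escaping needed).
    """
    tokens = class_attr.split()
    if not tokens:
        raise ValueError("class attribute contains no class names")
    return tag + "".join("." + token for token in tokens)


# The value from the question, trailing newlines and spaces included.
raw = (
    "ember-view jobs-search-results__list-item occludable-update "
    "p0 relative scaffold-layout__list-item\n \n \n "
)

print(raw.split())  # six separate class names, no newline left over

selector = css_selector_from_class_attr(raw, tag="li")
print(selector)

# Hypothetical usage with an already created driver:
# job_list = driver.find_elements(By.CSS_SELECTOR, selector)
```

Requiring all six classes is stricter than necessary; matching on one descriptive class, as in the solution, is usually enough.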
Solution
To build the job list, use WebDriverWait with visibility_of_all_elements_located() and any one of the following locator strategies:
- Using CLASS_NAME:
    driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
    job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "jobs-search-results__list-item")))
- Using CSS_SELECTOR:
    driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
    job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.jobs-search-results__list-item")))
- Using XPATH:
    driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
    job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[contains(@class, 'jobs-search-results__list-item')]")))
Note: These snippets need the following imports:
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
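One caveat on the XPath variant: contains(@class, ...) is a substring test, so it would also match a longer class name that merely starts with the same text (for instance a hypothetical jobs-search-results__list-item--active). A stricter expression can be sketched with the common concat/normalize-space idiom, which matches the class as a whole token. The helper name `xpath_for_class_token` is an assumption made for this example, not part of Selenium.

```python
def xpath_for_class_token(token: str, tag: str = "*") -> str:
    """Build an XPath 1.0 expression matching elements whose class
    attribute contains `token` as a complete class name.

    normalize-space() trims the attribute and collapses runs of
    whitespace (newlines included) into single spaces; padding both
    sides with a space lets us search for ' token ' as a whole word.
    """
    if not token or any(ch.isspace() for ch in token):
        raise ValueError("token must be a single class name without whitespace")
    if "'" in token:
        raise ValueError("token must not contain a single quote")
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {token} ')]"
    )


xpath = xpath_for_class_token("jobs-search-results__list-item", tag="li")
print(xpath)

# Hypothetical usage with an already created driver:
# job_list = driver.find_elements(By.XPATH, xpath)
```

For this page the simpler contains(@class, ...) form may well be sufficient; the stricter form only matters if other classes share the same prefix.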