Iterate over an unordered list in Selenium and output the price value
Question
I am scraping TrueCar using Selenium and Python. My code currently finds the unordered list containing all the listings and then iterates over them to print each price. Using the copied XPath only ever prints the price of the first element in the unordered list. I tried updating the XPath dynamically inside the for loop so that the index in the XPath changes, but then I get an error saying the element cannot be found.
Here is my code, to better demonstrate this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.truecar.com/used-cars-for-sale/listings/bmw/m4/location-palm-desert-ca/")
listingSection = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/main/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul')
listings = listingSection.find_elements(By.TAG_NAME, "li")
for i, listing in enumerate(listings):
    # Wrong element: this only ever returns the first item in the list, and sadly the list elements do not have ids
    price = listing.find_element(By.XPATH, '//*[@id="mainContent"]/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[1]/div/div/div[3]/div[2]/div/div[2]/div/div').text
    # Error: this raises an error because the element cannot be found
    price = listing.find_element(By.XPATH, f'//*[@id="mainContent"]/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[{i + 1}]/div/div/div[3]/div[2]/div/div[2]/div/div').text
    print(price)
I suspect the reason is that the id used in the XPath is not relative to the list elements themselves, but since they have no ids of their own I am not sure how to go about this, as I am new to web scraping.
Answer 1
Score: 1
They don't have ids, but they do have a data-test attribute, which I suspect was added for the purpose of Selenium testing.
listings = driver.find_elements(By.CSS_SELECTOR, '[data-test="vehicleCardPricingBlockPrice"]')
for listing in listings:
    print(listing.text)
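The data-test lookup above could be hardened along these lines. This is only a minimal sketch, assuming the price elements may render after the page loads and that their text looks like "$45,998"; neither assumption is confirmed by the page. The helper names parse_price and collect_prices are hypothetical, while WebDriverWait and expected_conditions are Selenium's standard explicit-wait helpers.

```python
import re


def parse_price(text):
    """Return the first number in text as an int, e.g. "$45,998" -> 45998.

    Returns None when the text contains no digits. The "$45,998" format
    is an assumption about how the site renders prices.
    """
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def collect_prices(driver, timeout=10):
    """Wait for the price elements to appear, then return their parsed values."""
    # Selenium is imported here so parse_price stays usable without a browser.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, '[data-test="vehicleCardPricingBlockPrice"]')
    # Explicit wait: block until at least one matching element is in the DOM.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
    return [parse_price(element.text) for element in elements]
```

Called as collect_prices(driver) after driver.get(...), this would return one integer (or None) per price element, provided the data-test value is still what the answer reports.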
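As a side note, when using a WebElement to locate another WebElement with XPath, you need to tell it to use the current context with a leading dot (.). Without it, an expression starting with // is evaluated against the whole document, which is why li[1] always returned the first listing.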
listing.find_element(By.XPATH, './/*[@id="mainContent"]')
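The effect of the leading dot can be tried without a browser. The sketch below is a stand-in, not Selenium: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, on a made-up fragment that is not TrueCar's real markup. It only illustrates that a path starting with "." is resolved inside each list item.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for the listings <ul>; not TrueCar's real HTML.
FRAGMENT = """
<ul>
  <li><div><span data-test="vehicleCardPricingBlockPrice">$45,998</span></div></li>
  <li><div><span data-test="vehicleCardPricingBlockPrice">$52,500</span></div></li>
  <li><div><span data-test="vehicleCardPricingBlockPrice">$61,250</span></div></li>
</ul>
""".strip()

# Relative path: the leading "." means "start from the current element".
PRICE_PATH = ".//*[@data-test='vehicleCardPricingBlockPrice']"

ul = ET.fromstring(FRAGMENT)

# Each search is scoped to one <li>, so every iteration yields that
# listing's own price rather than the first price in the document.
prices = [li.find(PRICE_PATH).text for li in ul.findall("li")]
print(prices)
```

In Selenium the corresponding call inside the loop would presumably be listing.find_element(By.XPATH, './/*[@data-test="vehicleCardPricingBlockPrice"]'), again assuming the data-test value from the answer.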