Iterate over an unordered list in selenium and output price value

Question

I am scraping TrueCar using Selenium and Python. My code finds the unordered list that contains all the listings and then iterates over its items to print each price. However, using the local XPath only ever prints the price of the first element in the list. I tried updating the XPath dynamically in the for loop so that the index in the XPath changes on each iteration, but then I get an error saying the element cannot be found.
Here is the code, to better demonstrate the problem:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

driver.get("https://www.truecar.com/used-cars-for-sale/listings/bmw/m4/location-palm-desert-ca/")

listingSection = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/main/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul')
listings = listingSection.find_elements(By.TAG_NAME, "li")

for i, listing in enumerate(listings):
    # wrong element: this only ever returns the first item in the list, and sadly the list elements do not have ids
    price = listing.find_element(By.XPATH, "//*[@id='mainContent']/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[1]/div/div/div[3]/div[2]/div/div[2]/div/div").text
    # error: this raises an error saying the element cannot be found
    price = listing.find_element(By.XPATH, f"//*[@id='mainContent']/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[{i + 1}]/div/div/div[3]/div[2]/div/div[2]/div/div").text
    print(price)

I suspect the reason is that the id in the XPath is not relative to the list elements themselves, but since the list elements have no ids, I am not sure how to go about this. I am new to web scraping.

Answer 1

Score: 1

They don't have ids, but they do have a data-test attribute, which I suspect was added for Selenium testing.

listings = driver.find_elements(By.CSS_SELECTOR, '[data-test="vehicleCardPricingBlockPrice"]')
for listing in listings:
    print(listing.text)
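If you need numeric values rather than display strings, the text can be parsed after collecting it. The sketch below runs offline: it uses xml.etree.ElementTree's findall on a made-up snippet as a stand-in for driver.find_elements, and it assumes prices are rendered like "$52,990". The markup, the prices and the parse_price helper are hypothetical; only the data-test value comes from the answer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup; only the data-test value is taken from the answer.
SNIPPET = """
<ul>
  <li><span data-test="vehicleCardPricingBlockPrice">$52,990</span></li>
  <li><span data-test="vehicleCardPricingBlockPrice">$61,500</span></li>
</ul>
"""


def parse_price(text):
    """Turn a display string such as '$52,990' into an int (assumed format)."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


root = ET.fromstring(SNIPPET.strip())
# One query for every price element, analogous to
# driver.find_elements(By.CSS_SELECTOR, '[data-test="vehicleCardPricingBlockPrice"]')
price_elements = root.findall(".//*[@data-test='vehicleCardPricingBlockPrice']")
prices = [parse_price(el.text) for el in price_elements]
print(prices)  # [52990, 61500]
```

Returning None for text without digits keeps listings with a placeholder such as "N/A" from crashing the loop.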

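As a side note, when you use a WebElement to locate another WebElement with XPath, you need to start the expression with . so that the search uses the current element as its context. An expression that starts with // searches the whole document, even when it is called on a WebElement.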

price = listing.find_element(By.XPATH, './/*[@data-test="vehicleCardPricingBlockPrice"]').text
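To see the difference without a browser, here is a rough offline analogy using Python's standard xml.etree.ElementTree on a small, made-up snippet. The markup and prices are hypothetical, and only the data-test value comes from the answer. ElementTree is not Selenium: it refuses absolute paths such as //... on an element, so the document-wide search is modelled here by searching from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results list; the markup and
# prices are made up, only the data-test value comes from the answer.
PAGE = """
<main id="mainContent">
  <ul>
    <li><div><span data-test="vehicleCardPricingBlockPrice">$10</span></div></li>
    <li><div><span data-test="vehicleCardPricingBlockPrice">$20</span></div></li>
    <li><div><span data-test="vehicleCardPricingBlockPrice">$30</span></div></li>
  </ul>
</main>
"""

PRICE = ".//*[@data-test='vehicleCardPricingBlockPrice']"


def first_price_from_root(root):
    # Analogue of an XPath starting with //: the search starts at the top of
    # the document, so it returns the same first match on every iteration.
    return root.find(PRICE).text


def prices_relative_to_items(root):
    # Analogue of './/' in Selenium: each search is limited to one <li>.
    return [li.find(PRICE).text for li in root.iter("li")]


root = ET.fromstring(PAGE.strip())
items = list(root.iter("li"))
print([first_price_from_root(root) for _ in items])  # ['$10', '$10', '$10']
print(prices_relative_to_items(root))  # ['$10', '$20', '$30']
```

The first print mirrors the "wrong element" symptom from the question: looping over the items does not change what a document-wide search finds.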

huangapple
  • Published on July 20, 2023 at 14:18:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76727150.html