July 20, 2023 14:18:00

Iterate over an unordered list in Selenium and output the price value

Question

I am scraping TrueCar using Selenium and Python. My code currently finds the unordered list containing all the listings and then iterates over the items to print each price. Using a fixed XPath only ever prints the price of the first element in the unordered list. I tried updating the XPath dynamically in the for loop so that the index in the XPath changes, but then I get an error saying the element cannot be found.
Here is the code, to better demonstrate the problem:

from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.truecar.com/used-cars-for-sale/listings/bmw/m4/location-palm-desert-ca/")
listingSection = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/main/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul')
listings = listingSection.find_elements(By.TAG_NAME, "li")
for i, listing in enumerate(listings):
    # wrong element: this only ever returns the first item in the list, and sadly the list elements do not have ids
    price = listing.find_element(By.XPATH, '//*[@id="mainContent"]/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[1]/div/div/div[3]/div[2]/div/div[2]/div/div').text
    # error: this raises an error because the element cannot be found
    price = listing.find_element(By.XPATH, f'//*[@id="mainContent"]/div/div[2]/div[1]/div[2]/div[2]/div[2]/ul/li[{i + 1}]/div/div/div[3]/div[2]/div/div[2]/div/div').text
    print(price)

I know the reason is likely that the id in the XPath is not relative to the list elements themselves, but since they have no ids I am not sure how to go about this, as I am new to web scraping.

Answer 1

Score: 1

The list items don't have ids, but they do have a data-test attribute, which I suspect was added for the purpose of Selenium testing.

listings = driver.find_elements(By.CSS_SELECTOR, '[data-test="vehicleCardPricingBlockPrice"]')
for listing in listings:
    print(listing.text)
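To make the snippet above easier to reuse, here is a minimal sketch that wraps the same data-test lookup in a function. The names `collect_prices`, `FakeElement`, and `FakeContext` are hypothetical; the two fake classes exist only so the sketch runs without a browser, and the sample prices are made up. It assumes the page still uses the `vehicleCardPricingBlockPrice` value quoted in the answer, which may change over time.

```python
try:
    from selenium.webdriver.common.by import By
    CSS_SELECTOR = By.CSS_SELECTOR
except ImportError:
    # Lets the sketch run offline; By.CSS_SELECTOR is this same string.
    CSS_SELECTOR = "css selector"

# Selector taken from the answer; the page may change it at any time.
PRICE_SELECTOR = '[data-test="vehicleCardPricingBlockPrice"]'


def collect_prices(context):
    """Return the text of every price element found under `context`.

    `context` can be a WebDriver (whole page) or a WebElement (one subtree);
    both expose find_elements(by, value).
    """
    elements = context.find_elements(CSS_SELECTOR, PRICE_SELECTOR)
    # Skip elements whose text is empty, e.g. cards that have not rendered yet.
    return [el.text.strip() for el in elements if el.text.strip()]


class FakeElement:
    """Hypothetical stand-in for a WebElement; only carries .text."""

    def __init__(self, text):
        self.text = text


class FakeContext:
    """Hypothetical stand-in for a driver, returning canned elements."""

    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        # Check that the function asked for the expected locator.
        assert (by, value) == ("css selector", PRICE_SELECTOR)
        return self._elements


fake_page = FakeContext([FakeElement("$45,998"), FakeElement(""), FakeElement(" $52,500 ")])
prices = collect_prices(fake_page)
print(prices)
```

With a live session, the call would presumably be `collect_prices(driver)` after `driver.get(...)`; if the listings load late, an explicit wait before the call may be needed.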

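As a side note, when using a WebElement to locate another WebElement with XPath, you need to tell it to use the current context with a leading dot (.). Without it, an expression starting with // is evaluated against the whole document, which is why the first XPath in the question always matched the first listing.

listing.find_element(By.XPATH, './/*[@id="mainContent"]')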
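The effect of the leading dot can be tried offline. The sketch below is an illustration only: it uses Python's standard-library `xml.etree.ElementTree` in place of Selenium, and the markup is a hypothetical, heavily simplified stand-in for the real page. Each lookup starts at the current `li`, so every iteration sees only its own price.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is far more nested.
PAGE = """
<ul>
  <li><div><div data-test="vehicleCardPricingBlockPrice">$45,998</div></div></li>
  <li><div><div data-test="vehicleCardPricingBlockPrice">$52,500</div></div></li>
  <li><div><span>Sponsored</span></div></li>
</ul>
"""

listing_section = ET.fromstring(PAGE)

prices = []
for listing in listing_section.findall("li"):
    # The leading "." anchors the search at this <li>, so only its own
    # descendants are considered.
    price = listing.find(".//*[@data-test='vehicleCardPricingBlockPrice']")
    if price is not None:  # some list items may not contain a price
        prices.append(price.text)

print(prices)

# Rough Selenium counterpart inside the question's loop (note that it raises
# NoSuchElementException instead of returning None when nothing matches):
#   listing.find_element(By.XPATH, './/*[@data-test="vehicleCardPricingBlockPrice"]').text
```

ElementTree only supports a subset of XPath and rejects absolute paths on an element, so this shows the relative form only, not the document-wide behavior of a bare // in Selenium.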
