Selenium: unable to obtain all price and date information for a product



The code below is meant to track, at a fixed time interval, the price of the cheapest product found by a site search, for the same seller:

import pandas as pd
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import matplotlib.pyplot as plt
from time import sleep
import inspect
import os
from bs4 import BeautifulSoup
import requests

# Get the search term and tracking period from the user
search_term = input("Please enter the name of the product you want to search: ")
months = input("Please enter the number of months you want to track the product: ")

# Make sure the user enters a valid integer
while not months.isdigit():
    print("Warning: Please enter a valid integer value for the number of months.")
    months = input("Please enter the number of months you want to track the product: ")
months = int(months)

# Start the web driver and go to the Hepsiburada homepage
options = uc.ChromeOptions()
options.add_argument('--blink-settings=imagesEnabled=false') # disable images so pages load faster
prefs = {"profile.default_content_setting_values.notifications" : 2}
driver = uc.Chrome(options=options)

url = &#39;;
wait = WebDriverWait(driver, 15)

# close cookies bar
wait.until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler'))).click()

# Enter the search term in the search box and press Enter
search_box = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'theme-IYtZzqYPto8PhOx3ku3c')))
search_box.send_keys(search_term + Keys.RETURN)

# load all products
number_of_products = int(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, 'searchResultSummaryBar-AVnHBWRNB0_veFy34hco')))[1].text)
# visibility_of_all_elements_located waits until all elements matching the locator are present and visible, then returns them.

number_of_loaded_products = 0
while number_of_loaded_products < number_of_products:
    loaded_products = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li[class*=productListContent][id]')))
    number_of_loaded_products = len(loaded_products)
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', loaded_products[-1])

# Get the link, name, price and seller of all the products
product = {key:[] for key in ['name','price','seller','url']}
product['name']  = [h3.text for h3 in driver.find_elements(By.CSS_SELECTOR, 'h3[data-test-id=product-card-name]')]
product['url']   = [a.get_attribute('href') for a in driver.find_elements(By.CSS_SELECTOR, 'a[class*=ProductCard]')]
product['price'] = [float(div.text.replace('TL','').replace(',','.')) for div in driver.find_elements(By.CSS_SELECTOR, 'div[data-test-id=price-current-price]')]
for i,url in enumerate(product['url']):
    print(f'Search seller names {i+1}/{number_of_loaded_products}', end='\r')
    driver.get(url) # open the product page so the seller link can be read
    product['seller'] += [wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.seller a'))).text]
    product['url'][i] = driver.current_url # useful to replace some long urls
# Sort by price in ascending order
product_list = pd.DataFrame(product).sort_values(by='price').to_dict('list')

print(f"\nThe product selected from the search results is:"+
      f"\nname:   {product_list['name'][0]}"+
      f"\nprice:  {product_list['price'][0]}"+
      f"\nseller: {product_list['seller'][0]}"+
      f"\nurl:    {product_list['url'][0]}")
# Go to the page of the selected product
driver.get(product_list['url'][0])

# Get the prices
prices = []
dates = []
while len(prices) < months:
    price_elems = driver.find_elements(By.XPATH, "//div[@class='price-area']//strong[@itemprop='price']")
    date_elems = driver.find_elements(By.XPATH, "//div[@class='product-info']//span[@class='product-info-date']")
    for price_elem, date_elem in zip(price_elems, date_elems):
        price = float(price_elem.text.replace('.', '').replace(',', '.'))
        date = pd.to_datetime(date_elem.text, format='%d %B %Y, %H:%M')
    next_button = driver.find_element(By.XPATH, "//a[@class='page-next']")
    if 'disabled' in next_button.get_attribute('class'):
        driver.execute_script("arguments[0].click();", next_button)

# Create a DataFrame and select the data for the last X months
df = pd.DataFrame({'Date': dates, 'Price': prices})
df['Hour'] = df['Date'].dt.hour
df = df.groupby(['Date', 'Hour']).mean().reset_index()
start_date = pd.Timestamp.now() - pd.DateOffset(months=months)
end_date = pd.Timestamp.now()
df = df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

# Create the plot
plt.plot(df['Date'], df['Price'])
plt.title('Price Changes of {} in the Last {} Months'.format(product_list['name'][0], months))
plt.ylabel('Price (TL)')

I am trying to plot the price of the cheapest product found by a search on the website, for the same seller, sampled at the same time of day over a given number of months (for instance, say the product whose price we follow is "pınar süt 1lt"). However, I cannot draw the graph because I could not obtain the "prices" and "dates" information: both lists are empty. How can I obtain this graph?

Point of focus: the piece of code under the '# Get the prices' comment is working incorrectly. The code up to that point works properly.


I'll try to direct you towards a solution. As I understand it, you need to track a product's price changes and process them somehow. You can do that by periodically running a script that collects the product's price and stores the data somewhere for later analysis.
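
A minimal sketch of the "store the data somewhere" part, assuming a plain CSV file is enough. The file name `price_log.csv` and the helper names `record_price` and `load_prices` are made up for illustration; the price itself would come from whatever scraping step you already have.

```python
import csv
from datetime import datetime
from pathlib import Path

LOG_FILE = Path("price_log.csv")  # hypothetical file name


def record_price(price, when=None, log_file=LOG_FILE):
    """Append one (timestamp, price) row to the CSV log."""
    when = when or datetime.now()
    is_new = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header only when the file is created
            writer.writerow(["date", "price"])
        writer.writerow([when.isoformat(timespec="seconds"), f"{price:.2f}"])


def load_prices(log_file=LOG_FILE):
    """Read the log back as a list of (datetime, float) tuples."""
    with log_file.open(newline="", encoding="utf-8") as f:
        return [(datetime.fromisoformat(row["date"]), float(row["price"]))
                for row in csv.DictReader(f)]
```

Run the scraping script on a schedule (cron, Task Scheduler, or similar), call `record_price` once per run, and build the DataFrame for the plot from `load_prices()` once enough rows have accumulated.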

I see that you currently use WebDriver to grab the price data. My first suggestion is to try a Web API instead. Communicating with a website over HTTP is much faster and more stable than going through the UI. Automating the UI forces you to deal with inaccessible elements, tricky waits, and unexpectedly overlapping controls. A Web API gives you a clear interface for requesting and receiving the data you need, without the overhead of handling the UI. In short: the UI is for humans, the API is for machines. Use the API if possible.
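
A rough sketch of what the HTTP route could look like, using only the standard library. It assumes the site serves product data as JSON somewhere; whether such an endpoint exists, its URL, and the `price` field name are all hypothetical here, so check the browser's network tab for the real requests. The `opener` parameter is there only so the function can be tried without network access.

```python
import json
from urllib.request import Request, urlopen


def fetch_price(url, opener=urlopen, timeout=10):
    """Request a JSON document over HTTP and return its 'price' field.

    The 'price' key is an assumed payload layout, not a documented API.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        payload = json.load(response)
    return float(payload["price"])
```

With a real endpoint you would call something like `fetch_price("https://example.invalid/api/product/123")`, where the URL is a placeholder.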

If you need to stay with WebDriver, check the following to address the data extraction issues:

  • the required data is displayed on the page before extraction

Make sure the steps leading to the target data page complete successfully. Some automated interactions may fail silently, so the needed data is never shown.
Button clicks may be skipped, or element state waits may be wrong, letting the script extract data before it is displayed.

Watch your script execute in real time and make sure it passes all steps before data extraction. If some steps fail, add sleeps at first just to verify that it is a page state issue, and if so, replace them with custom waits. Try different click methods if some fail.
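
As an illustration of a custom wait, here is a small polling helper in plain Python. The name `wait_until` and its defaults are my own choice; Selenium's `WebDriverWait(driver, timeout).until(...)` does the same job with a callable that receives the driver.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Possible use with Selenium: find_elements returns an empty (falsy) list
# until at least one matching element exists.
# price_elems = wait_until(lambda: driver.find_elements(By.XPATH, "//strong[@itemprop='price']"))
```

If this raises `TimeoutError` where a fixed sleep also finds nothing, the elements are probably not on the page at all, which points to a wrong locator or a wrong page rather than a timing issue.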

  • make sure the required data is in view, not outside the visible page area

To extract data from some controls, they need to be in view. Scroll the page if needed.

  • parent iframe is selected before interaction with a child element

Some UI controls may be placed inside iframes. If interaction with a control fails, check whether it is inside an iframe. WebDriver needs to be switched to the parent iframe before interacting with the controls inside it.
Use the Web Inspector in a browser to find out whether a given UI control is inside an iframe.
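
A short sketch of the iframe switch, wrapped in a context manager so the driver always returns to the top-level page. `driver.switch_to.frame(...)` and `driver.switch_to.default_content()` are the WebDriver calls; the helper name `inside_frame` and the selectors in the usage comment are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch `driver` into `frame`, then back to the top-level page.

    `frame` can be a frame name or id, an index, or a WebElement.
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Always leave the frame, even if the extraction inside fails
        driver.switch_to.default_content()


# Possible use (selectors are placeholders):
# frame_elem = driver.find_element(By.CSS_SELECTOR, "iframe")
# with inside_frame(driver, frame_elem):
#     price_text = driver.find_element(By.CSS_SELECTOR, "[itemprop=price]").text
```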

  • try to execute the script in another browser

Ideally, a WebDriver script should work the same in all browsers, but in practice issues happen, and a button click that fails in one browser may work in another.
If your script looks perfect but element interaction still fails, try another browser.

