Scrape Google Maps results Website URL selenium

huangapple go评论53阅读模式
英文:

Scrape Google Maps results Website URL selenium

问题

以下是您要的代码部分的翻译:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start the browser
driver = webdriver.Chrome()

# Open google.de and accept cookies
driver.get("https://www.google.de/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()

# Search for "Kinderarzt Kanton Aargau"
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Kinderarzt Kanton Aargau")
search_box.submit()

# Switch to Maps tab
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Maps')]")).click()

# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)

# Close the browser
driver.quit()

请注意,我只翻译了代码部分,不包括您的问题或其他内容。如果您有任何问题或需要进一步的帮助,请随时告诉我。

英文:

I am trying to search with python via Google Maps and I want to get the URL from the results.

Following steps I approach:

  1. open google
  2. Accept cookies
  3. Search for random thing (in this example "pediatrician in Aargau")
  4. switch to google maps

This is where I get the error, as I am trying to wait for the results to load, but I always get a timeout. I can see in the window that opens, that the results are fully loaded.

Is there anything wrong with my code? I would like to extract the website URL of the results.

Here is the code that I have so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start the browser
driver = webdriver.Chrome()

# Open google.de and accept cookies
driver.get("https://www.google.de/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()

# Search for "Kinderarzt Kanton Aargau"
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Kinderarzt Kanton Aargau")
search_box.submit()

# Switch to Maps tab
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Maps')]"))).click()

# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)


# Close the browser
driver.quit()

PS: I have tried to increase the time for the webdriver, but that won't help. I think it can not find the object and there must be another way to identify the objects.

答案1

得分: 1

以下是您要翻译的内容:

First, you can skip several steps by just building the URL for google maps with the desired search string. Second, your "Wait for results to load" locator was not on my page. My guess is that the class you are using is changing regularly. I used a different CSS selector and found it just fine.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start the browser
driver = webdriver.Chrome()

# Declare string to search for and encode it
search_string = "Kinderarzt Kanton Aargau"
partial_url = search_string.replace(" ", "+")

# Open google.de and accept cookies
driver.get(f"https://www.google.de/maps/search/{partial_url}/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()

# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)

# Close the browser
driver.quit()

翻译完成。

英文:

First, you can skip several steps by just building the URL for google maps with the desired search string. Second, your "Wait for results to load" locator was not on my page. My guess is that the class you are using is changing regularly. I used a different CSS selector and found it just fine.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start the browser
driver = webdriver.Chrome()

# Declare string to search for and encode it
search_string = "Kinderarzt Kanton Aargau"
partial_url = search_string.replace(" ", "+")

# Open google.de and accept cookies
driver.get(f"https://www.google.de/maps/search/{partial_url}/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()

# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)

# Close the browser
driver.quit()

The result is

https://www.google.de/maps/place/Dr.+med.+Helena+Gerritsma+Schirlo/data=!4m7!3m6!1s0x47903be8d0d4a09d:0xc97d85a6fa076207!8m2!3d47.3906733!4d8.0443884!16s%2Fg%2F1tghc1gd!19sChIJnaDU0Og7kEcRB2IH-qaFfck?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Dr.+med.+Armin+B%C3%BChler+%26+Thomas+Justen/data=!4m7!3m6!1s0x479069d7b30c674b:0xd04693e64cbc42b0!8m2!3d47.5804824!4d8.2163541!16s%2Fg%2F1ptw0srs4!19sChIJS2cMs9dpkEcRsEK8TOaTRtA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Lenzburg/data=!4m7!3m6!1s0x4790160e650976b1:0x5352d33510a53d99!8m2!3d47.3855278!4d8.1753395!16s%2Fg%2F11hz17jwcy!19sChIJsXYJZQ4WkEcRmT2lEDXTUlM?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzthaus+-+Kinderarztpraxis/data=!4m7!3m6!1s0x47903bf002633251:0xf029086640b016ee!8m2!3d47.391928!4d8.051698!16s%2Fg%2F11cfdn2j8!19sChIJUTJjAvA7kEcR7hawQGYIKfA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Nils+Hammerich/data=!4m7!3m6!1s0x4790160e650976b1:0x7116ed2cc14996ea!8m2!3d47.3856086!4d8.1753854!16s%2Fg%2F1tl0w7qv!19sChIJsXYJZQ4WkEcR6pZJwSztFnE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzt+Berikon/data=!4m7!3m6!1s0x47900e152314a493:0x72ca7fe58b7b3a5f!8m2!3d47.3612625!4d8.3674472!16s%2Fg%2F11c311g_px!19sChIJk6QUIxUOkEcRXzp7i-V_ynI?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Hana+Balent+Ilitsch/data=!4m7!3m6!1s0x4790697f95fe3a73:0xaff715a22ab56e78!8m2!3d47.5883105!4d8.2882387!16s%2Fg%2F11hyjwg_32!19sChIJczr-lX9pkEcReG61KqIV968?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Belzer+Heierling+Tanja/data=!4m7!3m6!1s0x47906d2a4e9698fd:0x6865ac23234b8dc9!8m2!3d47.4637622!4d8.3284463!16s%2Fg%2F1tksm8d9!19sChIJ_ZiWTiptkEcRyY1LIyOsZWg?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Praxis+f%C3%BCr+Kinder-+und+Jugendmedizin+Dr.+Dirk+Bock/data=!4m7!3m6!1s0x47906b5c9071d861:0x516c763f7642c9ff!8m2!3d47.4731839!4d8.1959905!16s%2Fg%2F11mpc9wm91!19sChIJYdhxkFxrkEcR_8lCdj92bFE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Alleviamed+Kinderarztpraxis+Meisterschwanden/data=!4m7!3m6!1s0x4790193bdf03b5f1:0xfef98e265772814a!8m2!3d47.2956342!4d8.2279202!16s%2Fg%2F11gr2z_z2f!19sChIJ8bUD3zsZkEcRSoFyVyaO-f4?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Suhrepark+AG/data=!4m7!3m6!1s0x47903c69ae471281:0xcb34880030319dd7!8m2!3d47.3727496!4d8.0809937!16s%2Fg%2F1v3kl_4v!19sChIJgRJHrmk8kEcR150xMACINMs?authuser=0&hl=en&rclk=1

huangapple
  • 本文由 发表于 2023年2月19日 06:10:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75496699.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定