How to insert a loop with Selenium?

Question

I need to loop over the Google Maps results pages. There are several pages to scrape, but I can only scrape the first one.

Here is the code I use to scrape the first page. I would like to scrape all the pages, but I don't know how to do that.

Link

    from time import sleep

    import pandas as pd
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: the original snippet does not show how the driver is created.
    driver = webdriver.Chrome()

    url = "https://www.google.com/search?q=contabilidade+em+manaus&biw=1366&bih=657&tbm=lcl&sxsrf=AJOqlzXTyAs7rej8A4k9tuuY9FmGpdOjLg:1676314056027&ei=yIXqY8-tAavQ1sQP0Yy2wAo&ved=0ahUKEwjPsd2-lJP9AhUrqJUCHVGGDagQ4dUDCAk&uact=5&oq=contabilidade+em+manaus&gs_lcp=Cg1nd3Mtd2l6LWxvY2FsEAMyBAgjECcyBggAEBYQHjIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyCQgAEBYQHhDxBDIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyBggAEBYQHjIGCAAQFhAeMgkIABAWEB4Q8QRQAFgAYPYDaABwAHgAgAHWAYgB1gGSAQMyLTGYAQDAAQE&sclient=gws-wiz-local&pccc=1#rlfi=hd:;si:;mv:[[-3.0446025000000003,-59.9553221],[-3.1346859,-60.061026600000005]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:14"
    wait = WebDriverWait(driver, 20)

    # Search term typed into the results page
    procurar = "contabilidade em Curitiba"

    # Collected results
    links = []
    Nome = []
    Endereco = []
    Telefone = []

    # Open the results page and replace the pre-filled query
    driver.get(url)
    driver.maximize_window()
    sleep(2)
    print("O que procura?")
    driver.find_element(By.XPATH, "//input[@value='contabilidade em manaus']").clear()
    sleep(2)
    input_buscar = driver.find_element(By.XPATH, "//input[@aria-label='Pesquisar']")
    input_buscar.send_keys(procurar, Keys.ENTER)
    sleep(2)

    while True:
        try:
            # Click every business card on the current page and read its details
            classe_empresas = driver.find_elements(By.XPATH, "(//div[@class='rllt__details'])")
            for empresa in classe_empresas:
                empresa.click()
                sleep(2)
                nome = driver.find_element(By.XPATH, "//h2[@data-attrid='title']").text
                print(nome)
                Nome.append(nome)
                endereco = driver.find_element(By.XPATH, "//span[@class='LrzXr']").get_attribute("innerHTML")
                print(endereco)
                Endereco.append(endereco)
                try:
                    tel = driver.find_element(By.CSS_SELECTOR, ".LrzXr.zdqRlf.kno-fv").text
                    print(tel)
                    Telefone.append(tel)
                except NoSuchElementException:
                    # Fixed: the original appended `tel` here, which is undefined
                    # (or stale from the previous business) when no phone is found.
                    sem_telefone = "Não Tem Telefone Cadastrado"
                    Telefone.append(sem_telefone)
                    print(sem_telefone)
            # Go to the next page of results
            driver.find_element(By.XPATH, "//span[normalize-space()='Mais']").click()
        except Exception:
            # Any failure (for example, no "Mais" button) ends the loop
            break

    data = {'Nome': Nome, 'Endereço': Endereco, 'Telefone': Telefone}
    df = pd.DataFrame(data)
    df.to_excel('GoogleMaps.xlsx', engine='xlsxwriter')
    print(df)

Answer 1

Score: 1

You are showing the part that works; it would also help to see the URL of the page you are actually scraping. That said, the Google Maps URL you sent ends with a parameter called "start:x", where x is a number that sets the index of the first item displayed on the page.
You can change that value as you scrape to step through all the results.

Here is where the value appears:

    https://www.google.com/search?q=contabilidade+&biw=1...;start:20    // the final parameter
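
As a rough sketch of the idea above, the helper below rewrites the trailing "start:x" field of a results URL so that each page can be requested in turn. The names with_start and page_urls, the shortened placeholder URL, and the page size of 20 are assumptions made for illustration (20 is only inferred from the "start:20" example); whether Google honours the parameter for a given search is not verified here.

```python
import re


def with_start(url, start):
    """Return url with its ';start:N' field set to the given start index."""
    field = f";start:{start}"
    if re.search(r";start:\d+", url):
        # Replace the existing field instead of appending a second one.
        return re.sub(r";start:\d+", field, url, count=1)
    return url + field


def page_urls(base_url, pages, page_size=20):
    """Build one URL per results page (page_size=20 is an assumption)."""
    return [with_start(base_url, i * page_size) for i in range(pages)]


# Shortened placeholder URL, not the full one from the question.
example = "https://www.google.com/search?q=contabilidade&tbm=lcl#rlfi=hd:;si:"
for page_url in page_urls(example, pages=3):
    print(page_url)
```

Each URL returned by page_urls could then be passed to driver.get(...) before running the existing scraping loop on that page.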

You can also click the page numbers at the bottom of the page in a loop:

    driver.find_element(By.XPATH, "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[" + str(yourIterateVar) + "]/a").click()

where yourIterateVar starts at 2 (the second page) and is incremented until the lookup raises an error, which means there are no more pages.
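
The click-based loop could be sketched roughly as follows. The function name click_through_pages, the scrape_page callback, and the pause and max_pages parameters are hypothetical; the XPath is the one quoted above and will break whenever Google changes its page layout. To keep the sketch free of a Selenium import, it passes the locator strategy as the string "xpath" (the value of By.XPATH) and catches a broad Exception where Selenium would raise NoSuchElementException.

```python
import time

# Pager link XPath copied from the answer; {n} is the page number to click.
PAGER_XPATH = "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[{n}]/a"


def click_through_pages(driver, scrape_page, first_page=2, max_pages=50, pause=2.0):
    """Scrape the loaded page, then click pager links until one is missing.

    scrape_page is called with the driver once per page.
    Returns the number of pages scraped.
    """
    scrape_page(driver)  # page 1 is assumed to be loaded already
    pages_scraped = 1
    for page in range(first_page, max_pages + 1):
        try:
            # "xpath" is the value of Selenium's By.XPATH constant.
            driver.find_element("xpath", PAGER_XPATH.format(n=page)).click()
        except Exception:
            # Selenium raises NoSuchElementException when the link is absent;
            # any failure here is treated as "no more pages".
            break
        time.sleep(pause)  # crude wait for the next page, as in the question
        scrape_page(driver)
        pages_scraped += 1
    return pages_scraped
```

Here scrape_page would hold the body of the existing for loop over classe_empresas, so the per-business scraping stays unchanged and only the paging logic moves into the helper.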

huangapple
  • Posted on 2023-02-18 03:06:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75488362.html