How to insert a loop with Selenium?

Question

I need to loop over the Google Maps results pages. There are several pages to scrape, but I can only scrape the first one.

Here is the code I use to scrape the first page. I would like to scrape all the pages, but I don't know how.

from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait

# Assumption: the original snippet does not show how `driver` is created; Chrome is used here
driver = webdriver.Chrome()

url = "https://www.google.com/search?q=contabilidade+em+manaus&biw=1366&bih=657&tbm=lcl&sxsrf=AJOqlzXTyAs7rej8A4k9tuuY9FmGpdOjLg:1676314056027&ei=yIXqY8-tAavQ1sQP0Yy2wAo&ved=0ahUKEwjPsd2-lJP9AhUrqJUCHVGGDagQ4dUDCAk&uact=5&oq=contabilidade+em+manaus&gs_lcp=Cg1nd3Mtd2l6LWxvY2FsEAMyBAgjECcyBggAEBYQHjIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyCQgAEBYQHhDxBDIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyBggAEBYQHjIGCAAQFhAeMgkIABAWEB4Q8QRQAFgAYPYDaABwAHgAgAHWAYgB1gGSAQMyLTGYAQDAAQE&sclient=gws-wiz-local&pccc=1#rlfi=hd:;si:;mv:[[-3.0446025000000003,-59.9553221],[-3.1346859,-60.061026600000005]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:14"
wait = WebDriverWait(driver, 20)
########################################################################################################################
procurar = "contabilidade em Curitiba"
########################################################################################################################
links = []
Nome = []
Endereco = []
Telefone = []
########################################################################################################################

# Open the results page and replace the original query with `procurar`
driver.get(url)
driver.maximize_window()
sleep(2)
print("O que procura?")
driver.find_element(By.XPATH, "//input[@value='contabilidade em manaus']").clear()
sleep(2)
input_buscar = driver.find_element(By.XPATH, "//input[@aria-label='Pesquisar']")
input_buscar.send_keys(procurar, Keys.ENTER)
sleep(2)
########################################################################################################################

while True:
    try:
        # Open each business card on the current page and read its details
        classe_empresas = driver.find_elements(By.XPATH, "(//div[@class='rllt__details'])")
        for empresa in classe_empresas:
            empresa.click()
            sleep(2)

            nome = driver.find_element(By.XPATH, "//h2[@data-attrid='title']").text
            print(nome)
            Nome.append(nome)

            endereco = driver.find_element(By.XPATH, "//span[@class='LrzXr']").get_attribute("innerHTML")
            print(endereco)
            Endereco.append(endereco)

            try:
                tel = driver.find_element(By.CSS_SELECTOR, ".LrzXr.zdqRlf.kno-fv").text
                print(tel)
                Telefone.append(tel)
            except NoSuchElementException:
                # No phone listed: store the placeholder, not the previous business's `tel`
                sem_telefone = "Não Tem Telefone Cadastrado"
                Telefone.append(sem_telefone)
                print(sem_telefone)

        # Go to the next results page; any failure ends the loop
        driver.find_element(By.XPATH, "//span[normalize-space()='Mais']").click()
    except Exception:
        break

data = {'Nome': Nome, 'Endereço': Endereco, 'Telefone': Telefone}
df = pd.DataFrame(data)

df.to_excel('GoogleMaps.xlsx', engine='xlsxwriter')
print(df)

Answer 1

Score: 1

You are showing the code that works; it would help to see which URL you are scraping. That said, the Google Maps URL you sent has a final parameter, "start:x", where x is the offset of the first item displayed on the page.
You can change that value to step through all the results.

Here is the value, at the very end of the URL:

https://www.google.com/search?q=contabilidade+&biw=1...;start:20
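A minimal sketch of the "start:x" approach, assuming (as in the URL above) that the offset sits at the end of the URL as ";start:N". The helper names `with_start_offset` and `page_urls` are hypothetical, and the page size is an assumption you would need to confirm against the real results.

```python
import re
from typing import List

# Assumption: the offset appears at the end of the URL fragment as ";start:N"
_START = re.compile(r";start:\d+")


def with_start_offset(url: str, start: int) -> str:
    """Return `url` with its ';start:N' value set to `start` (appended if missing)."""
    if _START.search(url):
        return _START.sub(f";start:{start}", url, count=1)
    return f"{url};start:{start}"


def page_urls(url: str, page_size: int, pages: int) -> List[str]:
    """Build URLs for the first `pages` pages, each `page_size` items apart."""
    return [with_start_offset(url, i * page_size) for i in range(pages)]
```

You would then call `driver.get(...)` on each URL and run the scraping loop on it, stopping once a page returns no `rllt__details` elements. Whether Google re-renders the list when only this fragment value changes is also something to verify.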

Alternatively, you can click the page numbers at the bottom of the page in a loop:

driver.find_element(By.XPATH, "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[" + str(yourIterateVar) + "]/a").click()

where yourIterateVar starts at 2 (the second page) and is incremented until the click fails because there are no more pages.
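The click-until-error loop can be sketched as follows. This is a hedged sketch, not the answerer's code: `scrape_all_pages`, `scrape_current_page` and `click_page` are hypothetical names, and the callbacks keep the loop logic testable without a browser. With Selenium, `click_page` would wrap the `find_element(...).click()` call above and `stop_on` would be `(NoSuchElementException,)`.

```python
from typing import Callable, Tuple, Type


def scrape_all_pages(
    scrape_current_page: Callable[[], None],
    click_page: Callable[[int], None],
    first_page: int = 2,
    stop_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> int:
    """Scrape page 1, then click pages 2, 3, ... until a click fails.

    Returns the number of pages scraped.
    """
    scrape_current_page()  # the page that is already loaded
    scraped = 1
    page = first_page
    while True:
        try:
            click_page(page)  # e.g. click td[page] in the pager table
        except stop_on:
            break  # no link for this page number: assume there are no more pages
        scrape_current_page()
        scraped += 1
        page += 1
    return scraped

# Hypothetical Selenium wiring (not run here):
#   PAGER_XPATH = "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[{}]/a"
#   def click_page(n):
#       driver.find_element(By.XPATH, PAGER_XPATH.format(n)).click()
#       sleep(2)
#   scrape_all_pages(scrape_current_page, click_page, stop_on=(NoSuchElementException,))
```

Passing only `NoSuchElementException` as `stop_on` means unrelated failures, such as a stale element, surface as errors instead of silently ending the scrape.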

huangapple
  • This post was published on February 18, 2023, 03:06:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75488362.html