February 18, 2023 03:06:27

How to insert a loop with selenium?

Question

I need to loop through the Google Maps results pages. There are several pages to scrape, but I can only scrape the first one.

Here is the code I use to scrape the first page. I would like to scrape all the pages, but I don't know how to do that.

    from time import sleep

    import pandas as pd
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException, WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: a local Chrome driver; the original snippet creates `driver` elsewhere.
    driver = webdriver.Chrome()

    url = "https://www.google.com/search?q=contabilidade+em+manaus&biw=1366&bih=657&tbm=lcl&sxsrf=AJOqlzXTyAs7rej8A4k9tuuY9FmGpdOjLg:1676314056027&ei=yIXqY8-tAavQ1sQP0Yy2wAo&ved=0ahUKEwjPsd2-lJP9AhUrqJUCHVGGDagQ4dUDCAk&uact=5&oq=contabilidade+em+manaus&gs_lcp=Cg1nd3Mtd2l6LWxvY2FsEAMyBAgjECcyBggAEBYQHjIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyCQgAEBYQHhDxBDIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyBggAEBYQHjIGCAAQFhAeMgkIABAWEB4Q8QRQAFgAYPYDaABwAHgAgAHWAYgB1gGSAQMyLTGYAQDAAQE&sclient=gws-wiz-local&pccc=1#rlfi=hd:;si:;mv:[[-3.0446025000000003,-59.9553221],[-3.1346859,-60.061026600000005]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:14"
    wait = WebDriverWait(driver, 20)

    # Search term
    procurar = "contabilidade em Curitiba"

    # One list per output column
    links = []
    Nome = []
    Endereco = []
    Telefone = []

    # Open the results page and replace the search term
    driver.get(url)
    driver.maximize_window()
    sleep(2)
    print("O que procura?")
    driver.find_element(By.XPATH, "//input[@value='contabilidade em manaus']").clear()
    sleep(2)
    input_buscar = driver.find_element(By.XPATH, "//input[@aria-label='Pesquisar']")
    input_buscar.send_keys(procurar, Keys.ENTER)
    sleep(2)

    # Scrape every result on the page, then click "Mais" (More) to load the next page
    while True:
        try:
            classe_empresas = driver.find_elements(By.XPATH, "//div[@class='rllt__details']")
            for empresa in classe_empresas:
                empresa.click()
                sleep(2)

                nome = driver.find_element(By.XPATH, "//h2[@data-attrid='title']").text
                print(nome)
                Nome.append(nome)

                endereco = driver.find_element(By.XPATH, "//span[@class='LrzXr']").get_attribute("innerHTML")
                print(endereco)
                Endereco.append(endereco)

                try:
                    tel = driver.find_element(By.CSS_SELECTOR, ".LrzXr.zdqRlf.kno-fv").text
                    print(tel)
                    Telefone.append(tel)
                except NoSuchElementException:
                    sem_telefone = "Não Tem Telefone Cadastrado"
                    # Store the placeholder so all columns keep the same length
                    Telefone.append(sem_telefone)
                    print(sem_telefone)

            driver.find_element(By.XPATH, "//span[normalize-space()='Mais']").click()
        except WebDriverException:
            # No "Mais" link (last page) or any other Selenium error: stop looping
            break

    data = {'Nome': Nome, 'Endereço': Endereco, 'Telefone': Telefone}
    df = pd.DataFrame(data)

    df.to_excel('GoogleMaps.xlsx', engine='xlsxwriter')
    print(df)

Answer 1

Score: 1

You are showing the code that works; it would help to see which URL you are scraping. That said, the Google Maps URL you sent has a final parameter, `start:x`, where x is the position of the first item displayed on the page. You can change that value to step through all the results.

Here is the value:

https://www.google.com/search?q=contabilidade+&biw=1...;start:20 (the final property)
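A minimal sketch of the `start:x` approach, assuming (as in the example URL above) that the offset is stored in the URL fragment as `;start:N`, that the base URL already carries the `#rlfi=...` fragment like the one in the question, and that each results page holds 20 items (inferred from the `start:20` example, not confirmed). The helper names `with_start` and `page_urls` are hypothetical. It only builds the URLs; each one would still be opened with `driver.get(...)` and scraped with the per-page loop from the question.

```python
import re
from typing import List

# Assumption: the offset lives in the URL fragment as ";start:N",
# as in the example URL above.
START_RE = re.compile(r";start:\d+")


def with_start(url: str, start: int) -> str:
    """Return `url` with its ';start:N' offset set to `start` (hypothetical helper)."""
    if START_RE.search(url):
        # Replace the existing offset.
        return START_RE.sub(f";start:{start}", url, count=1)
    # No offset yet (first page): append one to the end of the fragment.
    return f"{url.rstrip(';')};start:{start}"


def page_urls(url: str, pages: int, page_size: int = 20) -> List[str]:
    """URLs for the first `pages` result pages; page_size=20 is an assumption."""
    return [with_start(url, page * page_size) for page in range(pages)]
```

For example, looping `for page_url in page_urls(url, 5): driver.get(page_url)` and running the `rllt__details` loop after each `get` would visit the first five pages; in practice you would stop as soon as a page returns no results.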

You can also click the page numbers at the bottom of the page in a loop:

    driver.find_element(By.XPATH, "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[" + str(yourIterateVar) + "]/a").click()

where `yourIterateVar` starts at 2 (page 2) and is incremented until the lookup raises an error (no more pages).
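The "click page numbers until an error" loop can be sketched without tying it to a browser. In this minimal sketch, `scrape_all_pages`, `scrape_current_page`, `go_to_page` and `PAGE_XPATH` are hypothetical names: with Selenium, `go_to_page(n)` would wrap the `find_element(...td[n]/a).click()` call above (plus a short wait for the new results), and `stop_on` would be `(NoSuchElementException,)`, which `find_element` raises when no link exists for that page number. The `max_pages` cap is an added safeguard, not part of the answer.

```python
from typing import Callable, Tuple, Type


def scrape_all_pages(
    scrape_current_page: Callable[[], None],
    go_to_page: Callable[[int], None],
    stop_on: Tuple[Type[BaseException], ...],
    first_page: int = 2,
    max_pages: int = 50,
) -> int:
    """Scrape the open page, then go to page 2, 3, ... until navigation fails.

    With Selenium (assumption), go_to_page could be
        lambda n: driver.find_element(By.XPATH, PAGE_XPATH.format(n)).click()
    where PAGE_XPATH is the XPath above with td[{}], followed by a wait
    such as the question's sleep(2); stop_on would be (NoSuchElementException,).
    Returns the number of pages scraped.
    """
    scrape_current_page()  # page 1 is already loaded
    pages_scraped = 1
    page = first_page
    while pages_scraped < max_pages:
        try:
            go_to_page(page)
        except stop_on:
            break  # no link for this page number: we are past the last page
        scrape_current_page()
        pages_scraped += 1
        page += 1
    return pages_scraped
```

Passing the scraping and navigation steps in as functions keeps the stop condition in one place: the loop ends on the first page number that cannot be reached, exactly as the answer describes, rather than relying on a bare `except`.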
