How to insert a loop with selenium?
Question
I need to loop over the Google Maps results pages. There are several pages to scrape, but I can only scrape the first one.
Here is the code I use to scrape the first page. I would like to scrape all the pages, but I don't know how to do that.
url = "https://www.google.com/search?q=contabilidade+em+manaus&biw=1366&bih=657&tbm=lcl&sxsrf=AJOqlzXTyAs7rej8A4k9tuuY9FmGpdOjLg:1676314056027&ei=yIXqY8-tAavQ1sQP0Yy2wAo&ved=0ahUKEwjPsd2-lJP9AhUrqJUCHVGGDagQ4dUDCAk&uact=5&oq=contabilidade+em+manaus&gs_lcp=Cg1nd3Mtd2l6LWxvY2FsEAMyBAgjECcyBggAEBYQHjIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyCQgAEBYQHhDxBDIJCAAQFhAeEPEEMgkIABAWEB4Q8QQyBggAEBYQHjIGCAAQFhAeMgkIABAWEB4Q8QRQAFgAYPYDaABwAHgAgAHWAYgB1gGSAQMyLTGYAQDAAQE&sclient=gws-wiz-local&pccc=1#rlfi=hd:;si:;mv:[[-3.0446025000000003,-59.9553221],[-3.1346859,-60.061026600000005]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:14"
# Imports the snippet relies on (not shown in the original post)
from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait

# Assumption: the post does not show how the driver is created
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)

# Search term typed into the search box
procurar = "contabilidade em Curitiba"

# Scraped fields, one list entry per company
links = []
Nome = []
Endereco = []
Telefone = []

# Open the results page and replace the original query with the new one
driver.get(url)
driver.maximize_window()
sleep(2)
print("O que procura?")
driver.find_element(By.XPATH, "//input[@value='contabilidade em manaus']").clear()
sleep(2)
input_buscar = driver.find_element(By.XPATH, "//input[@aria-label='Pesquisar']")
input_buscar.send_keys(procurar, Keys.ENTER)
sleep(2)

# Scrape every company on the current page, then click "Mais" for the next page
while True:
    try:
        classe_empresas = driver.find_elements(By.XPATH, "(//div[@class='rllt__details'])")
        for empresa in classe_empresas:
            empresa.click()
            sleep(2)
            nome = driver.find_element(By.XPATH, "//h2[@data-attrid='title']").text
            print(nome)
            Nome.append(nome)
            endereco = driver.find_element(By.XPATH, "//span[@class='LrzXr']").get_attribute("innerHTML")
            print(endereco)
            Endereco.append(endereco)
            try:
                tel = driver.find_element(By.CSS_SELECTOR, ".LrzXr.zdqRlf.kno-fv").text
                print(tel)
                Telefone.append(tel)
            except NoSuchElementException:
                # No phone listed: store the placeholder text instead
                sem_telefone = "Não Tem Telefone Cadastrado"
                Telefone.append(sem_telefone)
                print(sem_telefone)
        driver.find_element(By.XPATH, "//span[normalize-space()='Mais']").click()
        sleep(2)  # give the next page time to load
    except Exception:
        # Any failure (for example, no "Mais" link left) ends the loop
        break

# Export everything that was collected
data = {'Nome': Nome, 'Endereço': Endereco, 'Telefone': Telefone}
df = pd.DataFrame(data)
df.to_excel('GoogleMaps.xlsx', engine='xlsxwriter')
print(df)
Answer 1
Score: 1
You showed the code that works, but it would help to see the exact URL you are scraping. That said, the Google Maps URL you posted has a final parameter called "start:x", where x is the index of the first item displayed on the page.
You can change that value step by step to scrape all the results.
Here is where the value appears:
https://www.google.com/search?q=contabilidade+&biw=1...;start:20 //the final prop
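A minimal sketch of the "start:x" idea, assuming the offset appears in the URL as `start:<number>` and that each page holds 20 results (inferred only from the `start:20` example above, so check it against the real pages). The helper names `with_start` and `page_urls`, the `;` separator used when no offset is present, and the shortened `EXAMPLE_URL` are hypothetical and stand in for the real search URL.

```python
import re

# Hypothetical, shortened stand-in for the real search URL
EXAMPLE_URL = "https://www.google.com/search?q=demo#rlfi=hd:;start:0"


def with_start(url: str, start: int) -> str:
    """Return `url` with its "start:<n>" value set to `start`."""
    if re.search(r"start:\d+", url):
        # Replace the existing offset (first occurrence only)
        return re.sub(r"start:\d+", f"start:{start}", url, count=1)
    # Assumption: a missing offset can be appended after a ";" separator
    return f"{url};start:{start}"


def page_urls(base_url: str, pages: int, page_size: int = 20) -> list[str]:
    """Build one URL per results page: offsets 0, page_size, 2 * page_size, ..."""
    return [with_start(base_url, page * page_size) for page in range(pages)]


if __name__ == "__main__":
    for page_url in page_urls(EXAMPLE_URL, pages=3):
        print(page_url)
```

Each generated URL could then be passed to `driver.get(...)` before running the per-page scraping code, stopping once a page comes back with no results.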
Alternatively, you can click the page numbers at the bottom of the page in a loop:
driver.find_element(By.XPATH, "/html/body/div[6]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div/div/div/div/div/div/div/div[2]/div/table/tbody/tr/td[" + str(yourIterateVar) + "]/a").click()
where yourIterateVar starts at 2 (the second page) and increases by one on each iteration until the lookup raises an error, which means there are no more pages.
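The click-through approach could be sketched roughly as follows. The function `click_through_pages`, its parameters, and the short XPath template in the usage note are hypothetical; the absolute XPath from the answer can be used instead by replacing the page index with `{}`. So that the sketch runs without Selenium installed, it passes the locator strategy as the plain string `"xpath"` (the value of `By.XPATH`) and treats any exception from the lookup or click as "no more pages", which is broader than catching only `NoSuchElementException`.

```python
def click_through_pages(driver, scrape_page, xpath_template, first_page=2, max_pages=50):
    """Scrape page 1, then click page links 2, 3, ... until one is missing.

    driver         -- a Selenium WebDriver (or any object with find_element)
    scrape_page    -- callable that scrapes whatever page is currently loaded;
                      it is also responsible for waiting until the page is ready
    xpath_template -- XPath with "{}" where the page index goes
    Returns the number of pages scraped.
    """
    scrape_page(driver)  # the first page is already loaded
    scraped = 1
    for page in range(first_page, max_pages + 1):
        try:
            # "xpath" is the string value of selenium's By.XPATH
            driver.find_element("xpath", xpath_template.format(page)).click()
        except Exception:
            # Assumption: any failure here means there is no link for this page
            break
        scrape_page(driver)
        scraped += 1
    return scraped
```

A call might look like `click_through_pages(driver, scrape_page, "//table//td[{}]/a")`, where `scrape_page` wraps the per-company loop from the question. The `max_pages` cap is only a safety net against an endless loop.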