Selenium navigation keeps looping (Python)

Question

I've just started using Selenium to scrape a table from a webpage, so I implemented the page navigation with Selenium. But when I run the code, the result keeps looping over the same data. I'm pretty sure I wrote the code wrong. What should I fix so that the Selenium navigation works?

    import requests
    import csv
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver

    browser = webdriver.Chrome()
    browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')
    # url = requests.get("https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet/")
    soup = bs(browser.page_source)
    filename = "C:/Users/User/Desktop/test.csv"
    csv_writer = csv.writer(open(filename, 'w'))
    pages_remaining = True
    while pages_remaining:
        for tr in soup.find_all("tr"):
            data = []
            # for headers ( entered only once - the first time - )
            for th in tr.find_all("th"):
                data.append(th.text)
            if data:
                print("Inserting headers : {}".format(','.join(data)))
                csv_writer.writerow(data)
                continue
            for td in tr.find_all("td"):
                if td.a:
                    data.append(td.a.text.strip())
                else:
                    data.append(td.text.strip())
            if data:
                print("Inserting data: {}".format(','.join(data)))
                csv_writer.writerow(data)
        try:
            # Checks if there are more pages with links
            next_link = driver.find_element_by_xpath('//*[@id="content"]/div[3]/table/tbody/tr/td[2]/table/tbody/tr/td[6]/a ]')
            next_link.click()
            time.sleep(30)
        except NoSuchElementException:
            rows_remaining = False

Answer 1

Score: 1

Check whether a Next button is present on the page, then click it; otherwise exit the while loop.

    if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]")) > 0:
        browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
    else:
        break
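
Using find_elements (plural) for the check is deliberate: it returns an empty list when nothing matches instead of raising NoSuchElementException, so no try/except is needed around the pagination step.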

There is no need to use time.sleep(); use WebDriverWait() instead.
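For example, to block for up to 10 seconds until the listing table is visible (this is the same wait used in the full code below):

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    # Polls until the element matching the CSS selector becomes visible,
    # raising TimeoutException after 10 seconds
    WebDriverWait(browser, 10).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "table.postlisting"))
    )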


Code:

    import csv
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')
    filename = "C:/Users/User/Desktop/test.csv"
    # newline='' prevents blank rows between records on Windows
    csv_writer = csv.writer(open(filename, 'w', newline=''))
    pages_remaining = True
    while pages_remaining:
        # Wait for the listing table instead of sleeping for a fixed interval
        WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.postlisting")))
        # Re-parse the page source on every iteration; parsing it only once,
        # outside the loop, is what made the original code scrape the same page forever
        soup = bs(browser.page_source, 'html.parser')
        for tr in soup.find_all("tr"):
            data = []
            # for headers ( entered only once - the first time - )
            for th in tr.find_all("th"):
                data.append(th.text)
            if data:
                print("Inserting headers : {}".format(','.join(data)))
                csv_writer.writerow(data)
                continue
            for td in tr.find_all("td"):
                if td.a:
                    data.append(td.a.text.strip())
                else:
                    data.append(td.text.strip())
            if data:
                print("Inserting data: {}".format(','.join(data)))
                csv_writer.writerow(data)
        if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]")) > 0:
            browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
        else:
            break
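
As a side note, Selenium 4 removes the find_element_by_* helpers used above, so on current versions the pagination check would use By locators instead. A minimal sketch of the equivalent loop tail:

    from selenium.webdriver.common.by import By

    # find_elements still returns an empty list when nothing matches
    next_links = browser.find_elements(By.XPATH, "//a[contains(.,'Next')]")
    if next_links:
        next_links[0].click()
    else:
        break  # no Next link left, exit the while loop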

huangapple
  • Posted on January 6, 2020 at 17:59:34
  • When reposting, please keep the original link: https://go.coder-hub.com/59610011.html