Python selenium not taking tables, please review

Question


Below is the main code I have written; the website is https://www.zaubacorp.com/company-list

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

url = 'https://www.zaubacorp.com/company-list'

# Set up Selenium options
options = Options()
options.add_argument('--headless')

# Create a new instance of the Chrome driver
driver = webdriver.Chrome(options=options)

# Navigate to the webpage
driver.get(url)

# Wait for the page to load
driver.implicitly_wait(10)

# Find all table elements on the page using the 'tag_name' locator strategy
tables = driver.find_elements('tag name', 'table')

# Iterate through the tables to find the one you need
table = None
for t in tables:
    if 'list-group-item' in t.get_attribute('class'):
        table = t
        break

if table:
    # Extract the table data
    data = []
    for row in table.find_elements('tag name', 'tr'):
        rowData = []
        for cell in row.find_elements('tag name', 'td'):
            rowData.append(cell.text)
        data.append(rowData)
    # Store the table data in a DataFrame
    results = pd.DataFrame(data)
    # Print the results
    print(results)
else:
    print('Table not found.')

# Close the Selenium driver
driver.quit()

The above code is not working to get the details of the table, and I am not even looping through the other pages yet. Please check and let me know where I am wrong.
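As a first check, a minimal sketch along these lines (same Chrome setup as above; the loop is only for diagnosis) would print the class attribute of every table the driver finds, showing whether the 'list-group-item' filter can ever match:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('https://www.zaubacorp.com/company-list')
driver.implicitly_wait(10)

# Print the index and class attribute of each table the driver can see;
# repr() makes an empty or missing class attribute obvious
for i, t in enumerate(driver.find_elements('tag name', 'table')):
    print(i, repr(t.get_attribute('class')))

driver.quit()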

Answer 1

Score: 1


Any reason you're using selenium? You can just have pandas parse the tables. Will take a while to go through all the pages though.

import requests
import pandas as pd
from bs4 import BeautifulSoup
import re

url = 'https://www.zaubacorp.com/company-list/p-1-company.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# The '>>' pagination link points at the last page; read the page count from its href
last_page = soup.find_all('a', text=lambda text: text and '>>' in text)[0]['href']
match = int(re.search(r'p-(\d+)', last_page).group(1))

dfs = []
tot = match
for page in range(1, match + 1):
    url = f'https://www.zaubacorp.com/company-list/p-{page}-company.html'
    print(f'Page: {page} of {tot}')
    dfs.append(pd.read_html(url)[0])

df = pd.concat(dfs)
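If crawling every page is too slow for a first run, a capped variant like this (the max_pages limit and the CSV filename are illustrative, not part of the answer) fetches only the first few pages and writes them out:

import pandas as pd

max_pages = 5  # illustrative cap instead of crawling all pages
dfs = []
for page in range(1, max_pages + 1):
    url = f'https://www.zaubacorp.com/company-list/p-{page}-company.html'
    # pd.read_html returns one DataFrame per <table>; take the first, as the answer does
    dfs.append(pd.read_html(url)[0])

df = pd.concat(dfs, ignore_index=True)
df.to_csv('companies.csv', index=False)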

Answer 2

Score: 0


You made a small mistake: find_elements does not take two strings, but a By option and a string:

tables = driver.find_elements(By.TAG_NAME, 'table')
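A fuller sketch of the corrected call, with the import it needs (the surrounding setup simply mirrors the question's code):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('https://www.zaubacorp.com/company-list')
driver.implicitly_wait(10)

# Locate every <table> element using the By constant rather than a raw string
tables = driver.find_elements(By.TAG_NAME, 'table')
print(len(tables))

driver.quit()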
