Python Selenium not picking up tables, please review

Question

Below is the main code I have written. The website is https://www.zaubacorp.com/company-list

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

url = 'https://www.zaubacorp.com/company-list'

# Set up Selenium options
options = Options()
options.add_argument('--headless')

# Create a new instance of the Chrome driver
driver = webdriver.Chrome(options=options)

# Navigate to the webpage
driver.get(url)

# Wait for the page to load
driver.implicitly_wait(10)

# Find all table elements on the page using the 'tag_name' locator strategy
tables = driver.find_elements('tag name', 'table')

# Iterate through the tables to find the one you need
table = None
for t in tables:
    if 'list-group-item' in t.get_attribute('class'):
        table = t
        break

if table:
    # Extract the table data
    data = []
    for row in table.find_elements('tag name', 'tr'):
        rowData = []
        for cell in row.find_elements('tag name', 'td'):
            rowData.append(cell.text)
        data.append(rowData)

    # Store the table data in a DataFrame
    results = pd.DataFrame(data)

    # Print the results
    print(results)
else:
    print('Table not found.')

# Close the Selenium driver
driver.quit()

The code above does not retrieve the details of the table. I am not even looping over the other pages yet. Please check and let me know where I went wrong.

Answer 1

Score: 1

Any reason you're using Selenium? You can just have pandas parse the tables. It will take a while to go through all the pages, though.

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Fetch the first page to find out how many pages the listing has
url = 'https://www.zaubacorp.com/company-list/p-1-company.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# The href of the '>>' link carries the number of the last page
last_page = soup.find_all('a', string=lambda text: text and '>>' in text)[0]['href']
match = int(re.search(r'p-(\d+)', last_page).group(1))

# Let pandas parse the first table on every page
dfs = []
tot = match
for page in range(1, match+1):
    url = f'https://www.zaubacorp.com/company-list/p-{page}-company.html'
    print(f'Page: {page} of {tot}')
    dfs.append(pd.read_html(url)[0])

# Combine the per-page tables into a single DataFrame
df = pd.concat(dfs)
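
The page-count step above can be pulled into small helpers, so that a missing page number fails with a clear message instead of an AttributeError on None. This is only a minimal sketch using the standard library: the names BASE, last_page_number and page_urls are hypothetical, the URL pattern is copied from the code above, and the href in the usage comment is made up for illustration.

```python
import re

# URL prefix taken from the answer's code (assumed unchanged on the live site)
BASE = "https://www.zaubacorp.com/company-list"


def last_page_number(href: str) -> int:
    """Extract N from a link whose path contains 'p-N' (hypothetical helper)."""
    found = re.search(r"p-(\d+)", href)
    if found is None:
        raise ValueError(f"no page number found in {href!r}")
    return int(found.group(1))


def page_urls(last_page: int) -> list[str]:
    """Build the listing URL for every page from 1 to last_page inclusive."""
    return [f"{BASE}/p-{page}-company.html" for page in range(1, last_page + 1)]


# Usage with a made-up href, for illustration only:
# urls = page_urls(last_page_number(f"{BASE}/p-3-company.html"))
```

Each URL returned by page_urls can then be passed to pd.read_html as in the loop above. A short pause between requests (for example time.sleep(1)) is a reasonable courtesy when walking many pages.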

Answer 2

Score: 0

You made a small mistake. find_elements is meant to be called with a By locator strategy and a string, not two plain strings:

from selenium.webdriver.common.by import By

tables = driver.find_elements(By.TAG_NAME, 'table')

Posted by huangapple on May 21, 2023, 18:04:53
Please keep this link when reposting: https://go.coder-hub.com/76299324.html