Python selenium not taking tables, please review
Question
Below is the main code I have written; the website is https://www.zaubacorp.com/company-list
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

url = 'https://www.zaubacorp.com/company-list'

# Set up Selenium options
options = Options()
options.add_argument('--headless')

# Create a new instance of the Chrome driver
driver = webdriver.Chrome(options=options)

# Navigate to the webpage
driver.get(url)

# Wait for the page to load
driver.implicitly_wait(10)

# Find all table elements on the page using the 'tag_name' locator strategy
tables = driver.find_elements('tag name', 'table')

# Iterate through the tables to find the one you need
table = None
for t in tables:
    if 'list-group-item' in t.get_attribute('class'):
        table = t
        break

if table:
    # Extract the table data
    data = []
    for row in table.find_elements('tag name', 'tr'):
        rowData = []
        for cell in row.find_elements('tag name', 'td'):
            rowData.append(cell.text)
        data.append(rowData)

    # Store the table data in a DataFrame
    results = pd.DataFrame(data)

    # Print the results
    print(results)
else:
    print('Table not found.')

# Close the Selenium driver
driver.quit()
The above code is not working to get the details of the table; I am not even looping through the other pages yet. Please check and let me know where I am wrong.
Answer 1
Score: 1
Any reason you're using Selenium? You can just have pandas parse the tables. It will take a while to go through all the pages, though.
import requests
import pandas as pd
from bs4 import BeautifulSoup
import re

url = 'https://www.zaubacorp.com/company-list/p-1-company.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the '>>' (last page) pagination link and pull the final page number out of its href
last_page = soup.find_all('a', text=lambda text: text and '>>' in text)[0]['href']
match = int(re.search(r'p-(\d+)', last_page).group(1))

# Let pandas parse the first table on every page, collecting one DataFrame per page
dfs = []
tot = match
for page in range(1, match + 1):
    url = f'https://www.zaubacorp.com/company-list/p-{page}-company.html'
    print(f'Page: {page} of {tot}')
    dfs.append(pd.read_html(url)[0])

# Stack all the per-page DataFrames into one
df = pd.concat(dfs)
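
If you want to keep what was scraped, a short follow-up sketch; the companies.csv filename is just an example, not part of the original answer:

# Re-concatenate with a clean 0..n-1 index and write the result to disk
df = pd.concat(dfs, ignore_index=True)
df.to_csv('companies.csv', index=False)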
Answer 2
Score: 0
You made a small mistake: find_elements does not take two strings, but a By option and a string:
from selenium.webdriver.common.by import By

tables = driver.find_elements(By.TAG_NAME, 'table')
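
For completeness, a minimal sketch of the corrected lookup in context, assuming the same headless Chrome setup the question uses; the print line is only illustrative:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('https://www.zaubacorp.com/company-list')
driver.implicitly_wait(10)

# Locate every table element via the By constant instead of a bare string
tables = driver.find_elements(By.TAG_NAME, 'table')
print(f'Found {len(tables)} table(s)')

driver.quit()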