pandas: what is the best way to split an Excel file into a list of sheets by sheet name and index?

Question

The problem is:

I have a file with many sheets ('Sheet1', 'Sheet2', ...) and need to create a new file containing only some of them, selected either by name (for example 'Sheet16', 'Sheet23') or by sheet index (for example 0 for the first sheet, 15, 22, or -1 for the last).
Can anyone help?

Answer 1

Score: 2

As Suraj has already mentioned, you can use xlsxwriter to achieve the desired objective. Here's a slightly simpler and more interactive approach to your problem, using pandas.

To extract individual sheets from an Excel file, you can use the pd.ExcelFile class from pandas. Here's how you can extract the sheets as dataframes:

xls = pd.ExcelFile(filename)
target_sheets = ['Sheet16', 'Sheet23', 'Sheet32']

Alternatively, you can set target_sheets to a list of 0-based indices, say [15, 22, -1] (negative indices count from the end, so -1 is the last sheet).

for sheet in target_sheets:
    if isinstance(sheet, int):
        sheet = xls.sheet_names[sheet]
    df = pd.read_excel(xls, sheet)
    print(df)
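
The name-or-index lookup in the loop above can be pulled into a small helper. This is a minimal sketch rather than part of the original answer: the function name resolve_sheet_names is hypothetical, and the names list merely stands in for xls.sheet_names so the snippet runs without an Excel file.

```python
def resolve_sheet_names(sheet_names, targets):
    """Map a mix of sheet names and 0-based indices to sheet names."""
    resolved = []
    for target in targets:
        if isinstance(target, int):
            # Negative indices count from the end, as with any Python list.
            name = sheet_names[target]
        elif target in sheet_names:
            name = target
        else:
            raise KeyError(f"No sheet named {target!r}")
        if name not in resolved:  # skip duplicates, keep first-seen order
            resolved.append(name)
    return resolved


# Stand-in for xls.sheet_names: 'Sheet1' ... 'Sheet30'
names = [f"Sheet{i}" for i in range(1, 31)]
print(resolve_sheet_names(names, [0, 15, "Sheet23", -1]))
# → ['Sheet1', 'Sheet16', 'Sheet23', 'Sheet30']
```

Validating the names up front means a typo in a sheet name fails before any file is written.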

This will give you the individual sheets as pandas dataframes that you can view or modify as required. If you want to create a separate Excel file containing only the listed sheets, you can use pd.ExcelWriter. Here's an example:

# The context manager saves and closes the file on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter(outfile) as writer:
    for sheet in target_sheets:
        if isinstance(sheet, int):
            sheet = xls.sheet_names[sheet]
        df = pd.read_excel(xls, sheet)
        print(df)
        df.to_excel(writer, sheet_name=sheet, index=False)

Feel free to tweak the variables, and make sure to include import pandas as pd.
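
Putting the two steps together, one possible self-contained version looks like this. It is a sketch under stated assumptions, not the answer's own code: the function name split_workbook and the file names in the usage comment are hypothetical, and it assumes openpyxl is installed so that pandas can read .xlsx files.

```python
import pandas as pd


def split_workbook(src, dst, targets):
    """Copy the selected sheets of src into a new workbook dst.

    targets may mix sheet names and 0-based indices (negative allowed).
    Returns the list of sheet names that were written.
    """
    written = []
    with pd.ExcelFile(src) as xls, pd.ExcelWriter(dst) as writer:
        for target in targets:
            # Translate an index into the corresponding sheet name.
            name = xls.sheet_names[target] if isinstance(target, int) else target
            if name in written:
                continue  # the same sheet was requested twice
            # header=None / header=False pass every row through unchanged,
            # so the first row is not reinterpreted as column labels.
            df = pd.read_excel(xls, sheet_name=name, header=None)
            df.to_excel(writer, sheet_name=name, index=False, header=False)
            written.append(name)
    return written


# Hypothetical usage:
# split_workbook("input.xlsx", "output.xlsx", [0, 15, "Sheet23", -1])
```

Because the data passes through dataframes, only cell values are carried over; formatting, formulas, and column widths in the source workbook are not preserved.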

Answer 2

Score: 1

As BigBen mentioned in the comments, you should use Excel-specific libraries.

Note that xlsxwriter can only create new files; it cannot read an existing workbook, so it cannot copy sheets on its own. One way is to use openpyxl, which can both read and write. Code like this might work for your use case:

from openpyxl import Workbook, load_workbook

# Open the original workbook
wb1 = load_workbook('workbook.xlsx')
# Create the destination workbook and drop its default empty sheet
wb2 = Workbook()
wb2.remove(wb2.active)

# Define the sheet names that need to be copied
sheets = ["Sheet1", "Sheet2"]
for sheet in sheets:
    ws1 = wb1[sheet]
    # Create a new worksheet
    ws2 = wb2.create_sheet(sheet)
    # Copy the cell values row by row (formatting is not copied)
    for row in ws1.iter_rows(values_only=True):
        ws2.append(row)

# Save the destination workbook
wb2.save('dest.xlsx')

xlsxwriter can still be used to write the destination file, as long as another library (such as openpyxl or pandas) reads the source.
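
If the formatting of the kept sheets matters, another option with openpyxl is to load the whole workbook, remove the sheets that are not wanted, and save the result under a new name. The following is a minimal sketch, assuming openpyxl is installed; the function name keep_sheets and the file names in the usage comment are hypothetical.

```python
from openpyxl import load_workbook


def keep_sheets(src, dst, keep):
    """Save a copy of src at dst that contains only the sheets named in keep."""
    wb = load_workbook(src)
    missing = [name for name in keep if name not in wb.sheetnames]
    if missing:
        raise KeyError(f"Sheets not found: {missing}")
    # wb.sheetnames builds a fresh list, so removing sheets while looping is safe.
    for name in wb.sheetnames:
        if name not in keep:
            wb.remove(wb[name])
    wb.active = 0  # make sure the active-sheet index is still valid
    wb.save(dst)
    return wb.sheetnames


# Hypothetical usage:
# keep_sheets("workbook.xlsx", "dest.xlsx", ["Sheet16", "Sheet23"])
```

The source file is never modified, because the trimmed workbook is only written to dst. The kept sheets stay in their original order.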

huangapple
  • Posted on August 5, 2023, 03:04:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838583.html