pandas: best way to split an Excel file into sheets selected by sheet name or index?
Question
I have a file with many sheets ('Sheet1', 'Sheet2', ...) and need to create a new file containing only some of them, selected either by name (for example 'Sheet16', 'Sheet23') or by sheet index, for example 0 (the first), 15, 22, -1 (the last).
Can anyone help?
Answer 1
Score: 2
Suraj's answer suggests xlsxwriter. Here's a slightly simpler and more interactive approach to your problem, using pandas.
To extract individual sheets from an Excel file, you can use the pandas ExcelFile class. Here's how you can extract the sheets as dataframes:

    import pandas as pd

    xls = pd.ExcelFile(filename)  # filename: path to the source workbook
    target_sheets = ['Sheet16', 'Sheet23', 'Sheet32']

Or you can set target_sheets as a list of zero-based indices, say [15, 22, -1] (negative indices count from the end):

    for sheet in target_sheets:
        if isinstance(sheet, int):
            # Translate a position into the corresponding sheet name
            sheet = xls.sheet_names[sheet]
        df = pd.read_excel(xls, sheet)
        print(df)
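The name-or-index lookup used in the loop can also be pulled out into a small helper that validates its input before any sheet is read. This is only a sketch: resolve_sheet_names is a hypothetical name, not part of pandas, and it assumes sheet_names is a plain list such as xls.sheet_names.

```python
def resolve_sheet_names(sheet_names, targets):
    """Map a mix of sheet names and integer positions to sheet names.

    sheet_names: list of names in workbook order (e.g. xls.sheet_names).
    targets: names (str) and/or zero-based positions (int, may be negative).
    """
    resolved = []
    for target in targets:
        if isinstance(target, int):
            # Plain list indexing: 0 is the first sheet, -1 the last.
            try:
                name = sheet_names[target]
            except IndexError:
                raise IndexError(
                    f"sheet index {target} out of range "
                    f"for {len(sheet_names)} sheets"
                ) from None
        elif target in sheet_names:
            name = target
        else:
            raise KeyError(f"no sheet named {target!r}")
        if name not in resolved:  # skip duplicates, keep first-seen order
            resolved.append(name)
    return resolved
```

The returned list contains names only, so it can be used in place of target_sheets without the isinstance check.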
The loop above gives you the individual sheets as pandas dataframes that you can view or modify as required. If you want to create a separate Excel file containing only the listed sheets, you can use pd.ExcelWriter. Here's an example:

    # outfile: path of the new workbook
    with pd.ExcelWriter(outfile) as writer:
        for sheet in target_sheets:
            if isinstance(sheet, int):
                sheet = xls.sheet_names[sheet]
            df = pd.read_excel(xls, sheet)
            print(df)
            df.to_excel(writer, sheet_name=sheet, index=False)

The with block saves and closes the file on exit; writer.save() was removed in pandas 2.0. Feel free to tweak the variables, and make sure to include import pandas as pd.
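For reuse, the read and write steps can be wrapped in one function. The sketch below is one possible arrangement, not the only one: copy_sheets is a hypothetical name, and it assumes an Excel engine such as openpyxl is installed for pandas to use. It relies on pd.read_excel returning a dict of dataframes when sheet_name is a list.

```python
import pandas as pd


def copy_sheets(src, dst, targets):
    """Copy the selected sheets of workbook `src` into a new workbook `dst`.

    targets may mix sheet names (str) and zero-based positions (int).
    Returns the list of sheet names that were written.
    """
    with pd.ExcelFile(src) as xls:
        # Turn positions into names so the output sheets keep their names.
        names = [
            xls.sheet_names[t] if isinstance(t, int) else t
            for t in targets
        ]
        # A list for sheet_name yields a dict: {sheet name: DataFrame}.
        frames = pd.read_excel(xls, sheet_name=names)

    # The context manager saves and closes the output file on exit.
    with pd.ExcelWriter(dst) as writer:
        for name in names:
            frames[name].to_excel(writer, sheet_name=name, index=False)
    return names
```

A call such as copy_sheets('workbook.xlsx', 'dest.xlsx', ['Sheet16', 22, -1]) would then write three sheets, with both file names standing in for your own paths. As with the loop above, only cell data passes through the dataframes; formatting is not carried over.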
Answer 2
Score: 1
As BigBen mentioned in the comments, you should use Excel-specific libraries.
Note that xlsxwriter can only create new files and cannot open an existing workbook, so the copy has to go through a library that also reads, such as openpyxl. Code like this might work for your use case:

    from openpyxl import Workbook, load_workbook

    # Open the original workbook
    wb1 = load_workbook('workbook.xlsx')
    # Create the destination workbook and drop its default empty sheet
    wb2 = Workbook()
    wb2.remove(wb2.active)
    # Define the sheet names that need to be copied
    sheets = ["Sheet1", "Sheet2"]
    for sheet in sheets:
        ws1 = wb1[sheet]
        # Create a new worksheet with the same name
        ws2 = wb2.create_sheet(sheet)
        # Copy the data row by row
        for row in ws1.iter_rows(values_only=True):
            ws2.append(row)
    # Save the destination workbook and close the source
    wb2.save('dest.xlsx')
    wb1.close()

This copies cell contents only, not formatting such as styles, column widths or merged cells.