英文:
I want to create a summary page in the first page of my excel
问题
我有一个包含数据的CSV表格。我想根据文本文件将CSV中的数据类型减少到27种类型。筛选后,我希望Excel文件包含27个工作表,每个工作表上都有一个数据类型。现在,我想创建一个摘要工作表,第一列包含数据类型的名称,第二列包含各自工作表上每种数据类型的总行数。在最后一步中,我希望我的摘要工作表排在第一位。我的代码似乎在最后一步之前都能正常工作,但摘要工作表无法放在第一位。
import pandas as pd
# 将CSV文件加载到DataFrame中
df = pd.read_csv('my_file.csv')
# 从文本文件中读取数据类型
with open('data_types.txt', 'r') as file:
data_types = file.read().splitlines()
# 筛选DataFrame,仅包含指定的数据类型
filtered_df = df[df['a'].isin(data_types)]
# 为每种数据类型创建一个单独的工作表
writer = pd.ExcelWriter('filtered_data.xlsx', engine='xlsxwriter')
summary_data = []
for data_type in data_types:
# 为当前数据类型筛选DataFrame
type_filtered_df = filtered_df[filtered_df['a'] == data_type]
# 将筛选后的数据写入Excel文件的新工作表
type_filtered_df.to_excel(writer, sheet_name=f'Type_{data_type}', index=False)
# 添加摘要数据
summary_data.append([data_type, len(type_filtered_df)])
# 创建摘要DataFrame
summary_df = pd.DataFrame(summary_data, columns=['数据类型', '总行数'])
# 为摘要DataFrame的两列添加标题行
summary_df_header = pd.DataFrame([['数据类型', '总行数']], columns=['数据类型'])
summary_df = pd.concat([summary_df_header, summary_df])
# 将DataFrame保存到'Summary'工作表
summary_df.to_excel(writer, sheet_name='摘要', index=False)
# 访问工作簿和摘要工作表
workbook = writer.book
summary_sheet = writer.sheets['摘要']
# 合并标题行的单元格
summary_sheet.merge_range('A1:B1', '数据类型')
# 重新排列工作表,使摘要工作表成为第一个工作表
sheets = workbook.sheetnames
summary_sheet_index = sheets.index('摘要')
sheets.insert(0, sheets.pop(summary_sheet_index))
for i, sheet_name in enumerate(sheets):
workbook.set_sheet_order(sheet_name, i)
# 保存并关闭Excel文件
writer.save()
英文:
I have a table of data in my csv. I want to reduce the data_type in my csv to 27 types according to the text file. After the filter, I want the Excel to contain 27 pages with each data type on each page. Now I want to make a summary page with the first column containing the names of the data types and the second column containing the total amount of rows of each data type on their respective pages. For the last step, I want my summary page to be arranged as the first page. My codes seem to work fine until the last step whereas the summary page cannot be placed to the first page.
import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv('my_file.csv')
# Read the data types from a text file
with open('data_types.txt', 'r') as file:
data_types = file.read().splitlines()
# Filter the DataFrame to include only the specified data types
filtered_df = df[df['a'].isin(data_types)]
# Create a separate sheet for each type of data
writer = pd.ExcelWriter('filtered_data.xlsx', engine='xlsxwriter')
summary_data = []
for data_type in data_types:
# Filter the DataFrame for the current data type
type_filtered_df = filtered_df[filtered_df['a'] == data_type]
# Write the filtered data to a new sheet in the Excel file
type_filtered_df.to_excel(writer, sheet_name=f'Type_{data_type}', index=False)
# Append the summary data
summary_data.append([data_type, len(type_filtered_df)])
# Create the summary DataFrame
summary_df = pd.DataFrame(summary_data, columns=['Data Type', 'Total Rows'])
# Add a header row for both columns in the summary DataFrame
summary_df_header = pd.DataFrame([['Header']], columns=['Data Type'])
summary_df = pd.concat([summary_df_header, summary_df])
# Save the DataFrame to the 'Summary' sheet
summary_df.to_excel(writer, sheet_name='Summary', index=False)
# Access the workbook and summary sheet
workbook = writer.book
summary_sheet = writer.sheets['Summary']
# Merge the cells for the header row
summary_sheet.merge_range('A1:B1', 'Header')
# Reorder the sheets so that the Summary sheet is the first sheet
sheets = workbook.sheetnames
summary_sheet_index = sheets.index('Summary')
sheets.insert(0, sheets.pop(summary_sheet_index))
for i, sheet_name in enumerate(sheets):
workbook.set_sheet_order(sheet_name, i)
# Save and close the Excel file
writer.save()
答案1
得分: 1
在以下代码之前创建名为Summary
的工作表:
# 为每种数据类型创建单独的工作表
writer = pd.ExcelWriter('filtered_data.xlsx', engine='xlsxwriter')
writer.book.add_worksheet('Summary')
# 你的代码
...
# 保存并关闭Excel文件
# writer.save() # 对于xlsxwriter引擎,我不确定.save() 是否适用
writer.close()
提示:在你的循环中使用 groupby
:
for data_type, type_filtered_df in filtered_df.groupby('a'):
# 将筛选后的数据写入Excel文件的新工作表
type_filtered_df.to_excel(writer, sheet_name=f'Type_{data_type}', index=False)
# 添加汇总数据
summary_data.append([data_type, len(type_filtered_df)])
英文:
Create the Summary
sheet before:
# Create a separate sheet for each type of data
writer = pd.ExcelWriter('filtered_data.xlsx', engine='xlsxwriter')
writer.book.add_worksheet('Summary')
# Your code
...
# Save and close the Excel file
# writer.save() # I'm not sure about .save() for xlsxwriter engine
writer.close()
Tips: use groupby
for your loop:
for data_type, type_filtered_df in filtered_df.groupby('a'):
# Write the filtered data to a new sheet in the Excel file
type_filtered_df.to_excel(writer, sheet_name=f'Type_{data_type}', index=False)
# Append the summary data
summary_data.append([data_type, len(type_filtered_df)])
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论