英文:
EPPlus - How to Handle Rows beyond 1,048,576
问题
我正在生成交易报告,可能包含大量数据。但问题是Excel仅支持最大行数为1,048,576。
我正在使用EPPlus v6.2.3。当处理大量数据时,它会出现"Index Out of Bound"错误。如果数据超过最大行数,我想在工作簿中添加新工作表。我该如何做?
以下是我的代码片段:
public static ExcelPackage ExportToExcel(ExcelPackage excelPackage, DataTable dataTable, ExcelCellAddress startCellAddress)
{
// 创建工作表
ExcelWorksheet ws = excelPackage.Workbook.Worksheets[0];
ws.Cells[startCellAddress.Address].LoadFromDataTable(dataTable, false);
// 计算最后一列的字母
string finalColLetter = string.Empty;
string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int colCharsetLen = colCharset.Length;
if (dataTable.Columns.Count > colCharsetLen)
{
finalColLetter = colCharset.Substring((dataTable.Columns.Count - 1) / colCharsetLen - 1, 1);
}
finalColLetter += colCharset.Substring((dataTable.Columns.Count - 1) % colCharsetLen, 1);
string cellRange = string.Format("{0}:{1}{2}", startCellAddress.Address, finalColLetter, startCellAddress.Row + dataTable.Rows.Count - 1);
ExcelRange range = ws.Cells[cellRange];
// 分配边框
SetNormalCellStyle(range);
return excelPackage;
}
希望这能帮助你解决问题。
英文:
I am generating transaction reports which can contain a large amount of data. But the issue is that excel only support the max rows of 1,048,576.
I am using EPPlus v6.2.3. It is crashing with the error Index Out of Bound on large data. I want to add new sheet in the workbook if data exceeds the max row count. How can I do that?
Below is my code snippet:
public static ExcelPackage ExportToExcel(ExcelPackage excelPackage, DataTable dataTable, ExcelCellAddress startCellAddress)
{
//Create the worksheet
ExcelWorksheet ws = excelPackage.Workbook.Worksheets[0];
ws.Cells[startCellAddress.Address].LoadFromDataTable(dataTable, false);
// Calculate the final column letter
string finalColLetter = string.Empty;
string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int colCharsetLen = colCharset.Length;
if (dataTable.Columns.Count > colCharsetLen)
{
finalColLetter = colCharset.Substring((dataTable.Columns.Count - 1) / colCharsetLen - 1, 1);
}
finalColLetter += colCharset.Substring((dataTable.Columns.Count - 1) % colCharsetLen, 1);
string cellRange = string.Format("{0}:{1}{2}", startCellAddress.Address, finalColLetter, startCellAddress.Row + dataTable.Rows.Count - 1);
ExcelRange range = ws.Cells[cellRange];
// Assign borders
SetNormalCellStyle(range);
return excelPackage;
}
答案1
得分: 1
你可以通过在循环中创建一个工作表来将数据从多个工作表中分离。
每当数据达到1,048,576的阈值时,开始一个新的工作表。
在你的问题中没有提供示例代码,我为你创建了一个简单的示例,无论使用的包或版本如何,它都应该有效:
using OfficeOpenXml;
(...)
int worksheetIndex = 0,
row = 1,
dataPerWorksheet = 1_000_000,
dataItems = 3_000_000;
ExcelWorksheet currentWorksheet = null!;
for (int i = 0; i < dataItems; i++)
{
//检查是否需要新的工作表
if (i % dataPerWorksheet == 0)
{
worksheetIndex++;
currentWorksheet = package.Workbook.Worksheets.Add($"Data{worksheetIndex}");
row = 1;
}
//将数据添加到当前工作表的行
currentWorksheet.Cells[row, 1].Value = i; //我使用i作为数据
row++;
}
在添加了样板后,这给了我一个包含连续计数从0到3_000_000的Excel文件,每个工作表包含1_000_000行数据。
我使用的是OpenOffice,但这应该在Excel上很少或不需要修改的情况下工作。
额外信息:
关于你的编辑,你可以通过修改上面的示例来使用DataTable
中的数据进行相同的操作。
例如:
int worksheetIndex = 0,
row = startCellAddress.Row,
col = startCellAddress.Column
dataPerWorksheet = 1_000_000;
//声明工作表
ExcelWorksheet ws = null;
for (int r = 0; r < dataTable.Rows.Count; r++)
{
if (r % dataPerWorksheet == 0)
{
worksheetIndex++;
ws = excelPackage.Workbook.Worksheets.Add($"Data{worksheetIndex}");
row = startCellAddress.Row;
}
ws.Cells[Row:row, Col:col].Value = dataTable.Rows[r]["A"];
ws.Cells[Row: row, Col: col+1].Value = dataTable.Rows[r]["B"];
row++;
}
关于评论: 你需要循环遍历每一行,将数据从一行复制到另一行。LoadFromDataTable()
方法可能也会循环遍历所有行,只是在你的代码中不显示。
关于性能,我不确定是否有更好的方法来实现这一点,我不是专家,但打开文件比创建文件要花费更长的时间。
结果
//数据中的行数= 1,500,000
//dataPerWorksheet = 500,000
ExportToExcel(package, data, new("B2"));
> 使用多个工作表修改后的方法花费了08秒15毫秒,文件大小为19.4 MB(20,348,432字节),使用我的数据集。
英文:
You can divide you data from multiple sheets by creating a sheet in a loop
Whenever the data reaches your threshold of 1,048,576 start a new sheet
Without sample code on your question I took the liberty to craft a simple example that should work regardless of the package or version:
using OfficeOpenXml;
(...)
int worksheetIndex = 0,
row = 1,
dataPerWorksheet = 1_000_000,
dataItems = 3_000_000;
ExcelWorksheet currentWorksheet = null!;
for (int i = 0; i < dataItems; i++)
{
//check if you need a new sheet
if (i % dataPerWorksheet == 0)
{
worksheetIndex++;
currentWorksheet = package.Workbook.Worksheets.Add($"Data{worksheetIndex}");
row = 1;
}
//Add data to the row on current sheet
currentWorksheet.Cells[row, 1].Value = i; //I used i as the data
row++;
}
With the addition of boiler plate this gave me one excel file with 3 worksheets filled with continuous count from 0 to 3_000_000, each sheet holding 1_000_000 rows of data.
I am working with OpenOffice but this should work with little to none modifications on Excel
Extra:
Regarding your edit, you can do the same with your example by modifying the above example to get data from DataTable
For example:
int worksheetIndex = 0,
row = startCellAddress.Row,
col = startCellAddress.Column
dataPerWorksheet = 1_000_000;
//Declare worksheet
ExcelWorksheet ws = null;
for (int r = 0; r < dataTable.Rows.Count; r++)
{
if (r % dataPerWorksheet == 0)
{
worksheetIndex++;
ws = excelPackage.Workbook.Worksheets.Add($"Data{worksheetIndex}");
row = startCellAddress.Row;
}
ws.Cells[Row:row, Col:col].Value = dataTable.Rows[r]["A"];
ws.Cells[Row: row, Col: col+1].Value = dataTable.Rows[r]["B"];
row++;
}
Regarding the comment: you need to loop over each row to copy the data from one row to the other. The method LoadFromDataTable()
probably loops over all rows as well, just not on your side of the code.
Regarding performance I am not sure if there are better ways to do this, I am not an expert, but it took much longer to open the file than to create it.
Result
//rows in data= 1_500_000
//dataPerWorksheet = 500_000
ExportToExcel(package, data, new("B2"));
> Your method with the multiple worksheets modification took 08s15ms and the file has 19,4 MB (20 348 432 bytes) with my dataset.
答案2
得分: 1
这是您提供的代码的翻译:
// 当数据超过Excel允许的最大行数时,使用EPPlus并且不循环遍历大数据集来添加工作表,终于我成功了。
// 创建工作表
ExcelWorksheet ws = null;
int prevAddedRows = 0;
int excelMaxRows = 1048576; // Excel允许的最大行数
int totalRows = dataTable.Rows.Count; // 数据表的总行数
int maxCount = excelMaxRows; // 每个工作表最多行数
int startIndex = 0;
int remainingRowsCount;
ExcelCellAddress startCell = startCellAddress; // 起始单元格地址
int totalSheets = (totalRows / excelMaxRows) + 1; // 总工作表数
for (int i = 0; i < totalSheets; i++)
{
if (i == 0)
ws = excelPackage.Workbook.Worksheets[i]; // 使用现有工作表
else
{
ws = excelPackage.Workbook.Worksheets.Add(string.Format("Sheet{0}", i + 1)); // 创建新工作表
startCell = new ExcelCellAddress("A1"); // 重置起始单元格
}
DataTable selectedDataTable = dataTable.AsEnumerable().Skip(startIndex).Take(maxCount).CopyToDataTable(); // 选择部分数据
ws.Cells[startCell.Address].LoadFromDataTable(selectedDataTable, false); // 将数据加载到工作表
prevAddedRows += maxCount; // 更新已添加的行数
remainingRowsCount = totalRows - prevAddedRows; // 剩余行数
if (remainingRowsCount < excelMaxRows)
{
maxCount = remainingRowsCount; // 更新最大行数
}
startIndex = prevAddedRows;
}
return excelPackage;
如果您需要进一步的帮助,请告诉我。
英文:
Finally I have achieved to add worksheet as the data exceeds the max no. of allowed rows in excel, using EPPlus and without looping on the large dataset.
Here is the sample code:
//Create the worksheet
ExcelWorksheet ws = null;
int prevAddedRows = 0;
int excelMaxRows = 1048576;
int totalRows = dataTable.Rows.Count;
int maxCount = excelMaxRows;
int startIndex = 0;
int remainingRowsCount;
ExcelCellAddress startCell = startCellAddress;
int totalSheets = (totalRows / excelMaxRows) + 1;
for (int i = 0; i < totalSheets; i++)
{
if (i == 0)
ws = excelPackage.Workbook.Worksheets[i];
else
{
ws = excelPackage.Workbook.Worksheets.Add(string.Format("Sheet{0}", i + 1));
startCell = new ExcelCellAddress("A1");
}
DataTable selectedDataTable = dataTable.AsEnumerable().Skip(startIndex).Take(maxCount).CopyToDataTable();
ws.Cells[startCell.Address].LoadFromDataTable(selectedDataTable, false);
prevAddedRows += maxCount;
remainingRowsCount = totalRows - prevAddedRows;
if (remainingRowsCount < excelMaxRows)
{
maxCount = remainingRowsCount;
}
startIndex = prevAddedRows;
}
return excelPackage;
}
答案3
得分: 0
如果您愿意考虑EPPlus的替代方案,我维护了一个可能适合您的Excel库:Sylvan.Data.Excel。
以下是一个完整的C# 11示例,演示如何写入超出单个工作表限制的数据:
using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Diagnostics;
var sw = Stopwatch.StartNew();
// 合成一些数据,并将其公开为DbDataReader
var data =
Enumerable.Range(0, 2_000_000)
.Select(i => new { Id = i, Date = DateTime.Today.AddHours(i), Value = Math.PI * i });
System.Data.Common.DbDataReader dr = data.AsDataReader();
// 创建一个ExcelDataWriter,它还可以写入 .xlsb 文件
using var edw = ExcelDataWriter.Create("report.xlsx");
bool done = false;
int page = 1;
do
{
// 将数据写入Excel文件
var result = edw.Write(dr, "Report_" + page++);
// result 将标识已写入的行数
// 以及IsComplete指示是否已完全使用DbDataReader。
done = result.IsComplete;
} while (!done);
sw.Stop();
var size = new FileInfo("report.xlsx").Length;
Console.WriteLine($"在 {sw.Elapsed} 内创建了 {size} 文件。");
在我的计算机上,这将在约3.8秒内生成一个约30MB的文件。
在00:00:03.8263647内创建了30313078文件。
英文:
If you are willing to consider an alternative to EPPlus, I maintain an Excel library that might work for you: Sylvan.Data.Excel.
Here is a complete C# 11 example showing how to write data that exceeds the limit of a single sheet:
using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Diagnostics;
var sw = Stopwatch.StartNew();
// synthesize some data, and expose it as a DbDataReader
var data =
Enumerable.Range(0, 2_000_000)
.Select(i => new { Id = i, Date = DateTime.Today.AddHours(i), Value = Math.PI * i });
System.Data.Common.DbDataReader dr = data.AsDataReader();
// create an ExcelDataWriter, which can also write .xlsb files
using var edw = ExcelDataWriter.Create("report.xlsx");
bool done = false;
int page = 1;
do
{
// write the data to the excel file
var result = edw.Write(dr, "Report_" + page++);
// result will identify the number of rows written
// and IsComplete indicating if the entire DbDataReader was consumed.
done = result.IsComplete;
} while (!done);
sw.Stop();
var size = new FileInfo("report.xlsx").Length;
Console.WriteLine($"Created {size} file in {sw.Elapsed}.");
On my machine, this produces a ~30MB file in about 3.8 seconds.
Created 30313078 file in 00:00:03.8263647.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论