EPPlus – 如何处理超过1,048,576行的情况

huangapple go评论67阅读模式
英文:

EPPlus - How to Handle Rows beyond 1,048,576

问题

我正在生成交易报告,可能包含大量数据。但问题是Excel仅支持最大行数为1,048,576。

我正在使用EPPlus v6.2.3。当处理大量数据时,它会出现"Index Out of Bound"错误。如果数据超过最大行数,我想在工作簿中添加新工作表。我该如何做?

以下是我的代码片段:

public static ExcelPackage ExportToExcel(ExcelPackage excelPackage, DataTable dataTable, ExcelCellAddress startCellAddress)
{
    // 创建工作表
    ExcelWorksheet ws = excelPackage.Workbook.Worksheets[0];

    ws.Cells[startCellAddress.Address].LoadFromDataTable(dataTable, false);
    // 计算最后一列的字母
    string finalColLetter = string.Empty;
    string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int colCharsetLen = colCharset.Length;

    if (dataTable.Columns.Count > colCharsetLen)
    {
        finalColLetter = colCharset.Substring((dataTable.Columns.Count - 1) / colCharsetLen - 1, 1);
    }

    finalColLetter += colCharset.Substring((dataTable.Columns.Count - 1) % colCharsetLen, 1);
    string cellRange = string.Format("{0}:{1}{2}", startCellAddress.Address, finalColLetter, startCellAddress.Row + dataTable.Rows.Count - 1);

    ExcelRange range = ws.Cells[cellRange];
    // 分配边框
    SetNormalCellStyle(range);

    return excelPackage;
}

希望这能帮助你解决问题。

英文:

I am generating transaction reports which can contain a large amount of data. But the issue is that excel only support the max rows of 1,048,576.

I am using EPPlus v6.2.3. It is crashing with the error Index Out of Bound on large data. I want to add new sheet in the workbook if data exceeds the max row count. How can I do that?

Below is my code snippet:

  public static ExcelPackage ExportToExcel(ExcelPackage excelPackage, DataTable dataTable, ExcelCellAddress startCellAddress)
            {
                //Create the worksheet
                ExcelWorksheet ws = excelPackage.Workbook.Worksheets[0];
    
                ws.Cells[startCellAddress.Address].LoadFromDataTable(dataTable, false);
                // Calculate the final column letter
                string finalColLetter = string.Empty;
                string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
                int colCharsetLen = colCharset.Length;
    
                if (dataTable.Columns.Count > colCharsetLen)
                {
                    finalColLetter = colCharset.Substring((dataTable.Columns.Count - 1) / colCharsetLen - 1, 1);
                }
    
                finalColLetter += colCharset.Substring((dataTable.Columns.Count - 1) % colCharsetLen, 1);
                string cellRange = string.Format("{0}:{1}{2}", startCellAddress.Address, finalColLetter, startCellAddress.Row + dataTable.Rows.Count - 1);
    
                ExcelRange range = ws.Cells[cellRange];
                // Assign borders
                SetNormalCellStyle(range);
    
                return excelPackage;
            }

答案1

得分: 1

你可以通过在循环中创建一个工作表来将数据从多个工作表中分离。

每当数据达到1,048,576的阈值时,开始一个新的工作表。

在你的问题中没有提供示例代码,我为你创建了一个简单的示例,无论使用的包或版本如何,它都应该有效:

using OfficeOpenXml;
(...)

int worksheetIndex = 0, 
   row = 1, 
   dataPerWorksheet = 1_000_000,
   dataItems = 3_000_000;

ExcelWorksheet currentWorksheet = null!;

for (int i = 0; i < dataItems; i++)
{
    //检查是否需要新的工作表
    if (i % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        currentWorksheet = package.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = 1;
    }
    
    //将数据添加到当前工作表的行
    currentWorksheet.Cells[row, 1].Value = i; //我使用i作为数据
    row++;
}

在添加了样板后,这给了我一个包含连续计数从0到3_000_000的Excel文件,每个工作表包含1_000_000行数据。

我使用的是OpenOffice,但这应该在Excel上很少或不需要修改的情况下工作。

额外信息:

关于你的编辑,你可以通过修改上面的示例来使用DataTable中的数据进行相同的操作。

例如:

int worksheetIndex = 0,
    row = startCellAddress.Row,
    col = startCellAddress.Column
    dataPerWorksheet = 1_000_000;

//声明工作表
ExcelWorksheet ws = null;

for (int r = 0; r < dataTable.Rows.Count; r++)
{
    if (r % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        ws = excelPackage.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = startCellAddress.Row;
    }

    ws.Cells[Row:row, Col:col].Value = dataTable.Rows[r]["A"];
    ws.Cells[Row: row, Col: col+1].Value = dataTable.Rows[r]["B"];

    row++;
}

关于评论: 你需要循环遍历每一行,将数据从一行复制到另一行。LoadFromDataTable()方法可能也会循环遍历所有行,只是在你的代码中不显示。

关于性能,我不确定是否有更好的方法来实现这一点,我不是专家,但打开文件比创建文件要花费更长的时间。

结果

//数据中的行数= 1,500,000
//dataPerWorksheet = 500,000
ExportToExcel(package, data, new("B2"));

EPPlus – 如何处理超过1,048,576行的情况
> 使用多个工作表修改后的方法花费了08秒15毫秒,文件大小为19.4 MB(20,348,432字节),使用我的数据集。

英文:

You can divide you data from multiple sheets by creating a sheet in a loop

Whenever the data reaches your threshold of 1,048,576 start a new sheet

Without sample code on your question I took the liberty to craft a simple example that should work regardless of the package or version:

using OfficeOpenXml;
(...)

int worksheetIndex = 0, 
   row = 1, 
   dataPerWorksheet = 1_000_000,
   dataItems = 3_000_000;

ExcelWorksheet currentWorksheet = null!;

for (int i = 0; i < dataItems; i++)
{
    //check if you need a new sheet
    if (i % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        currentWorksheet = package.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = 1;
    }
    
    //Add data to the row on current sheet
    currentWorksheet.Cells[row, 1].Value = i; //I used i as the data
    row++;
}

With the addition of boiler plate this gave me one excel file with 3 worksheets filled with continuous count from 0 to 3_000_000, each sheet holding 1_000_000 rows of data.

I am working with OpenOffice but this should work with little to none modifications on Excel

Extra:

Regarding your edit, you can do the same with your example by modifying the above example to get data from DataTable

For example:

int worksheetIndex = 0,
    row = startCellAddress.Row,
    col = startCellAddress.Column
    dataPerWorksheet = 1_000_000;

//Declare worksheet
ExcelWorksheet ws = null;

for (int r = 0; r < dataTable.Rows.Count; r++)
{
    if (r % dataPerWorksheet == 0)
    {
        worksheetIndex++;
        ws = excelPackage.Workbook.Worksheets.Add($"Data{worksheetIndex}");
        row = startCellAddress.Row;
    }

    ws.Cells[Row:row, Col:col].Value = dataTable.Rows[r]["A"];
    ws.Cells[Row: row, Col: col+1].Value = dataTable.Rows[r]["B"];

    row++;
}

Regarding the comment: you need to loop over each row to copy the data from one row to the other. The method LoadFromDataTable() probably loops over all rows as well, just not on your side of the code.

Regarding performance I am not sure if there are better ways to do this, I am not an expert, but it took much longer to open the file than to create it.

Result

//rows in data= 1_500_000
//dataPerWorksheet = 500_000
ExportToExcel(package, data, new("B2"));

EPPlus – 如何处理超过1,048,576行的情况
> Your method with the multiple worksheets modification took 08s15ms and the file has 19,4 MB (20 348 432 bytes) with my dataset.

答案2

得分: 1

这是您提供的代码的翻译:

// 当数据超过Excel允许的最大行数时,使用EPPlus并且不循环遍历大数据集来添加工作表,终于我成功了。

// 创建工作表
ExcelWorksheet ws = null;
int prevAddedRows = 0;
int excelMaxRows = 1048576; // Excel允许的最大行数
int totalRows = dataTable.Rows.Count; // 数据表的总行数
int maxCount = excelMaxRows; // 每个工作表最多行数
int startIndex = 0;
int remainingRowsCount;
ExcelCellAddress startCell = startCellAddress; // 起始单元格地址

int totalSheets = (totalRows / excelMaxRows) + 1; // 总工作表数
for (int i = 0; i < totalSheets; i++)
{
    if (i == 0)
        ws = excelPackage.Workbook.Worksheets[i]; // 使用现有工作表
    else
    {
        ws = excelPackage.Workbook.Worksheets.Add(string.Format("Sheet{0}", i + 1)); // 创建新工作表
        startCell = new ExcelCellAddress("A1"); // 重置起始单元格
    }
    DataTable selectedDataTable = dataTable.AsEnumerable().Skip(startIndex).Take(maxCount).CopyToDataTable(); // 选择部分数据
    ws.Cells[startCell.Address].LoadFromDataTable(selectedDataTable, false); // 将数据加载到工作表
    prevAddedRows += maxCount; // 更新已添加的行数
    remainingRowsCount = totalRows - prevAddedRows; // 剩余行数
    if (remainingRowsCount < excelMaxRows)
    {
        maxCount = remainingRowsCount; // 更新最大行数
    }
    startIndex = prevAddedRows;              
}
return excelPackage;

如果您需要进一步的帮助,请告诉我。

英文:

Finally I have achieved to add worksheet as the data exceeds the max no. of allowed rows in excel, using EPPlus and without looping on the large dataset.

Here is the sample code:

//Create the worksheet
            ExcelWorksheet ws = null;
            int prevAddedRows = 0;
            int excelMaxRows = 1048576;
            int totalRows = dataTable.Rows.Count;
            int maxCount = excelMaxRows;
            int startIndex = 0;
            int remainingRowsCount;
            ExcelCellAddress startCell = startCellAddress;

            int totalSheets = (totalRows / excelMaxRows) + 1;
            for (int i = 0; i &lt; totalSheets; i++)
            {
                if (i == 0)
                    ws = excelPackage.Workbook.Worksheets[i];
                else
                {
                    ws = excelPackage.Workbook.Worksheets.Add(string.Format(&quot;Sheet{0}&quot;, i + 1));
                    startCell = new ExcelCellAddress(&quot;A1&quot;);
                }
                DataTable selectedDataTable = dataTable.AsEnumerable().Skip(startIndex).Take(maxCount).CopyToDataTable();
                ws.Cells[startCell.Address].LoadFromDataTable(selectedDataTable, false);
                prevAddedRows += maxCount;
                remainingRowsCount = totalRows - prevAddedRows;
                if (remainingRowsCount &lt; excelMaxRows)
                {
                    maxCount = remainingRowsCount;
                }
                startIndex = prevAddedRows;              
            }
            return excelPackage;
        }

答案3

得分: 0

如果您愿意考虑EPPlus的替代方案,我维护了一个可能适合您的Excel库:Sylvan.Data.Excel

以下是一个完整的C# 11示例,演示如何写入超出单个工作表限制的数据:

using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Diagnostics;

var sw = Stopwatch.StartNew();

// 合成一些数据,并将其公开为DbDataReader
var data = 
    Enumerable.Range(0, 2_000_000)
    .Select(i => new { Id = i, Date = DateTime.Today.AddHours(i), Value = Math.PI * i });
System.Data.Common.DbDataReader dr = data.AsDataReader();

// 创建一个ExcelDataWriter,它还可以写入 .xlsb 文件
using var edw = ExcelDataWriter.Create("report.xlsx");

bool done = false;
int page = 1;
do
{
    // 将数据写入Excel文件
    var result = edw.Write(dr, "Report_" + page++);
    // result 将标识已写入的行数
    // 以及IsComplete指示是否已完全使用DbDataReader。
    done = result.IsComplete;
} while (!done);

sw.Stop();
var size = new FileInfo("report.xlsx").Length;
Console.WriteLine($"在 {sw.Elapsed} 内创建了 {size} 文件。");

在我的计算机上,这将在约3.8秒内生成一个约30MB的文件。

在00:00:03.8263647内创建了30313078文件。

英文:

If you are willing to consider an alternative to EPPlus, I maintain an Excel library that might work for you: Sylvan.Data.Excel.

Here is a complete C# 11 example showing how to write data that exceeds the limit of a single sheet:

using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Diagnostics;

var sw = Stopwatch.StartNew();

// synthesize some data, and expose it as a DbDataReader
var data = 
    Enumerable.Range(0, 2_000_000)
    .Select(i =&gt; new { Id = i, Date = DateTime.Today.AddHours(i), Value = Math.PI * i });
System.Data.Common.DbDataReader dr = data.AsDataReader();

// create an ExcelDataWriter, which can also write .xlsb files
using var edw = ExcelDataWriter.Create(&quot;report.xlsx&quot;);

bool done = false;
int page = 1;
do
{
    // write the data to the excel file
    var result = edw.Write(dr, &quot;Report_&quot; + page++);
    // result will identify the number of rows written
    // and IsComplete indicating if the entire DbDataReader was consumed.
    done = result.IsComplete;
} while (!done);

sw.Stop();
var size = new FileInfo(&quot;report.xlsx&quot;).Length;
Console.WriteLine($&quot;Created {size} file in {sw.Elapsed}.&quot;);

On my machine, this produces a ~30MB file in about 3.8 seconds.

Created 30313078 file in 00:00:03.8263647.

huangapple
  • 本文由 发表于 2023年5月8日 02:17:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76195570-2.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定