2023年2月20日 00:55:46go评论64阅读模式

英文:

Rearranging a datatable

问题

我正在将一个Excel文件导入到一个数据表（dtImport），然后将该数据重新排列到另一个数据表（dtImportParsed）中。

这是我第一次导入时数据表（dtImport）的样子。

这是我试图重新排列数据表（dtImportParsed）的方式。

我目前是通过使用一些嵌套的for循环来完成这个操作，但这需要很长时间。例如，一个具有36列和4,000行的工作表大约需要30-40分钟才能完成。是否有一种替代方法可以加快速度？

这是我的代码：

for (int c = 2; c < dtImport.Columns.Count; c++) //对于每个日期列
{
    for (int r = 1; r < dtImport.Rows.Count; r++)
    {
        if (dtImportParsed.Rows.Count == 0)
        {
            DataRow dataRowImport = dtImportParsed.NewRow();
            dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
            dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
            dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
            dtImportParsed.Rows.Add(dataRowImport);
        }
        else
        {
            for (int i = 0; i < dtImportParsed.Rows.Count; i++)
            {
                if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
                {
                    if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
                    {
                        dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                        break;
                    }
                }
                else if (i == dtImportParsed.Rows.Count - 1)
                {
                    DataRow dataRowImport = dtImportParsed.NewRow();
                    dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
                    dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
                    dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                    dtImportParsed.Rows.Add(dataRowImport);
                }
            }
        }
    }
}

英文:

I'm importing an excel file into a datatable (dtImport) and rearranging that data into another datatable (dtImportParsed).
 Here's what that datatable (dtImport) looks like when I first import it.

And this is how I'm trying to rearrange that datatable (dtImportParsed):

I'm currently accomplishing this by using some nested for loops, but this takes a very long time. For example, a sheet with 36 columns and 4,000 rows takes about 30-40 minutes to complete. Is there an alternative method of accomplishing this that would speed things up? 
Here's my code:

for (int c = 2; c &lt; dtImport.Columns.Count; c++) //for each date column
            {
                for (int r = 1; r &lt; dtImport.Rows.Count; r++)
                {
                    if (dtImportParsed.Rows.Count == 0)
                    {
                        DataRow dataRowImport = dtImportParsed.NewRow();
                        dataRowImport[&quot;Date&quot;] = dtImport.Columns[c].ColumnName.ToString().Trim();
                        dataRowImport[&quot;account_id&quot;] = dtImport.Rows[r][&quot;account_id&quot;].ToString().Trim();
                        dataRowImport[dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                        dtImportParsed.Rows.Add(dataRowImport);
                    }
                    else
                    {
                        for (int i = 0; i &lt; dtImportParsed.Rows.Count; i++)
                        {
                            if (dtImportParsed.Rows[i][&quot;account_id&quot;].ToString() == dtImport.Rows[r][&quot;account_id&quot;].ToString())
                            {
                                if (dtImportParsed.Rows[i][&quot;Date&quot;].ToString() == dtImport.Columns[c].ColumnName.ToString())
                                {
                                    dtImportParsed.Rows[i][dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                                    break;
                                }
                            }
                            else if (i == dtImportParsed.Rows.Count - 1)
                            {
                                DataRow dataRowImport = dtImportParsed.NewRow();
                                dataRowImport[&quot;Date&quot;] = dtImport.Columns[c].ColumnName.ToString().Trim();
                                dataRowImport[&quot;account_id&quot;] = dtImport.Rows[r][&quot;account_id&quot;].ToString().Trim();
                                dataRowImport[dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                                dtImportParsed.Rows.Add(dataRowImport);
                            }
                        }
                    }

                }
            }

答案1

得分: 1

你使用的算法生成预期结果的成本太高了！它将按顺序执行 (c x r x i)，其中 i > r，因为在最终表中插入了空字段；实际上，这是一个 O(n^3) 算法！而且你通过迭代 DataRow 来执行它，这对你的需求可能不高效。

如果你的源数据集不是很大（正如你所提到的），而且你没有内存限制，我建议你使用基于索引的数据结构在内存中安排预期的数据集。类似于这样：

var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>();

巨龙进场！吞噬内部的 for。

for (int c = 2; c < dtImport.Columns.Count; c++) // 对于每个日期列
{
    for (int r = 1; r < dtImport.Rows.Count; r++)
    {
        // ...

        // 代替: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
        string date = dtImport.Columns[c].ColumnName.ToString().Trim();
        string accountId = dtImport.Rows[r]["account_id"].ToString();
        string eventName = dtImport.Rows[r]["Event Name"].ToString().Trim();

        if (!arrangeDragon.ContainsKey(date))
            arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());

        if (!arrangeDragon[date].ContainsKey(accountId))
            arrangeDragon[date][accountId] = new Dictionary<string, string>();

        if (!arrangeDragon[date][accountId].ContainsKey(eventName))
            arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();

        // ...
    }
}

这些检查将在 O(1) 内执行，而不是 O(i)，因此总体开销将降低到 O(n^2)，这是迭代表的本质

此外，检索顺序是 O(1)：

string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual(data_field, "42");

现在你可以一次迭代嵌套的 Dictionary 并构建 dtImportParsed。

如果你的数据集很大或内存有限，你需要其他解决方案，这不是你的问题，如前所述

祝你好运！

英文:

The algorithm you use to generate your expected result is too expensive! It will execute in order (c x r x i) where i > r because empty fields are injected in the final table; Actually it is an O(n3) algorithm! Also you preform it on DataTables via iterating DataRows that probably are not efficient for your requirement.

If your source data-set is not large (as you mentioned) and you have not memory restriction, I propose you to arrange expected data-set in memory using index-based data structures. Something like this:

var arrangeDragon = new Dictionary&lt;string, Dictionary&lt;string, Dictionary&lt;string, string&gt;&gt;&gt;();

The dragon enters! And eats the inner for.

for (int c = 2; c &lt; dtImport.Columns.Count; c++) //for each date column
{
    for (int r = 1; r &lt; dtImport.Rows.Count; r++)
    {
        // ...

        // instead of:     for (int i = 0; i &lt; dtImportParsed.Rows.Count; i++) ...
        string date = dtImport.Columns[c].ColumnName.ToString().Trim();
        string accountId = dtImport.Rows[r][&quot;account_id&quot;].ToString();
        string eventName = dtImport.Rows [r][&quot;Event Name&quot;].ToString().Trim();

        if (!arrangeDragon.ContainsKey(date))
            arrangeDragon.Add(date, new Dictionary&lt;string, Dictionary&lt;string, string&gt;&gt;());

        if (!arrangeDragon[date].ContainsKey(accountId))
            arrangeDragon[date][accountId] = new Dictionary&lt;string, string&gt;();

        if (!arrangeDragon[date][accountId].ContainsKey(eventName))
            arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();

        // ...
    }
}

These checks will execute in O(1) instead of O(i), so total overhead will decrease to O(n2) that is the nature of iterating table

Also retrieve order is O(1):

string data_field = arrangeDragon[&quot;1/1/2022&quot;][&quot;account1&quot;][&quot;Event1&quot;];
Assert.AreEqual(data_field, &quot;42&quot;);

Now you can iterate nested Dictionarys once and build the dtImportParsed.

If your data-set is large or host memory is low, you need other solutions that is not your problem as mentioned

Good luck

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

重排数据表

问题

答案1

Logic App 触发器用于 Azure Blob 存储不起作用。

将 SoftwareBitmap 转换为 Mat 用于 C#

有更好的方式来消费来自事件中心的数据吗？

检查 viewModel 是否等于 0 并显示
标签。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论