重排数据表

huangapple go评论55阅读模式
英文:

Rearranging a datatable

问题

我正在将一个Excel文件导入到一个数据表(dtImport),然后将该数据重新排列到另一个数据表(dtImportParsed)中。

这是我第一次导入时数据表(dtImport)的样子。
重排数据表

这是我试图重新排列数据表(dtImportParsed)的方式。
重排数据表

我目前是通过使用一些嵌套的for循环来完成这个操作,但这需要很长时间。例如,一个具有36列和4,000行的工作表大约需要30-40分钟才能完成。是否有一种替代方法可以加快速度?

这是我的代码:

for (int c = 2; c < dtImport.Columns.Count; c++) //对于每个日期列
{
    for (int r = 1; r < dtImport.Rows.Count; r++)
    {
        if (dtImportParsed.Rows.Count == 0)
        {
            DataRow dataRowImport = dtImportParsed.NewRow();
            dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
            dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
            dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
            dtImportParsed.Rows.Add(dataRowImport);
        }
        else
        {
            for (int i = 0; i < dtImportParsed.Rows.Count; i++)
            {
                if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
                {
                    if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
                    {
                        dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                        break;
                    }
                }
                else if (i == dtImportParsed.Rows.Count - 1)
                {
                    DataRow dataRowImport = dtImportParsed.NewRow();
                    dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
                    dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
                    dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                    dtImportParsed.Rows.Add(dataRowImport);
                }
            }
        }
    }
}
英文:

I'm importing an excel file into a datatable (dtImport) and rearranging that data into another datatable (dtImportParsed).
<br><br>Here's what that datatable (dtImport) looks like when I first import it.<br>
重排数据表

And this is how I'm trying to rearrange that datatable (dtImportParsed):<br>
重排数据表

<br><br>
I'm currently accomplishing this by using some nested for loops, but this takes a very long time. For example, a sheet with 36 columns and 4,000 rows takes about 30-40 minutes to complete. Is there an alternative method of accomplishing this that would speed things up?<br><br>
Here's my code:<br>

for (int c = 2; c &lt; dtImport.Columns.Count; c++) //for each date column
            {
                for (int r = 1; r &lt; dtImport.Rows.Count; r++)
                {
                    if (dtImportParsed.Rows.Count == 0)
                    {
                        DataRow dataRowImport = dtImportParsed.NewRow();
                        dataRowImport[&quot;Date&quot;] = dtImport.Columns[c].ColumnName.ToString().Trim();
                        dataRowImport[&quot;account_id&quot;] = dtImport.Rows[r][&quot;account_id&quot;].ToString().Trim();
                        dataRowImport[dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                        dtImportParsed.Rows.Add(dataRowImport);
                    }
                    else
                    {
                        for (int i = 0; i &lt; dtImportParsed.Rows.Count; i++)
                        {
                            if (dtImportParsed.Rows[i][&quot;account_id&quot;].ToString() == dtImport.Rows[r][&quot;account_id&quot;].ToString())
                            {
                                if (dtImportParsed.Rows[i][&quot;Date&quot;].ToString() == dtImport.Columns[c].ColumnName.ToString())
                                {
                                    dtImportParsed.Rows[i][dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                                    break;
                                }
                            }
                            else if (i == dtImportParsed.Rows.Count - 1)
                            {
                                DataRow dataRowImport = dtImportParsed.NewRow();
                                dataRowImport[&quot;Date&quot;] = dtImport.Columns[c].ColumnName.ToString().Trim();
                                dataRowImport[&quot;account_id&quot;] = dtImport.Rows[r][&quot;account_id&quot;].ToString().Trim();
                                dataRowImport[dtImport.Rows[r][&quot;Event Name&quot;].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
                                dtImportParsed.Rows.Add(dataRowImport);
                            }
                        }
                    }

                }
            }

答案1

得分: 1

你使用的算法生成预期结果的成本太高了!它将按顺序执行 (c x r x i),其中 i > r,因为在最终表中插入了空字段;实际上,这是一个 O(n^3) 算法!而且你通过迭代 DataRow 来执行它,这对你的需求可能不高效。

如果你的源数据集不是很大(正如你所提到的),而且你没有内存限制,我建议你使用基于索引的数据结构在内存中安排预期的数据集。类似于这样:

var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>();

巨龙进场!吞噬内部的 for

for (int c = 2; c < dtImport.Columns.Count; c++) // 对于每个日期列
{
    for (int r = 1; r < dtImport.Rows.Count; r++)
    {
        // ...

        // 代替: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
        string date = dtImport.Columns[c].ColumnName.ToString().Trim();
        string accountId = dtImport.Rows[r]["account_id"].ToString();
        string eventName = dtImport.Rows[r]["Event Name"].ToString().Trim();

        if (!arrangeDragon.ContainsKey(date))
            arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());

        if (!arrangeDragon[date].ContainsKey(accountId))
            arrangeDragon[date][accountId] = new Dictionary<string, string>();

        if (!arrangeDragon[date][accountId].ContainsKey(eventName))
            arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();

        // ...
    }
}

这些检查将在 O(1) 内执行,而不是 O(i),因此总体开销将降低到 O(n^2),这是迭代表的本质 重排数据表

此外,检索顺序是 O(1):

string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual(data_field, "42");

现在你可以一次迭代嵌套的 Dictionary 并构建 dtImportParsed

如果你的数据集很大或内存有限,你需要其他解决方案,这不是你的问题,如前所述 重排数据表

祝你好运!

英文:

The algorithm you use to generate your expected result is too expensive! It will execute in order (c x r x i) where i &gt; r because empty fields are injected in the final table; Actually it is an O(n<sup>3</sup>) algorithm! Also you preform it on DataTables via iterating DataRows that probably are not efficient for your requirement.

If your source data-set is not large (as you mentioned) and you have not memory restriction, I propose you to arrange expected data-set in memory using index-based data structures. Something like this:

var arrangeDragon = new Dictionary&lt;string, Dictionary&lt;string, Dictionary&lt;string, string&gt;&gt;&gt;();

The dragon enters! And eats the inner for.

for (int c = 2; c &lt; dtImport.Columns.Count; c++) //for each date column
{
    for (int r = 1; r &lt; dtImport.Rows.Count; r++)
    {
        // ...

        // instead of:     for (int i = 0; i &lt; dtImportParsed.Rows.Count; i++) ...
        string date = dtImport.Columns[c].ColumnName.ToString().Trim();
        string accountId = dtImport.Rows[r][&quot;account_id&quot;].ToString();
        string eventName = dtImport.Rows [r][&quot;Event Name&quot;].ToString().Trim();

        if (!arrangeDragon.ContainsKey(date))
            arrangeDragon.Add(date, new Dictionary&lt;string, Dictionary&lt;string, string&gt;&gt;());

        if (!arrangeDragon[date].ContainsKey(accountId))
            arrangeDragon[date][accountId] = new Dictionary&lt;string, string&gt;();

        if (!arrangeDragon[date][accountId].ContainsKey(eventName))
            arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();

        // ...
    }
}

These checks will execute in O(1) instead of O(i), so total overhead will decrease to O(n<sup>2</sup>) that is the nature of iterating table 重排数据表

Also retrieve order is O(1):

string data_field = arrangeDragon[&quot;1/1/2022&quot;][&quot;account1&quot;][&quot;Event1&quot;];
Assert.AreEqual(data_field, &quot;42&quot;);

Now you can iterate nested Dictionarys once and build the dtImportParsed.

If your data-set is large or host memory is low, you need other solutions that is not your problem as mentioned 重排数据表

Good luck

huangapple
  • 本文由 发表于 2023年2月20日 00:55:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75501806.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定