英文:
Rearranging a datatable
问题
我正在将一个Excel文件导入到一个数据表(dtImport),然后将该数据重新排列到另一个数据表(dtImportParsed)中。
这是我试图重新排列数据表(dtImportParsed)的方式。
我目前是通过使用一些嵌套的for循环来完成这个操作,但这需要很长时间。例如,一个具有36列和4,000行的工作表大约需要30-40分钟才能完成。是否有一种替代方法可以加快速度?
这是我的代码:
for (int c = 2; c < dtImport.Columns.Count; c++) //对于每个日期列
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
if (dtImportParsed.Rows.Count == 0)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
else
{
for (int i = 0; i < dtImportParsed.Rows.Count; i++)
{
if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
{
if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
{
dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
break;
}
}
else if (i == dtImportParsed.Rows.Count - 1)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
}
}
}
}
英文:
I'm importing an excel file into a datatable (dtImport) and rearranging that data into another datatable (dtImportParsed).
<br><br>Here's what that datatable (dtImport) looks like when I first import it.<br>
And this is how I'm trying to rearrange that datatable (dtImportParsed):<br>
<br><br>
I'm currently accomplishing this by using some nested for loops, but this takes a very long time. For example, a sheet with 36 columns and 4,000 rows takes about 30-40 minutes to complete. Is there an alternative method of accomplishing this that would speed things up?<br><br>
Here's my code:<br>
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
if (dtImportParsed.Rows.Count == 0)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
else
{
for (int i = 0; i < dtImportParsed.Rows.Count; i++)
{
if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
{
if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
{
dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
break;
}
}
else if (i == dtImportParsed.Rows.Count - 1)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
}
}
}
}
答案1
得分: 1
你使用的算法生成预期结果的成本太高了!它将按顺序执行 (c x r x i)
,其中 i > r
,因为在最终表中插入了空字段;实际上,这是一个 O(n^3) 算法!而且你通过迭代 DataRow
来执行它,这对你的需求可能不高效。
如果你的源数据集不是很大(正如你所提到的),而且你没有内存限制,我建议你使用基于索引的数据结构在内存中安排预期的数据集。类似于这样:
var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>();
巨龙进场!吞噬内部的 for
。
for (int c = 2; c < dtImport.Columns.Count; c++) // 对于每个日期列
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
// ...
// 代替: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
string date = dtImport.Columns[c].ColumnName.ToString().Trim();
string accountId = dtImport.Rows[r]["account_id"].ToString();
string eventName = dtImport.Rows[r]["Event Name"].ToString().Trim();
if (!arrangeDragon.ContainsKey(date))
arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());
if (!arrangeDragon[date].ContainsKey(accountId))
arrangeDragon[date][accountId] = new Dictionary<string, string>();
if (!arrangeDragon[date][accountId].ContainsKey(eventName))
arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();
// ...
}
}
这些检查将在 O(1) 内执行,而不是 O(i),因此总体开销将降低到 O(n^2),这是迭代表的本质
此外,检索顺序是 O(1):
string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual(data_field, "42");
现在你可以一次迭代嵌套的 Dictionary
并构建 dtImportParsed
。
如果你的数据集很大或内存有限,你需要其他解决方案,这不是你的问题,如前所述
祝你好运!
英文:
The algorithm you use to generate your expected result is too expensive! It will execute in order (c x r x i)
where i > r
because empty fields are injected in the final table; Actually it is an O(n<sup>3</sup>) algorithm! Also you preform it on DataTable
s via iterating DataRow
s that probably are not efficient for your requirement.
If your source data-set is not large (as you mentioned) and you have not memory restriction, I propose you to arrange expected data-set in memory using index-based data structures. Something like this:
var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>>();
The dragon enters! And eats the inner for
.
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
// ...
// instead of: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
string date = dtImport.Columns[c].ColumnName.ToString().Trim();
string accountId = dtImport.Rows[r]["account_id"].ToString();
string eventName = dtImport.Rows [r]["Event Name"].ToString().Trim();
if (!arrangeDragon.ContainsKey(date))
arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());
if (!arrangeDragon[date].ContainsKey(accountId))
arrangeDragon[date][accountId] = new Dictionary<string, string>();
if (!arrangeDragon[date][accountId].ContainsKey(eventName))
arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();
// ...
}
}
These checks will execute in O(1) instead of O(i), so total overhead will decrease to O(n<sup>2</sup>) that is the nature of iterating table
Also retrieve order is O(1):
string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual(data_field, "42");
Now you can iterate nested Dictionary
s once and build the dtImportParsed
.
If your data-set is large or host memory is low, you need other solutions that is not your problem as mentioned
Good luck
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论