How to reduce the time of calculations in VB.net?
Question
I have to perform some calculations for almost 5000 items. I have used a Parallel.ForEach loop, and the calculations take almost 5 seconds. I want to get that below 1 second. Any suggestions will be highly appreciated.
The code is this:
    Dim AverageVal As New Double
    Dim maxVal, minVal As New Double
    Dim error_ As New Double
    Dim delta As New List(Of Double)
    With gOutputStresses_LC(0)
        Dim UniqueTagsList As List(Of Integer) = gOutputStresses_LC(0).stresses.Select(Function(c) c.Tag0D).Distinct.ToList
        Parallel.ForEach(UniqueTagsList, Sub(Uniq)
            Dim tstresses As New List(Of clsMSHStress)
            tstresses = .stresses.FindAll(Function(s) s.Tag0D = Uniq)
            tstresses.Sort(Function(x, y) x.Sxx.CompareTo(y.Sxx))
            tstresses.Reverse()
            'maxVal = tstresses(0).Sxx
            'minVal = tstresses.Last.Sxx
            error_ = Math.Abs((tstresses(0).Sxx - tstresses.Last.Sxx) / (tstresses(0).Sxx + tstresses.Last.Sxx))
            If error_ < 10 Then
                AverageVal = tstresses.Sum(Function(s) s.Sxx) / tstresses.Count
                .stresses.FindAll(Function(s) s.Tag0D = Uniq).ForEach(Sub(c) c.Sxx = AverageVal)
            Else
                AverageVal = 0
                delta = New List(Of Double)
                For k As Integer = 1 To tstresses.Count - 1
                    delta.Add(tstresses(k).Sxx - tstresses(k - 1).Sxx)
                Next
                Dim idx As Integer = delta.FindIndex(Function(c) c = delta.Max)
                For g As Integer = 0 To idx - 1
                    AverageVal += tstresses(g).Sxx
                Next
                AverageVal = AverageVal / idx
                For g As Integer = 0 To idx - 1
                    tstresses(g).Sxx = AverageVal
                Next
                AverageVal = 0
                For g As Integer = idx To tstresses.Count - 1
                    AverageVal += tstresses(g).Sxx
                Next
                AverageVal = AverageVal / (tstresses.Count - idx)
                For g As Integer = idx To tstresses.Count - 1
                    tstresses(g).Sxx = AverageVal
                Next
                For p As Integer = 0 To tstresses.Count - 1
                    ''.Stresses.Where(Function(s) s.Tag0D = tstresses(p).Tag0D And s.Tag2D = tstresses(p).Tag2D) = tstresses(p)
                    'Dim idx As Integer =
                    .stresses(.stresses.FindIndex(Function(s) s.Tag0D = tstresses(p).Tag0D And s.Tag2D = tstresses(p).Tag2D)) = tstresses(p)
                Next
            End If
        End Sub)
    End With
UniqueTagsList contains almost 5000 items.
Answer 1
Score: 2
5000 items is nothing if the algorithm is done correctly. We can probably do this in less than a second without needing to go parallel at all. Given a modern commodity CPU clocked at a modest, say, 2 GHz, that's 2 billion clock cycles per second. Even better, you probably get at least 3 instructions per cycle. That's 6 billion instructions per second in total. Simple division means we can fit this into a single second on a single non-parallel core as long as we spend fewer than 1.2 million instructions per item. In fact, memory latency is probably our limiting factor here, and parallel execution might not even fix that.
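The budget estimate above is plain arithmetic, and a small helper makes it easy to redo with your own numbers. This is only a sketch: the module and function names are made up for illustration, and the 2 GHz and 3 instructions per cycle figures are the rough guesses from the paragraph above, not measurements.

```vbnet
Imports System

Public Module BudgetDemo
    ' Instructions available per item if the whole job must finish in one second
    ' on a single core. All three inputs are rough assumptions, not measured values.
    Public Function InstructionsPerItem(clockHz As Double,
                                        instructionsPerCycle As Double,
                                        itemCount As Integer) As Double
        Return clockHz * instructionsPerCycle / itemCount
    End Function

    Public Sub Main()
        ' 2 GHz, 3 instructions per cycle, 5000 items
        Console.WriteLine(InstructionsPerItem(2000000000.0, 3.0, 5000))
    End Sub
End Module
```

With the figures from the text this works out to 1.2 million instructions per item, which is far more than a sort and a couple of passes over one small group should need.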
A significant potential saving in the original is this loop (and others like it):
    For p As Integer = 0 To tstresses.Count - 1
        ''.Stresses.Where(Function(s) s.Tag0D = tstresses(p).Tag0D And s.Tag2D = tstresses(p).Tag2D) = tstresses(p)
        'Dim idx As Integer =
        .stresses(.stresses.FindIndex(Function(s) s.Tag0D = tstresses(p).Tag0D And s.Tag2D = tstresses(p).Tag2D)) = tstresses(p)
    Next
It seems to be trying to match up each tstresses item with the same item from the original gOutputStresses_LC array. But since it also appears we are working with reference types (the cls prefix is not typically used for a Structure), we know these are already the same objects, and the lookups back into the original array are not needed. All this ever does is waste time setting each array element to the value it already has.
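A quick way to convince yourself of this is to mutate an item through the filtered list and look at the source list afterwards. The sketch below assumes clsMSHStress is a Class; since its real definition is not shown, StressItem is a hypothetical stand-in with only the members used in the question.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for clsMSHStress (assumed to be a Class, not a Structure).
Public Class StressItem
    Public Property Tag0D As Integer
    Public Property Tag2D As Integer
    Public Property Sxx As Double
End Class

Public Module ReferenceDemo
    ' Returns True when FindAll hands back the very same objects the source list holds.
    Public Function SharesObjects() As Boolean
        Dim items As New List(Of StressItem) From {
            New StressItem With {.Tag0D = 1, .Tag2D = 10, .Sxx = 5.0},
            New StressItem With {.Tag0D = 2, .Tag2D = 20, .Sxx = 7.0}
        }
        Dim subset As List(Of StressItem) = items.FindAll(Function(s) s.Tag0D = 1)
        subset(0).Sxx = 99.0   ' change the value through the filtered list only
        ' Same object, so the change is already visible in the source list.
        Return Object.ReferenceEquals(subset(0), items(0)) AndAlso items(0).Sxx = 99.0
    End Function

    Public Sub Main()
        Console.WriteLine(SharesObjects())
    End Sub
End Module
```

If clsMSHStress turned out to be a Structure instead, FindAll would return copies and this check would fail, so it is worth running once against the real type.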
There is similar code elsewhere running FindIndex() or FindAll() to do extra lookups back into an earlier array. This extra work is significant and unnecessary: it turns something that would be close to O(n) (after the sort) into something that is O(n²).
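To make the difference concrete: calling FindAll() once per tag rescans the whole list for every tag, while a hash-based grouping touches each item once. The sketch below shows roughly what such a grouping does by hand; StressItem is again a hypothetical stand-in for clsMSHStress, and BucketByTag is an illustrative name, not something from the original code.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for clsMSHStress.
Public Class StressItem
    Public Property Tag0D As Integer
    Public Property Sxx As Double
End Class

Public Module GroupingDemo
    ' One pass over the list: each item is appended to the bucket for its Tag0D.
    Public Function BucketByTag(items As IEnumerable(Of StressItem)) As Dictionary(Of Integer, List(Of StressItem))
        Dim buckets As New Dictionary(Of Integer, List(Of StressItem))
        For Each item In items
            Dim bucket As List(Of StressItem) = Nothing
            If Not buckets.TryGetValue(item.Tag0D, bucket) Then
                bucket = New List(Of StressItem)
                buckets.Add(item.Tag0D, bucket)
            End If
            bucket.Add(item)   ' stores the reference, not a copy
        Next
        Return buckets
    End Function

    Public Sub Main()
        Dim items As New List(Of StressItem) From {
            New StressItem With {.Tag0D = 1, .Sxx = 5.0},
            New StressItem With {.Tag0D = 2, .Sxx = 7.0},
            New StressItem With {.Tag0D = 1, .Sxx = 6.0}
        }
        Dim buckets = BucketByTag(items)
        Console.WriteLine(buckets.Count)      ' number of distinct tags
        Console.WriteLine(buckets(1).Count)   ' items sharing Tag0D = 1
    End Sub
End Module
```

In practice GroupBy() or ToLookup() gives you the same single-pass grouping without writing the loop yourself.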
Once we understand this, the simplification opens up other avenues for improvement as well. In short, the code below should be MUCH faster by avoiding the creation of extra needless Lists/arrays and by significantly reducing the number of passes through the data.
I need to add a disclaimer that I typed this directly into the reply window, without the benefit of any sample data to test against. It's likely there's a bug or two you'll still need to work through, including a potential off-by-one error around the final IdxAtMaxDelta break, so be sure to test thoroughly.
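    Dim TagGroups = gOutputStresses_LC(0).stresses.GroupBy(Function(c) c.Tag0D)
    For Each grp In TagGroups
        ' Largest Sxx first, matching the original Sort + Reverse.
        ' Assumes Sxx is the field to smooth, as in the question's code.
        Dim sorted = grp.OrderByDescending(Function(s) s.Sxx).ToList()
        Dim last = sorted(sorted.Count - 1)
        Dim error_ As Double = Math.Abs((sorted(0).Sxx - last.Sxx) /
                                        (sorted(0).Sxx + last.Sxx))
        If error_ < 10.0 Then
            Dim AverageVal As Double = sorted.Sum(Function(s) s.Sxx) / sorted.Count
            For Each item In grp
                item.Sxx = AverageVal
            Next
        Else
            ' Only need 2 loops through the data and NO NESTED PASSES
            ' (original code used 4 base passes, of which 2 had nested passes)
            Dim Total As Double = sorted(0).Sxx             ' running sum, seeded with the first item
            Dim idx As Integer = 0
            Dim prior As clsMSHStress = sorted(0)
            ' Deltas are <= 0 on a descending list, so start below any real value.
            Dim MaxDelta As Double = Double.NegativeInfinity
            Dim IdxAtMaxDelta As Integer = 1
            Dim TotalAtMaxDelta As Double = sorted(0).Sxx   ' sum of the items before the split
            ' Loop 1
            For Each item In sorted.Skip(1)
                idx += 1
                ' Same signed delta as the question (next - previous). On a descending
                ' list this favours the smallest drop; swap the operands if you want
                ' the largest gap instead.
                Dim delta As Double = item.Sxx - prior.Sxx
                If delta > MaxDelta Then
                    MaxDelta = delta
                    IdxAtMaxDelta = idx
                    TotalAtMaxDelta = Total   ' Total still excludes the current item here
                End If
                Total += item.Sxx
                prior = item
            Next
            ' The split sits between the two neighbours that produced the delta.
            ' The question's code splits one element earlier, so verify which you need.
            ' "low"/"high" refer to index ranges: sorted(0..IdxAtMaxDelta-1) and the rest.
            Dim lowAverage As Double = TotalAtMaxDelta / IdxAtMaxDelta
            Dim highAverage As Double = (Total - TotalAtMaxDelta) /
                                        (sorted.Count - IdxAtMaxDelta)
            ' Loop 2 (first half)
            For i As Integer = 0 To IdxAtMaxDelta - 1
                sorted(i).Sxx = lowAverage
            Next
            ' Loop 2 (second half)
            For i As Integer = IdxAtMaxDelta To sorted.Count - 1
                sorted(i).Sxx = highAverage
            Next
        End If
    Next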
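The two averages in the Else branch come from running totals rather than from separate summing loops. Because that is exactly where an off-by-one is easy to introduce, here is a minimal sketch of the arithmetic in isolation, on plain Double values. SplitAverages and its splitIdx parameter are illustrative names, and the function assumes the split index has already been chosen and leaves neither part empty.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SplitAverageDemo
    ' Averages of values(0..splitIdx-1) and values(splitIdx..Count-1),
    ' computed from a prefix sum and the grand total in a single pass.
    ' Assumes 0 < splitIdx < values.Count so neither part is empty.
    Public Function SplitAverages(values As IList(Of Double), splitIdx As Integer) As Tuple(Of Double, Double)
        Dim total As Double = 0.0
        Dim prefix As Double = 0.0
        For i As Integer = 0 To values.Count - 1
            If i = splitIdx Then prefix = total   ' sum of everything before the split
            total += values(i)
        Next
        Return Tuple.Create(prefix / splitIdx, (total - prefix) / (values.Count - splitIdx))
    End Function

    Public Sub Main()
        Dim result = SplitAverages(New List(Of Double) From {10.0, 9.0, 8.0, 2.0, 1.0}, 3)
        Console.WriteLine(result.Item1)   ' average of 10, 9, 8
        Console.WriteLine(result.Item2)   ' average of 2, 1
    End Sub
End Module
```

Checking a handful of small lists like this by hand against the real loop is a cheap way to confirm that the split index and the two divisors agree with each other.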