英文:
Parallel.For is only utilising the CPU 10% all the time
问题
所以,我试图通过将并行性应用于我的算法来加快工作流程。我有一个数据列表,并且我有一个窗口大小,它小于该数据列表的数量。
我的任务是在我的数据上应用一个窗口,然后我需要进行一些比较,以确定该窗口是否有坏数据,通过返回一个布尔值来判断。然后,我将该窗口向下滑动1个单位,重复这个过程,直到列表完成。
问题是当我的窗口大小很大时,速度变得非常慢。时间复杂度将会是O(N*W^2),其中N是数据大小,W是窗口大小。我需要将W大小的数据复制到一个`sublist(GetRange)`中,然后循环遍历该`sublist`进行比较。
所以我尝试通过将该代码放入`Parallel.For`中来应用并行性。然而,当我查看CPU核心使用情况时,它只显示约10%的利用率,并且我可以看到一些核心正在空闲。在这种情况下,如何最大化使用所有的CPU性能呢?我在我的机器上有16个核心。
public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
var badCount = 0;
var totalSlidings = data.Count - someWindowSize + 1;
Parallel.For(0, totalSlidings, i =>
{
if (!DetermineGoodOrBad(data.GetRange(i,someWindowSize), limit))
Interlocked.Increment(ref badCount);
});
return badCount;
}
private bool DetermineGoodOrBad(List<double> subData, double limit)
{
foreach(var data in subData)
{
if (data > limit) return false;
}
return true;
}
<details>
<summary>英文:</summary>
So, I was trying speed up the my workflow by applying parallelism to my algorithm. I have a list of data, and I am given a window size that is less than the count of that data list.
My task is to apply a window on my data, and then I need to do some comparison to determine whether that window has bad data or not by returning a boolean. Then, I slide that window down by just 1, repeat, and continue the process until the list is finish.
The problem is it starts to get very slow when my window size is big. It will be something like O(N*W^2), where N is the data size, and W is the window size. I need to copy W size of data into a `sublist(GetRange)`, and then loop through that `sublist` to compare.
So I try to apply parallelism by putting that code in a `Parallel.For`. However, when I look at the CPU core usage it just shows ~10% utilisation, and I can see some of the cores are just being idle. How do I maximize all my CPU power in this case? I have got like 16 cores on my machine.
public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
var badCount = 0;
var totalSlidings = data.Count - someWindowSize + 1;
Parallel.For(0, totalSlidings, i =>
{
if (!DetermineGoodOrBad(data.GetRange(i,someWindowSize), limit))
Interlocked.Increment(ref badCount);
});
return badCount;
}
private bool DetermineGoodOrBad(List<double> subData, double limit)
{
foreach(var data in subData)
{
if (data > limit) return false;
}
return true;
}
</details>
# 答案1
**得分**: 3
我不打算试图回答为什么并行性效果不佳。可能存在带宽或缓存问题,或者CPU花费时间在`badCount`变量的所有权转移上。您可能需要对程序进行性能分析以获得真正的答案。
我打算回答其中隐含的问题,即如何使代码更快。
一种方法是跟踪滑动窗口中存在多少个坏值,因此当我们将窗口移动一步时,我们只需要检查已添加到窗口并从窗口中删除的值,并相应地减少/增加运行计数:
```csharp
public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
var badValuesInWindow = 0;
var totalSlidings = data.Count - someWindowSize + 1;
var runningBadCount = 0;
for (int i = 0; i < someWindowSize; i++)
{
if (data[i] > limit)
{
runningBadCount++;
}
}
for (var i = 0; i < totalSlidings; i++)
{
if (data[i] > limit)
{
runningBadCount--;
}
if (data[i + someWindowSize] > limit)
{
runningBadCount++;
}
if (runningBadCount > 0)
{
badValuesInWindow++;
}
}
return badValuesInWindow;
}
请注意,这段代码未经过测试。我假设可能存在一些偏差错误。
英文:
I'm not going to try to answer why the parallelism works poorly. There might be bandwith or caching issues, or that the CPUs spend time transferring ownership of the badCount
variable. You will likely need to profile your program to get a real answer.
I'm instead going to try answering the implied question, how to make the code faster.
One way to do this is to keep track of how many bad values exist within the sliding window, so when we move the window one step we only need to check the values that where added and removed from the window, and decriment/increment the running count accordingly:
public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
var badValuesInWindow = 0;
var totalSlidings = data.Count - someWindowSize + 1;
var runningBadCount = 0;
for (int i = 0; i < someWindowSize; i++)
{
if (data[i] > limit)
{
runningBadCount++;
}
}
for (var i = 0; i < totalSlidings; i++)
{
if (data[i] > limit)
{
runningBadCount--;
}
if (data[i + someWindowSize] > limit)
{
runningBadCount++;
}
if (runningBadCount > 0)
{
badValuesInWindow++;
}
}
return badValuesInWindow;
}
Be aware that this code is not tested. I would assume that there are a few of-by-one errors.
答案2
得分: 2
我同意JonasH的观点,即优化算法比尝试并行化未经优化的算法更有前途,可以解决您的问题。为了完整起见,可以通过避免使用List<T>.GetRange
方法创建List<double> data
的物理切片,而是传递整个列表,并使用索引表示虚拟切片的开始和结束来改进并行化尝试:
private bool IsGoodData(List<double> data, int from, int toExclusive, double limit)
{
for (int i = from; i < toExclusive; i++)
if (data[i] > limit) return false;
return true;
}
然后`CountBadData`可以这样实现:
```csharp
public double CountBadData(List<double> data, int windowSize, double limit)
{
var badCount = 0;
var totalSlidings = data.Count - windowSize + 1;
ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.For(0, totalSlidings, options, i =>
{
if (!IsGoodData(data, i, i + windowSize, limit))
Interlocked.Increment(ref badCount);
});
return badCount;
}
假定IsGoodData
方法仅限于读取data
列表。否则,同时从多个线程修改相同的List<T>
会导致未定义的行为。
英文:
I agree with JonasH that optimizing the algorithm is a more promising path for solving your problem, than trying to parallelize an unoptimized algorithm. For the sake of completeness, the parallelization attempt could be improved by avoiding to create physical slices of the List<double> data
with the List<T>.GetRange
method, and instead passing the whole list with indexes denoting the beginning and end of a virtual slice:
private bool IsGoodData(List<double> data, int from, int toExclusive, double limit)
{
for (int i = from; i < toExclusive; i++)
if (data[i] > limit) return false;
return true;
}
Then the CountBadData
can be implemented like this:
public double CountBadData(List<double> data, int windowSize, double limit)
{
var badCount = 0;
var totalSlidings = data.Count - windowSize + 1;
ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.For(0, totalSlidings, options, i =>
{
if (!IsGoodData(data, i, i + windowSize, limit))
Interlocked.Increment(ref badCount);
});
return badCount;
}
It is assumed that the IsGoodData
method is restricted to just reading the data
list. Otherwise, modifying the same List<T>
from multiple threads concurrently would result in undefined behavior.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论