Parallel.ForEach and foreach returning different reference vars

Question

(EDIT: If the title is confusing, I'm all ears for a better one.)

I'm currently working on a small class project in C#, and I've run into something strange. The goal of the project is to count all folders and files in a given directory, plus the total size of the files, using both foreach and Parallel.ForEach.

Initially I wrote it as a recursive function that returned its counts, but I had trouble with the base case, because the compiler requires every code path to return a value. I then switched to ref parameters. Below is the current code for both methods.

/*
* Calculate total directories, count of files, and size of all files from
* a given path using a singular foreach. Update passed reference parameters.
*/
static void singRecurse(DirectoryInfo di, ref int countFolder, ref int countFile,
    ref long countByte)
{
    try{
        DirectoryInfo[] directories = di.GetDirectories();
        foreach(DirectoryInfo d in directories){
            countFolder += 1;
            foreach(FileInfo f in d.GetFiles()){
                countFile += 1;
                countByte += f.Length;
            }
            singRecurse(d, ref countFolder, ref countFile, ref countByte);
        }
    } catch (UnauthorizedAccessException){
        Console.WriteLine("You do not have access to this directory");
    }
}

/*
* Calculate total directories, count of files, and size of all files from
* a given path using a parallel foreach. Update passed reference parameters.
*/
static void parRecurse(DirectoryInfo di, ref int countFolder, ref int countFile,
    ref long countByte)
{
    int countFolderinLambda = countFolder;
    int countFileinLambda = countFile;
    long countByteinLambda = countByte;

    try{
        DirectoryInfo[] directories = di.GetDirectories();
        Parallel.ForEach(directories, d => {
            countFolderinLambda += 1;
            foreach(FileInfo f in d.GetFiles()){
                countFileinLambda += 1;
                countByteinLambda += f.Length;
            }
            parRecurse(d, ref countFolderinLambda, ref countFileinLambda,
                ref countByteinLambda);
        });
    } catch (UnauthorizedAccessException){
        Console.WriteLine("You do not have access to this directory");
    }

    countFolder = countFolderinLambda;
    countFile = countFileinLambda;
    countByte = countByteinLambda;
}

Running each version as a separate process currently gives:

Parallel calculated in 44ms
6 folders, 20 files, 250498 bytes
    
Single calculated in 11ms
8 folders, 25 files, 405153 bytes

Why is there such a discrepancy?
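The return-value recursion mentioned at the start of the question can work without a special base case, and it also avoids the shared-counter problem: each call returns its own subtotal and the caller adds the subtotals up. A minimal sketch, assuming C# 9 or later with top-level statements; the CountTree function and its tuple shape are hypothetical (not from the question or the answer), and the parallel path uses PLINQ (AsParallel) instead of Parallel.ForEach:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: returns the number of subfolders, files, and total bytes
// inside `dir`, including the files sitting directly in `dir`.
(int Folders, int Files, long Bytes) CountTree(DirectoryInfo dir, bool parallel)
{
    FileInfo[] files;
    DirectoryInfo[] subdirs;
    try
    {
        files = dir.GetFiles();
        subdirs = dir.GetDirectories();
    }
    catch (UnauthorizedAccessException)
    {
        // Caught inside the recursive call, so it never escapes the parallel query.
        Console.WriteLine($"You do not have access to {dir.FullName}");
        return (0, 0, 0);
    }

    // A leaf directory has no subdirectories, so `children` is empty and its sums
    // are 0: the recursion bottoms out without a separate base-case branch.
    var children = parallel
        ? subdirs.AsParallel().Select(d => CountTree(d, parallel)).ToArray()
        : subdirs.Select(d => CountTree(d, parallel)).ToArray();

    // Each call only combines values returned by its children; no shared state
    // is written, so parallel branches cannot overwrite each other's counts.
    return (subdirs.Length + children.Sum(c => c.Folders),
            files.Length + children.Sum(c => c.Files),
            files.Sum(f => f.Length) + children.Sum(c => c.Bytes));
}
```

For the same, unchanging tree, CountTree(new DirectoryInfo(path), parallel: true) and CountTree(new DirectoryInfo(path), parallel: false) should report identical numbers. Like the answer's CountFileSystem, it also counts files directly inside the starting directory, which singRecurse and parRecurse skip, so its file and byte totals can be larger than the question's.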

Answer 1

Score: 1

The recursive function overwrites the actual total with a relative (per-branch) total:

countFolder = countFolderinLambda;
countFile = countFileinLambda;
countByte = countByteinLambda;

This function runs in parallel for multiple directories, and each branch assigns its own number to countFile without knowing anything about the other parallel branches.

For example,

w/ has 5 files
w/x/ has 3 files
w/y/ has 10 files
w/z/ has 7 files
  • The parallel branch for w/x/ is setting the total to 8 files
  • The parallel branch for w/y/ is setting the total to 15 files
  • The parallel branch for w/z/ is setting the total to 12 files

Each parallel branch undoes the work of the branch before it.
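The example above can be replayed deterministically. This sketch uses no real threads; it assumes the worst-case interleaving in which every branch reads the shared total before any branch writes it back, which the copy-then-assign pattern in the question's parRecurse permits. The function name LastWriterTotal is hypothetical.

```csharp
using System;

// Replays the assign-back pattern from the question's parRecurse one step at a time.
// No real threads are involved; the interleaving is fixed on purpose.
int LastWriterTotal(int startTotal, int[] filesPerBranch)
{
    int shared = startTotal;

    // Step 1: every branch copies the shared total into its own local,
    // like `int countFileinLambda = countFile;`, before anyone writes back.
    int[] snapshots = new int[filesPerBranch.Length];
    for (int i = 0; i < filesPerBranch.Length; i++)
        snapshots[i] = shared;

    // Step 2: every branch adds its own files to its copy and assigns it back,
    // like `countFile = countFileinLambda;`, overwriting the previous branch's write.
    for (int i = 0; i < filesPerBranch.Length; i++)
        shared = snapshots[i] + filesPerBranch[i];

    return shared;
}

int[] branches = { 3, 10, 7 };                  // w/x/, w/y/, w/z/
int overwritten = LastWriterTotal(5, branches); // w/ contributes 5 files
int expected = 5 + 3 + 10 + 7;

Console.WriteLine($"assign-back total: {overwritten}, correct total: {expected}");
// assign-back total: 12, correct total: 25
```

Only the last writer (w/z/) survives, so 13 of the 25 files are lost.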

Instead, have the recursive function count its relative files and then add them to the totals rather than reassigning the totals. See below; note that I've renamed your variables to relativeX and totalX for clarity.

void parRecurse(DirectoryInfo di, ref int totalFolders, ref int totalFiles, ref long totalBytes)
{
    int relativeFolders = 0;
    int relativeFiles = 0;
    long relativeBytes = 0;

    try
    {
        DirectoryInfo[] directories = di.GetDirectories();
        Parallel.ForEach(directories, d =>
        {
            Interlocked.Increment(ref relativeFolders);
            foreach (FileInfo f in d.GetFiles())
            {
                Interlocked.Increment(ref relativeFiles);
                Interlocked.Add(ref relativeBytes, f.Length);
            }
            parRecurse(d, ref relativeFolders, ref relativeFiles, ref relativeBytes);
        });
    }
    catch (UnauthorizedAccessException)
    {
        Console.WriteLine("You do not have access to this directory");
    }

    Interlocked.Add(ref totalFolders, relativeFolders);
    Interlocked.Add(ref totalFiles, relativeFiles);
    Interlocked.Add(ref totalBytes, relativeBytes);
}
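One caveat applies to both the question's and this parRecurse: Parallel.ForEach collects exceptions thrown by the loop body and rethrows them wrapped in an AggregateException. So catch (UnauthorizedAccessException) only sees a failure from di.GetDirectories(), not one from d.GetFiles() inside the lambda. A small self-contained demonstration, with a simulated denial instead of a real file system (WhatReachesCaller and CountHandled are hypothetical names), plus the usual fix of catching inside the body:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simulates one denied directory (item 2) and reports which exception type
// reaches the code that called Parallel.ForEach.
string WhatReachesCaller()
{
    try
    {
        Parallel.ForEach(new[] { 1, 2, 3 }, i =>
        {
            if (i == 2)
                throw new UnauthorizedAccessException("simulated denied directory");
        });
        return "nothing";
    }
    catch (UnauthorizedAccessException)
    {
        // Not reached: the body's exception is not rethrown directly.
        return "UnauthorizedAccessException";
    }
    catch (AggregateException ex)
    {
        // The original exception is still available in InnerExceptions.
        return "AggregateException(" +
               string.Join(", ", ex.InnerExceptions.Select(e => e.GetType().Name)) + ")";
    }
}

// The usual fix: catch inside the body, per item, so nothing escapes the loop
// and the remaining items are still processed.
int CountHandled()
{
    int handled = 0;
    Parallel.ForEach(new[] { 1, 2, 3 }, i =>
    {
        try
        {
            if (i == 2)
                throw new UnauthorizedAccessException("simulated denied directory");
        }
        catch (UnauthorizedAccessException)
        {
            Interlocked.Increment(ref handled);
        }
    });
    return handled;
}

Console.WriteLine(WhatReachesCaller()); // AggregateException(UnauthorizedAccessException)
Console.WriteLine(CountHandled());      // 1
```

Applied to parRecurse, this means wrapping the lambda body (the d.GetFiles() loop and the recursive call) in its own try/catch.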

As an aside, after writing various recursive functions, I tend to create a container class for the data instead of passing refs. A class is a reference type, so every recursive call receives a reference to the same object and can update it without ref, which simplifies things. Something like this:

public class FileSystemCountContext
{
    private int _directories = 0;
    private int _files = 0;
    private long _bytes = 0;

    public int Directories => _directories;
    public int Files => _files;
    public long Bytes => _bytes;

    public override string ToString()
    {
        return $"{Directories} directories, {Files} files, {Bytes} bytes";
    }

    public void IncrementDirectories()
    {
        Interlocked.Increment(ref _directories);
    }

    public void IncrementFiles()
    {
        Interlocked.Increment(ref _files);
    }

    public void IncrementBytes(long amount)
    {
        Interlocked.Add(ref _bytes, amount);
    }
}


FileSystemCountContext CountFileSystem(DirectoryInfo directory, bool parallel = false, FileSystemCountContext? context = null)
{
    if (context == null)
    {
        context = new();
    }

    foreach (FileInfo fi in directory.GetFiles())
    {
        context.IncrementFiles();
        context.IncrementBytes(fi.Length);
    }

    if (parallel)
    {
        Parallel.ForEach(directory.GetDirectories(), di =>
        {
            context.IncrementDirectories();
            CountFileSystem(di, parallel, context);
        });
    }
    else
    {
        foreach (DirectoryInfo di in directory.GetDirectories())
        {
            context.IncrementDirectories();
            CountFileSystem(di, parallel, context);
        }
    }

    return context;
}

huangapple
  • Posted on February 10, 2023 04:39:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404204.html