June 8, 2023 19:56:30


File::Find::Rule - duplicated output

Question

My subroutine parses my hashes correctly and returns (basically) correct results.

The problem is that I get every result twice.
It is the "twice" part that I do not understand.

Sample data:

$local_h{$local}{name} = "somefile.txt";
$local_h{$local}{size} = 12345;

use feature 'say';                  # say() needs this (or "use v5.10;")
use File::Basename qw(fileparse);
use File::Find::Rule;

sub already_here {
    foreach my $local (keys(%local_h)) {
        my $tmp = $local_h{$local}{name};
        my $FFR_rule = File::Find::Rule
            ->size($local_h{$local}{size})
            ->start( @volumes );
        while ( defined ( my $match = $FFR_rule->match ) ) {
            my ( $name, $path, $suffix ) = fileparse($match);
            if ( $name =~ /$local_h{$local}{name}/ ) {
                say "\t$name has been matched by size and name to:\n\t $path$name\n";
                #   Matches can occur multiple times, to be dealt with later/elsewhere
            } else {
                say "$match  Matched by size only\n";
                #   Maybe this really is the location but got renamed locally.
                #   For now I will consider it an edge-case.
            }
        }
    }
}

Output:

somefile.txt has been matched by size and name to: a/path/to/somefile.txt
somefile.txt has been matched by size and name to: some/other/path/to/somefile.txt 

34thx.foo   Matched by size only

somefile.txt has been matched by size and name to: a/path/to/somefile.txt
somefile.txt has been matched by size and name to: some/other/path/to/somefile.txt 

34thx.foo   Matched by size only

I am expecting to see 3 lines of output (4 if you count the blank line).
I am completely failing to see the source of the duplication.

Answer 1

Score: 2

There are two possibilities. Either %local_h contains two entries with the same name and size, or you scan the same directory twice. You would scan the same directory twice if @volumes lists that directory more than once, or lists both it and one of its ancestors.
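Both causes can be checked before any scanning happens. The following is a minimal diagnostic sketch, not part of the original answer. `duplicate_entries`, `overlapping_roots` and `is_within` are hypothetical helper names. The script uses only core modules (`Cwd`, `File::Temp`). The ancestor test assumes Unix-style `/` separators and paths already canonicalised by `abs_path`.

```perl
use strict;
use warnings;
use feature 'say';
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Hypothetical helper: list (name, size) pairs that occur more than once
# in a hash shaped like %local_h ( key => { name => ..., size => ... } ).
sub duplicate_entries {
    my ($local_h) = @_;
    my %count;
    ++$count{ $_->{size} }{ $_->{name} } for values %$local_h;
    my @dups;
    for my $size ( sort { $a <=> $b } keys %count ) {
        for my $name ( sort keys %{ $count{$size} } ) {
            my $n = $count{$size}{$name};
            push @dups, "$name ($size bytes) x$n" if $n > 1;
        }
    }
    return @dups;
}

# True if $inner is $outer itself or lies somewhere below it.
# Assumes both are absolute, canonical, '/'-separated paths.
sub is_within {
    my ( $inner, $outer ) = @_;
    return 1 if $inner eq $outer;
    my $prefix = $outer =~ m{/\z} ? $outer : "$outer/";
    return index( $inner, $prefix ) == 0;
}

# Hypothetical helper: pairs of roots that would make a traversal visit
# the same directory twice. Roots that abs_path cannot resolve are skipped.
sub overlapping_roots {
    my @roots = grep { defined } map { abs_path($_) } @_;
    my @pairs;
    for my $i ( 0 .. $#roots - 1 ) {
        for my $j ( $i + 1 .. $#roots ) {
            my ( $x, $y ) = @roots[ $i, $j ];
            push @pairs, [ $x, $y ] if is_within( $x, $y ) || is_within( $y, $x );
        }
    }
    return @pairs;
}

# Demo on a throwaway directory tree.
my $top = tempdir( CLEANUP => 1 );
mkdir "$top/photos" or die "mkdir: $!";
my %local_h = (
    first  => { name => 'somefile.txt', size => 12345 },
    second => { name => 'somefile.txt', size => 12345 },
);
say "overlapping roots: $_->[0] and $_->[1]"
    for overlapping_roots( $top, "$top/photos" );
say "duplicate entry: $_" for duplicate_entries( \%local_h );
```

If either check prints anything for your real %local_h and @volumes, that would account for the doubled output.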

That said, the approach you are taking is awful.

Let's say there are 10 files in %local_h and 10 files on disk. You will traverse the tree 10 times, which means you will check each of the 10 files 10 times, for a total of 100 calls to stat! For 20 and 20, that's 400 calls! There's no reason to do that. In each of those scenarios, the following makes only 20 (instead of 100) and 40 (instead of 400) calls to stat, respectively:

use feature 'say';
use File::Basename qw(basename);
use File::Find::Rule;

# Index the wanted files once: size => { name => count }
my %interesting;
for ( values( %local_h ) ) {
   ++$interesting{ $_->{ size } }{ $_->{ name } };
}

# ->file restricts matches to plain files, so a directory's size can't match
my $ffr = File::Find::Rule->file->start( @volumes );
while ( defined( my $qfn = $ffr->match ) ) {
   my $fn = basename( $qfn );
   my $size = -s $qfn;

   if ( my $r = $interesting{ $size } ) {
      if ( $r->{ $fn } ) {
         say "$fn size+name $qfn";
      } else {
         say "$fn size $qfn";
      }
   }
}
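Here is a minimal sketch of the same single-pass idea using the core File::Find module, so it runs without installing File::Find::Rule. Two parts are assumptions added here rather than taken from the answer. The first is the hypothetical `match_local_files` helper. The second is a `%seen` guard keyed on `abs_path`, which reports a file once even when overlapping roots in @volumes would visit it twice. A side effect is that two symlinks resolving to the same file are also reported only once.

```perl
use strict;
use warnings;
use feature 'say';
use Cwd qw(abs_path);
use File::Basename qw(basename);
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: one traversal of all roots, checked against an index
# (size => { name => count }) that is built once from %$local_h.
sub match_local_files {
    my ( $local_h, $roots ) = @_;

    my %interesting;
    ++$interesting{ $_->{size} }{ $_->{name} } for values %$local_h;

    my ( %seen, @hits );
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a full path
            wanted   => sub {
                my $qfn = $File::Find::name;
                return unless -f $qfn;            # plain files only
                my $size = ( -s _ ) || 0;         # reuse the stat from -f; 0 for empty files
                my $real = abs_path($qfn) // $qfn;
                return if $seen{$real}++;         # already reached via another root
                my $r = $interesting{$size} or return;
                push @hits, [ $r->{ basename($qfn) } ? 'size+name' : 'size', $real ];
            },
        },
        @$roots
    );
    return @hits;
}

# Hypothetical demo helper: create a file of exactly $n bytes.
sub write_bytes {
    my ( $path, $n ) = @_;
    open my $fh, '>', $path or die "$path: $!";
    print {$fh} 'x' x $n;
    close $fh or die "$path: $!";
}

my $top = tempdir( CLEANUP => 1 );
mkdir "$top/photos" or die "mkdir: $!";
write_bytes( "$top/photos/somefile.txt", 12345 );
write_bytes( "$top/34thx.foo", 12345 );
write_bytes( "$top/other.bin", 10 );

my %local_h = ( k1 => { name => 'somefile.txt', size => 12345 } );

# $top/photos is also under $top, yet each file is reported only once.
say "$_->[0]\t$_->[1]"
    for sort { $a->[1] cmp $b->[1] } match_local_files( \%local_h, [ $top, "$top/photos" ] );
```

Deduplicating on the resolved path is a defensive choice. The cleaner fix is still to remove the redundant entry from @volumes.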
