File::Find::Rule - duplicated output

Question

My subroutine parses my hashes correctly and returns (basically) correct results.

The problem is that I get the output twice.
It is the "twice" part that I do not understand.

Sample data:

use feature qw( say );
use File::Basename qw( fileparse );
use File::Find::Rule;

$local_h{$local}{name} = "somefile.txt";
$local_h{$local}{size} = 12345;
sub already_here {
    foreach my $local (keys(%local_h)) {
        my $tmp = $local_h{$local}{name};
        my $FFR_rule = File::Find::Rule
            ->size($local_h{$local}{size})
            ->start( @volumes );
        while ( defined ( my $match = $FFR_rule->match ) ) {
            my ( $name, $path, $suffix ) = fileparse($match);
            if ( $name =~ /$local_h{$local}{name}/ ) {
                say "\t$name $name has been matched by size and name to:\n\t $path$name\n";
                #   Matches can occur multiple times, to be dealt with later/elsewhere
            } else {
                say "$match  Matched by size only\n";
                #   Maybe this really is the location but got renamed locally.
                #   For now I will consider it an edge-case.
            }
        }
    }
}

Output:

somefile.txt has been matched by size and name to: a/path/to/somefile.txt
somefile.txt has been matched by size and name to: some/other/path/to/somefile.txt 

34thx.foo   Matched by size only

somefile.txt has been matched by size and name to: a/path/to/somefile.txt
somefile.txt has been matched by size and name to: some/other/path/to/somefile.txt 

34thx.foo   Matched by size only

I am expecting to see 3 lines of output (4 if you count the blank line).
I am completely failing to see the source of the duplication.

Answer 1

Score: 2

There are two possibilities: either %local_h contains two entries with the same name and size, or you are scanning the same directory twice. The latter happens if @volumes lists the same directory twice, or lists both a directory and one of its ancestors.
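A quick way to tell which of the two causes applies is to check both inputs before scanning. The sketch below uses only core modules; `find_duplicate_entries`, `find_overlapping_volumes` and `path_contains` are hypothetical helper names (not part of File::Find::Rule), it assumes %local_h has the `{name}`/`{size}` shape shown in the question, and the path comparison assumes Unix-style paths.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: list every name+size pair stored under more than one
# key of %local_h (each such pair makes every match print twice).
sub find_duplicate_entries {
    my ($local_h) = @_;
    my %count;
    ++$count{ $_->{name} }{ $_->{size} } for values %$local_h;
    my @dups;
    for my $name ( sort keys %count ) {
        for my $size ( sort { $a <=> $b } keys %{ $count{$name} } ) {
            push @dups, "$name ($size bytes)" if $count{$name}{$size} > 1;
        }
    }
    return @dups;
}

# True if $inner is $outer itself or lies below it (Unix-style paths assumed).
sub path_contains {
    my ( $outer, $inner ) = @_;
    return 1 if $inner eq $outer;
    my $prefix = $outer eq '/' ? '/' : "$outer/";
    return index( $inner, $prefix ) == 0;
}

# Hypothetical helper: list pairs of @volumes entries where one is the same
# directory as, or an ancestor of, the other (that subtree is walked twice).
sub find_overlapping_volumes {
    my @vols = map { abs_path($_) // $_ } @_;    # resolve ".", "..", symlinks
    my @overlaps;
    for my $i ( 0 .. $#vols ) {
        for my $j ( $i + 1 .. $#vols ) {
            my ( $p, $q ) = @vols[ $i, $j ];
            push @overlaps, [ $p, $q ]
              if path_contains( $p, $q ) || path_contains( $q, $p );
        }
    }
    return @overlaps;
}

# Example (hypothetical keys): two keys describe the same file.
my %local_h = (
    key1 => { name => 'somefile.txt', size => 12345 },
    key2 => { name => 'somefile.txt', size => 12345 },
);
print "Duplicate entry: $_\n" for find_duplicate_entries( \%local_h );
```

If either helper returns anything, that explains the doubled output; calling `find_overlapping_volumes(@volumes)` before the scan is a cheap sanity check.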


That said, the approach you are taking is awful.

Let's say there are 10 files in %local_h and 10 files on disk. You will be traversing the tree 10 times, which means you will check each of the 10 files 10 times, for a total of 100 calls to stat! For 20 and 20, that's 400 calls! There's no reason to do that. In those two scenarios, the following code makes only 20 (instead of 100) and 40 (instead of 400) calls to stat, respectively:

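use feature qw( say );
use File::Basename qw( basename );
use File::Find::Rule;

# Index the files of interest by size, then by name.
my %interesting;
for ( values( %local_h ) ) {
   ++$interesting{ $_->{ size } }{ $_->{ name } };
}

# Walk the directory trees only once.
# ->file skips directories, whose sizes could otherwise match.
my $ffr = File::Find::Rule->file->start( @volumes );
while ( defined( my $qfn = $ffr->match ) ) {
   my $fn   = basename( $qfn );
   my $size = -s $qfn;

   if ( my $r = $interesting{ $size } ) {
      if ( $r->{ $fn } ) {
         say "$fn size+name $qfn";
      } else {
         say "$fn size $qfn";
      }
   }
}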
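If File::Find::Rule is not installed, the same single-pass idea can be sketched with the core File::Find module. This is a variant, not the answerer's code: `scan_volumes` is a hypothetical name, it returns `[kind, path]` pairs instead of printing, it only considers plain files, and it additionally remembers each file's device and inode so that overlapping entries in @volumes cannot produce duplicates.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Basename qw(basename);

# Hypothetical helper: walk every volume once and classify each plain file
# against a size => name index built from a %local_h-shaped hash.
sub scan_volumes {
    my ( $local_h, @volumes ) = @_;

    # Same shape as %interesting in the answer: size, then name.
    my %interesting;
    ++$interesting{ $_->{size} }{ $_->{name} } for values %$local_h;

    my ( @matches, %seen );
    find(
        {
            no_chdir => 1,    # $File::Find::name stays a usable full path
            wanted   => sub {
                my $qfn = $File::Find::name;
                return unless -f $qfn;    # stat once; skip dirs, devices, etc.

                # Reuse the stat buffer (_) for dev:inode; a file reached via
                # overlapping volumes (or a hard link) is reported only once.
                return if $seen{ join ':', ( stat _ )[ 0, 1 ] }++;

                my $size = ( -s _ ) || 0;    # -s yields "" for empty files
                my $r = $interesting{$size} or return;
                my $kind = $r->{ basename($qfn) } ? 'size+name' : 'size';
                push @matches, [ $kind, $qfn ];
            },
        },
        @volumes
    );
    return @matches;
}
```

Called as `scan_volumes(\%local_h, @volumes)`, it leaves the printing (and the later handling of multiple matches the question mentions) to the caller.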

huangapple
  • This article was published on June 8, 2023 19:56:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76431616.html