2023-05-29 21:44:52

perl read and filter input from file

Question

I have a data input file in the format shown in the example below:

    <name> <attr1> <attr2> <attr3> <working_area> <date>
    alan x x x /path/to/alan_work/a Wed_May_17_04:17:40_2023
    alan x x x /path/to/alan_work/b Sun_May_28_21:22:52_2023
    alan x a x /path/to/alan_work/c Sun_May_28_22:25:47_2023
    ben x x x /path/to/ben_work/a Wed_May_17_04:18:44_2023
    ben a b x /path/to/ben_work/b Wed_May_17_08:19:47_2023
    charles a a a /path/to/charles_work/a Wed_May_17_04:17:40_2023
    charles a a a /path/to/charles_work/b Thurs_May_18_04:17:40_2023
    ben x x x /path/to/ben_work/c Fri_May_19_04:18:44_2023

I am writing a Perl script and want to meet the following criterion:

For the same user, if attributes 1, 2 and 3 are all the same across two or more different working areas, get the working area path with the latest date attribute.

Expected Output:

    /path/to/alan_work/b
    /path/to/alan_work/c
    /path/to/ben_work/c
    /path/to/ben_work/b
    /path/to/charles_work/b

Short snippet (I have no idea how to proceed):

    open(FF, '<', $temp_file) or die "cannot open $temp_file";
    while (my $line = <FF>) {
        chomp $line;
        my @split_type = split(' ', $line);
        # no idea here
    }

Answer 1

Score: 1

Since the values are separated by spaces, and the date components are separated by underscores, processing this is fairly straightforward.

We'll use the username and attributes as the key of a hash, and replace the hash value with the workpath whenever a line with a later date is seen.

To make this work, we have to transform the date into a standard form that can be compared as a string:

    use strict;
    use warnings;
    use v5.10;

    my $file = 'input.txt';
    open my $fh, '<', $file or die "Could not open $file: $!\n";

    my %paths;
    while (<$fh>) {
        /^</ and next;     # skip the header
        my ($name, $attr1, $attr2, $attr3, $workpath, $date) = split;
        # join with a separator so multi-character attributes cannot collide
        my $key = join '|', $name, $attr1, $attr2, $attr3;
        $date = transformDate($date);

        # keep the [date, workpath] pair with the latest date for each key
        $paths{$key} = [$date, $workpath]
            if !defined $paths{$key} || $date gt $paths{$key}[0];
    }

    say $paths{$_}[1] for sort keys %paths;

    # change date from: Wed_May_17_04:17:40_2023
    #          to this: 2023051704:17:40
    sub transformDate {
        my $date = shift;
        state $monthindex = {
            Jan => 1,  Feb => 2,  Mar => 3,
            Apr => 4,  May => 5,  Jun => 6,
            Jul => 7,  Aug => 8,  Sep => 9,
            Oct => 10, Nov => 11, Dec => 12,
        };
        my (undef, $month, $day, $time, $year) = split /_/, $date;
        return sprintf('%d%02d%02d%s', $year, $monthindex->{$month}, $day, $time);
    }

Edit: removed the alternative date parsing, since it was not needed after clarifying the date format.
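
To illustrate why the transformation is needed, here is a minimal sketch. The helper name `sortable_date` is hypothetical and simply repeats the logic of `transformDate`; the two dates are taken from the sample input. It compares the raw strings and the transformed strings with `gt`:

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper with the same logic as transformDate above.
sub sortable_date {
    my ($date) = @_;
    state $month_index = {
        Jan => 1,  Feb => 2,  Mar => 3,  Apr => 4,
        May => 5,  Jun => 6,  Jul => 7,  Aug => 8,
        Sep => 9,  Oct => 10, Nov => 11, Dec => 12,
    };
    my (undef, $month, $day, $time, $year) = split /_/, $date;
    return sprintf '%d%02d%02d%s', $year, $month_index->{$month}, $day, $time;
}

my ($older, $newer) = ('Wed_May_17_04:17:40_2023', 'Sun_May_28_21:22:52_2023');

# Raw strings are compared by weekday name first, so 'Sun...' sorts before 'Wed...'.
say 'raw:      ', ($newer gt $older ? 'newer wins' : 'older wins');

# Transformed strings start with year, month and day, so they compare chronologically.
say 'sortable: ', (sortable_date($newer) gt sortable_date($older) ? 'newer wins' : 'older wins');
```

With the raw strings the older date wins the comparison; with the transformed strings the newer one does, which is what the `gt` test in the loop relies on.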

Answer 2

Score: 0

Store the data in a hash keyed by the name, the attributes, and the area, with the dates as the values. Sort the areas by those values (you need to implement date comparison for your format, or parse the date when populating the hash so that the hash holds comparable values) and return the last one.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # This needs to properly parse the dates, but for the example it
    # works, as the dates to compare are always in the same month and never
    # on the same day.
    sub by_date {
        my ($dates_by_area, $A, $B) = @_;
        $dates_by_area->{$A} =~ /May_?([0-9]+)/;
        my $day_a = $1;
        $dates_by_area->{$B} =~ /May_?([0-9]+)/;
        my $day_b = $1;
        $day_a <=> $day_b
    }

    my $temp_file = shift;

    open my $in, '<', $temp_file or die "cannot open $temp_file";
    my %dates;
    while (my $line = <$in>) {
        next if $line =~ /^</;

        my ($name, $attr1, $attr2, $attr3, $area, $date) = split ' ', $line;
        $dates{$name}{$attr1}{$attr2}{$attr3}{$area} = $date;
    }

    for my $name (keys %dates) {
        for my $attr1 (keys %{ $dates{$name} }) {
            for my $attr2 (keys %{ $dates{$name}{$attr1} }) {
                for my $attr3 (keys %{ $dates{$name}{$attr1}{$attr2} }) {
                    my %dates_by_area = %{ $dates{$name}{$attr1}{$attr2}{$attr3} };
                    my @sorted = sort { by_date(\%dates_by_area, $a, $b) }
                                 keys %dates_by_area;
                    say $sorted[-1];
                }
            }
        }
    }
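
As the comment on `by_date` says, that comparator only looks at the day of the month. One possible way to compare full dates is sketched below, assuming the core module Time::Piece and the underscore-separated format from the sample. The names `date_to_epoch` and `by_full_date` are hypothetical; the weekday is dropped before parsing because the sample spells Thursday as `Thurs`, which is not a standard abbreviation.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: 'Wed_May_17_04:17:40_2023' -> seconds since the epoch.
sub date_to_epoch {
    my ($date) = @_;
    my (undef, $rest) = split /_/, $date, 2;    # drop the weekday
    return Time::Piece->strptime($rest, '%b_%d_%H:%M:%S_%Y')->epoch;
}

# Comparator with the same calling convention as by_date.
sub by_full_date {
    my ($dates_by_area, $A, $B) = @_;
    return date_to_epoch($dates_by_area->{$A}) <=> date_to_epoch($dates_by_area->{$B});
}

# Two rows from the sample input that share name and attributes.
my %dates_by_area = (
    '/path/to/charles_work/a' => 'Wed_May_17_04:17:40_2023',
    '/path/to/charles_work/b' => 'Thurs_May_18_04:17:40_2023',
);
my @sorted = sort { by_full_date(\%dates_by_area, $a, $b) } keys %dates_by_area;
print "$sorted[-1]\n";    # the area with the latest date
```

Parsing once while populating the hash, as suggested above, would avoid re-parsing the same date on every comparison.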

The structure collected in %dates can be inspected using

    use Data::Dumper;
    warn Dumper \%dates;

which gives the following output for the sample:

    $VAR1 = {
              'alan' => {
                          'x' => {
                                   'x' => {
                                            'x' => {
                                                     '/path/to/alan_work/a' => 'Wed_May17_04:17:40_2023',
                                                     '/path/to/alan_work/b' => 'Sun_May_28_21:22:52_2023'
                                                   }
                                          },
                                   'a' => {
                                            'x' => {
                                                     '/path/to/alan_work/c' => 'Sun_May_28_22:25:47_2023'
                                                   }
                                          }
                                 }
                        },
              'ben' => {
                         'x' => {
                                  'x' => {
                                           'x' => {
                                                    '/path/to/ben_work/a' => 'Wed_May17_04:18:44_2023',
                                                    '/path/to/ben_work/c' => 'Fri_May19_04:18:44_2023'
                                                  }
                                         }
                                },
                         'a' => {
                                  'b' => {
                                           'x' => {
                                                    '/path/to/ben_work/b' => 'Wed_May17_08:19:47_2023'
                                                  }
                                         }
                                }
                       },
              'charles' => {
                             'a' => {
                                      'a' => {
                                               'a' => {
                                                        '/path/to/charles_work/a' => 'Wed_May17_04:17:40_2023',
                                                        '/path/to/charles_work/b' => 'Thurs_May18_04:17:40_2023'
                                                      }
                                             }
                                    }
                           }
            };

You gave no instructions about what should happen if there are two different dates for the same name, attributes, and area. The current implementation just uses the last corresponding line from the input.

Also, note that I switched to lexical filehandles to avoid the problems that bareword filehandles bring. When using split ' ', you don't need to chomp, as this special form of split removes trailing whitespace, including the newline.
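
The point about `split ' '` can be checked with a small sketch. The line below is the first data row of the sample with its newline left in place, and no `chomp` is called:

```perl
use strict;
use warnings;

my $line = "alan x x x /path/to/alan_work/a Wed_May_17_04:17:40_2023\n";

# split ' ' skips leading whitespace and splits on runs of whitespace;
# trailing empty fields are dropped, so the newline never reaches the last field.
my @fields = split ' ', $line;

print scalar(@fields), " fields, last one: '$fields[-1]'\n";
# prints: 6 fields, last one: 'Wed_May_17_04:17:40_2023'
```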

Answer 3

Score: 0

    perl -MTime::Piece -nE '
        # extract fields
        ($u,$a1,$a2,$a3,$p,$_) = split;
        $id = "$u $a1 $a2 $a3";

        # massage date format into standard form
        y/[A-Za-z0-9]//cd;
        s/.*([A-Z][a-z]{2})[^\d]*/$1/;
        eval {
            $t = Time::Piece->strptime($_,"%b%d%H%M%S%Y")->datetime;
        } or do {
            # add error handling
            # (this also catches any header)
            next;
        };

        # save path if "better"
        if ($t ge $ts{$id}) {
            $ts{$id} = $t;
            $ps{$id} = $p;
        }

        # print results
        END { say for sort values %ps }
    ' datafile
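
The two "massage" steps in the one-liner are terse, so here is a rough trace of what they appear to do to a single date. The wrapper name `normalise` is hypothetical, and the sketch uses `tr` without the square brackets, which should be equivalent for the sample dates:

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical wrapper around the date handling from the one-liner above.
sub normalise {
    my ($date) = @_;
    $date =~ tr/A-Za-z0-9//cd;                 # delete non-alphanumerics: 'WedMay170417402023'
    $date =~ s/.*([A-Z][a-z]{2})[^\d]*/$1/;    # keep from the month name on: 'May170417402023'
    return Time::Piece->strptime($date, '%b%d%H%M%S%Y')->datetime;
}

print normalise('Wed_May_17_04:17:40_2023'), "\n";      # 2023-05-17T04:17:40
print normalise('Thurs_May_18_04:17:40_2023'), "\n";    # 2023-05-18T04:17:40
```

Because the substitution is greedy, it keeps the last capitalised three-letter word before the digits, so the length of the weekday name (`Wed` or `Thurs`) does not matter. The resulting ISO 8601 strings can be compared with `ge`, as the one-liner does.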
