How to match two consecutive newlines with Perl

Question

I can't solve a problem with a Perl program I wrote to parse a particular results file.

The goal is to capture two values from a table that is embedded in a `.txt` results file among many other lines of information.

Since this table doesn't have a fixed number of lines in every file, the only way I found to parse it is to detect the two consecutive newlines after the table and then go "backward" to capture the values of interest, which are in the last line.

Here is the regex I'm using unsuccessfully, followed by an example of the results table.

    $_ =~ /([\d+\,]+)\s+([\d+\,]+\.\d+)\r\n\r\n/

Table (values of interest in bold+italic):

 | begin | 1,699,932 | 10,136.45 |
 |:---- |:------:| -----:|
 | 1 | 1,712,388 | 12,455.32 | 
 | 2 | 1,712,605 | 12,484.85 | 
 | 3 | ***1,712,611*** | ***12,513.51*** | 

I tried several regex tools online where my regex matches correctly, but once incorporated into my code, it just doesn't work...

Example of reproducible code:

    #!/usr/bin/perl -w
    use strict;
    use Getopt::Long;

    my ($path);
    GetOptions(
        'path=s' => \$path,    # GetOptions needs a reference to the variable
    );

    chdir $path or die "ERROR: Unable to enter $path: $!\n";
    opendir( TEMP, "." );
    my @files = readdir(TEMP);
    closedir TEMP;

    for my $file (@files) {
        my $mAssize;
        my $qualAssize;

        if ( $file =~ /(\w+)\_LRassembly.unicycler.log/ ) {

            my $sample = $1;

            open( INFILE, "$file" ) or die("ERROR: Unable to open Log to parse file $!\n");
            chomp( my @data = <INFILE> );

            for (@data) {

                if ( $_ =~ /([\d+\,]+)\s+([\d+\,]+\.\d+)\r\n\r\n/ ) {

                    print "Matched\n";
                    $mAssize    = $1;
                    $qualAssize = $2;
                    print "MaxAssemblySize $mAssize\n";
                    print "QualAssembly $qualAssize\n";
                }
            }

            # OUT is not opened anywhere in this excerpt.
            print OUT ("$sample\t$mAssize\t$qualAssize\n") or die("ERROR: Unable to write log parsing file $!\n");

        }
        close INFILE;
    }

Find attached an example of a full results file containing the table:
[https://file.io/H8kCqE3gRov0][1]


  [1]: https://file.io/H8kCqE3gRov0

Sample of the output file:

    Polishing miniasm assembly with Racon (2023-07-08 00:32:20)
    -----------------------------------------------------------
    Unicycler now uses Racon to polish the miniasm assembly. It does multiple rounds of polishing to get the best consensus. Circular unitigs are rotated between rounds such that all parts (including the ends) are polished well.

    Saving to /storage/ONT/NETRAM_Campy/Filtered_reads/NKC1231_LRassembly/miniasm_assembly/racon_polish/polishing_reads.fastq:
      38,855 long reads

    Polish       Assembly          Mapping
    round            size          quality
    begin       1,671,271        29,207.18
    1           1,685,412        33,629.12
    2           1,685,573        33,654.73
    3           1,685,628        33,682.91

    Best polish: /storage/ONT/NETRAM_Campy/Filtered_reads/NKC1231_LRassembly/miniasm_assembly/racon_polish/016_rotated.fasta
    Saving /storage/ONT/NETRAM_Campy/Filtered_reads/NKC1231_LRassembly/miniasm_assembly/13_racon_polished.gfa
    Saving /storage/ONT/NETRAM_Campy/Filtered_reads/NKC1231_LRassembly/003_racon_polished.gfa

The expected output would be to print the following line (taking the sample results file above as a guide):

| sample  | Assembly | Mapping |
|:---- |:------:| -----:|
| NKC1231 | 1,685,628 | 33,682.91|


Answer 1

Score: 2


Each element of `@data` is a single line, so none of them can possibly contain two line feeds!
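To make that point concrete, here is a minimal sketch. The string in `$text` is a hypothetical stand-in for the log file (CRLF line endings are assumed), read through an in-memory filehandle in the same way the question reads `INFILE`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the log file: one table row, a blank line, one more line.
my $text = "3  1,685,628  33,682.91\r\n\r\nBest polish\r\n";

# Opening a reference to a scalar gives a read handle on the string itself.
open( my $fh, '<', \$text ) or die "Can't open in-memory file: $!\n";
chomp( my @data = <$fh> );    # list context: one element per line, split after each "\n"
close $fh;

# Count the elements that could satisfy a pattern ending in \r\n\r\n.
my $with_blank_line = grep { /\r\n\r\n/ } @data;
printf "%d lines read, %d contain two consecutive line endings\n",
    scalar @data, $with_blank_line;
```

Because the read operator splits the input after every `\n`, and `chomp` then removes that `\n`, a pattern that needs `\r\n\r\n` inside one element has nothing to match against.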

If you're going to read the file a line at a time, you will need to carry information from one pass to the next.

    open( my $INFILE, "<", $file )           # Avoid globals, and always use 3-arg open.
       or die( "ERROR: Can't open log file `$file`: $!\n" );  # Include the file name.

    my ( $found, $mAssize, $qualAssize );
    while ( <$INFILE> ) {                    # No need to load the entire file into memory.
       s/\s+\z//;  # Remove line endings. Handles both `\n` and `\r\n`.

       # A blank line right after a table row marks the end of the table.
       if ( $found && !length( $_ ) ) {
          print "Matched\n";
          print "MaxAssemblySize $mAssize\n";
          print "QualAssembly $qualAssize\n";
       }

       # Remember for the next iteration whether this line was a table row.
       $found = ( $mAssize, $qualAssize ) = /([\d+\,]+)\s+([\d+\,]+\.\d+)\z/;
    }
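If the log files are small enough to hold in memory, another option is to read the whole file into one string, so that a single pattern can see the row and the blank line after it. The sketch below is one way to do that and is not part of the answer above: the helper name `row_before_blank_line` and the sample rows are hypothetical, the character class is simplified to `[\d,]`, and `\R` (any line ending, including `\r\n`) assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Return (size, quality) from the first row that is followed by a blank line,
# or an empty list if there is no such row. Hypothetical helper name.
sub row_before_blank_line {
    my ($text) = @_;
    # [ \t] rather than \s keeps the two columns on the same line.
    return $text =~ /([\d,]+)[ \t]+([\d,]+\.\d+)[ \t]*\R[ \t]*\R/;
}

# Sample rows taken from the question; the last line is shortened for illustration.
my @rows = (
    'begin       1,671,271        29,207.18',
    '1           1,685,412        33,629.12',
    '2           1,685,573        33,654.73',
    '3           1,685,628        33,682.91',
    '',
    'Best polish: (path omitted)',
);

for my $eol ( "\n", "\r\n" ) {
    my $log = join( $eol, @rows ) . $eol;

    # Slurp the in-memory "file": with $/ undefined, one read returns everything.
    open( my $fh, '<', \$log ) or die "Can't open in-memory file: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;

    my ( $size, $quality ) = row_before_blank_line($text);
    print "MaxAssemblySize $size\n";
    print "QualAssembly $quality\n";
}
```

For a real log, the in-memory handle would be replaced by a three-argument `open` on the file. The line-by-line loop remains the better fit for very large files, since it never holds more than one line at a time.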
