awk: ignore the comment in file

Question

I'm doing an awk homework assignment and have the file IN-arg:

      # this is a comment
      joejoejoe 10 20 30 #this is also a comment
      OAK -999 10 # 10000
      joeJOE 2000
      oak 10
      milk 1000 # 2000

I want to ignore the comments in IN-arg and get something like this:

      joejoejoe 10 20 30 
      OAK -999 10 
      joeJOE 2000
      oak 10
      milk 1000 

How can I do it?

Below is my code:

#! /bin/awk -f

{
  for(i=1;i<=NF;i++)
  {
    if($i == "#")
      i = NF + 1
    print $i
  }
}

Answer 1

Score: 2

What about changing the field separator to # and printing $1 (first field) when it is not empty?

Give this a try:

#!/bin/awk -f
BEGIN { FS="#" }
$1 ~ /./ { print $1 }
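If the comment-only lines can be indented (the sample is shown with leading spaces), $1 for such a line is all blanks, which /./ still matches, so a line of spaces would be printed. A minimal sketch, assuming you want those lines skipped as well, tests for a non-blank character instead; `strip_comments` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch of answer 1 with one change: require a non-blank character
# before the first "#", so indented comment-only lines are skipped too.
# strip_comments is a hypothetical helper name.
strip_comments() {
    awk 'BEGIN { FS = "#" }
         $1 ~ /[^ \t]/ { print $1 }' "$@"
}
```

Usage: `strip_comments IN-arg`. Like the original, it keeps whatever blanks preceded the `#`.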

Answer 2

Score: 1

Using any POSIX awk:

$ awk '$1 !~ /^#/{sub(/[[:space:]]*#.*/,""); print}' 'IN-arg'
      joejoejoe 10 20 30
      OAK -999 10
      joeJOE 2000
      oak 10
      milk 1000

I'm assuming above that you don't really want to leave trailing white space when comments are removed from the end of lines, but if you do, then remove [[:space:]]* from the command.
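The variant mentioned above, which keeps any blanks that preceded the comment, only changes the regex passed to sub(). A sketch, with the hypothetical helper name `strip_comments_keep_ws`; comment-only lines are still skipped even when indented, because with the default field separator $1 is the first non-blank field:

```shell
#!/bin/sh
# Variant of answer 2 without [[:space:]]*: remove "#" and everything
# after it, but keep any blanks in front of it.
# Comment-only lines are skipped because, with the default FS, $1 is the
# first non-blank field, so an indented "# ..." line also matches /^#/.
strip_comments_keep_ws() {
    awk '$1 !~ /^#/ { sub(/#.*/, ""); print }' "$@"
}
```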

Answer 3

Score: 0

echo '
      # this is a comment
      joejoejoe 10 20 30 #this is also a comment
      OAK -999 10 # 10000
      joeJOE 2000
      oak 10
      milk 1000 # 2000' |
awk 'NF = $1 !~ "^[ \t-\r]*$"' FS='#.*'

      joejoejoe 10 20 30 
      OAK -999 10 
      joeJOE 2000
      oak 10
      milk 1000 
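The one-liner packs several tricks together. FS='#.*' makes $1 everything before the first #. The pattern is the assignment NF = ($1 !~ ...), which sets NF to 1 or 0; assigning NF makes awk rebuild $0 from the remaining fields, so NF=1 leaves just $1 and NF=0 leaves an empty record, and the assigned value doubles as the pattern, so only NF=1 records print. The bracket [ \t-\r] is a space plus the range from tab to carriage return (tab, newline, vertical tab, form feed, carriage return). A long-hand sketch of the same logic, with the hypothetical helper name `strip_comments_nf`:

```shell
#!/bin/sh
# Long-hand sketch of answer 3's one-liner; strip_comments_nf is a
# hypothetical name. Same logic, one step per line.
strip_comments_nf() {
    awk '
    BEGIN { FS = "#.*" }               # $1 = everything before the first "#"
    {
        if ($1 ~ "^[ \t-\r]*$")        # only whitespace before "#", or an empty line
            next                       # the NF = 0 case: record is dropped
        NF = 1                         # rebuild $0 from field 1 alone
        print
    }' "$@"
}
```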

Answer 4

Score: 0

$ awk '!/#/ || !/^ *#/ && gsub(/ *#.*$/,"")'  file 
      joejoejoe 10 20 30
      OAK -999 10
      joeJOE 2000
      oak 10
      milk 1000
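Because && binds tighter than ||, the pattern reads as !/#/ || (!/^ *#/ && gsub(/ *#.*$/,"")): lines without # print unchanged; lines that start with optional spaces followed by # are dropped; for the rest, gsub() strips the comment and its return value (the number of substitutions, 1 here) makes the pattern true, so the edited line prints. A long-hand sketch of the same logic, with the hypothetical helper name `strip_comments_gsub`:

```shell
#!/bin/sh
# Long-hand sketch of answer 4; strip_comments_gsub is a hypothetical name.
# The original pattern parses as: !/#/ || (!/^ *#/ && gsub(...))
strip_comments_gsub() {
    awk '
    !/#/   { print; next }        # no "#" at all: print unchanged
    /^ *#/ { next }               # only spaces before "#": comment-only line
    {
        gsub(/ *#.*$/, "")        # drop spaces before "#", the "#", and the rest
        print
    }' "$@"
}
```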

Answer 5

Score: 0

If you're able to choose GNU Awk for the task, which supports a regular expression in the RS (record separator variable), a nice way to do this is:

$ echo 'line1 # comment
line2 # comment
# line3
line4 #comment
line5' | awk -v RS='(#[^\\n]*)?\\n' '{ printf("rec[%d] = %s\n", NR, $0) }'
rec[1] = line1
rec[2] = line2
rec[3] =
rec[4] = line4
rec[5] = line5

We include the end-of-line comment in the record separator, so it disappears at the low level, when GNU Awk is delimiting the input into records.

Our record separator is "optional hash comment, followed by newline", where the optional hash comment is "hash mark, followed by any number of non-newline characters".

POSIX says that the behavior is unspecified if RS is given a value that is more than one character long; POSIX doesn't support regular expressions in RS.
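The RS above leaves two things in place: the blank before each # (rec[1] is really "line1 " with a trailing space) and the empty record that a comment-only line produces (rec[3]). A sketch, assuming GNU Awk and that you want both gone, folds leading blanks into RS and uses NF as the pattern so empty or blank-only records are skipped; `strip_comments_rs` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch building on answer 5 (GNU Awk only: a regex RS is not POSIX).
# Assumptions: swallow blanks before the "#" as part of the separator,
# and skip empty records such as the one left by a comment-only line.
strip_comments_rs() {
    gawk 'BEGIN { RS = "[ \t]*(#[^\n]*)?\n" }
          NF { print }' "$@"
}
```

For the sample input above this should print line1, line2, line4 and line5, one per line, with no trailing blanks.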

Answer 6

Score: 0

    #! /bin/awk -f
    
    {
      for(i=1;i<=NF;i++)
      {
        if($i == "#")
          i = NF + 1
        print $i
      }
    }

This would not work as intended. Because you `print` each field, you get a newline after each field; you can use `printf` to avoid that. Also, since you allow comments like `#this is also a comment`, the `#` does not have to be a separate field when using the default field separator of one or more whitespace characters. If you want each character to be a field, set the `FPAT` built-in variable to `.`. You might also use `break` to end the loop rather than tinkering with the variable used in the condition. Finally, you need to ignore lines that contain only a comment; in other words, only print anything for lines that do not start with optional whitespace followed by `#`.

After applying these changes, your code would become:

    #!/bin/awk -f
    BEGIN{FPAT="."}
    !/^[[:space:]]*#/{
      for(i=1;i<=NF;i++)
      {
        if($i == "#")
          break
        printf "%s",$i
      }
      print ""
    }

*(tested in GNU Awk 5.1.0)*
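FPAT is a GNU Awk extension, so the corrected script needs gawk. A minimal sketch of the same behaviour for other awks, assuming you only need the text before the first #: index() returns the position of the first # (0 if there is none) and substr() keeps what precedes it, so no per-character loop is needed. The helper name `strip_comments_index` is hypothetical:

```shell
#!/bin/sh
# Portable sketch of answer 6's behaviour without FPAT (which is gawk-only).
# strip_comments_index is a hypothetical helper name.
strip_comments_index() {
    awk '
    /^[ \t]*#/ { next }                 # comment-only line (optional blanks, then #)
    {
        pos = index($0, "#")            # position of the first "#", 0 if none
        if (pos > 0)
            $0 = substr($0, 1, pos - 1) # keep only the text before it
        print
    }' "$@"
}
```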



huangapple
  • Posted on June 1, 2023 at 16:54:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380205.html