AWK - Recompiling $0 where the record is just spaces results in unexpected behavior

Question

AWK's default "FS" variable is a single space " ".

A common trick people use to remove unnecessary whitespace in a record is to force a recompilation of the record using:

{ $1 = $1 }

This does indeed trim whitespace.

However, if you have a line that consists ENTIRELY of whitespace, AWK trims it down to nothing (which is odd, as you'd expect a single ' ' to be left), and for some reason AWK then reports that a field exists when there's literally nothing there.

Input File:

    hello
    charles
           <--- This is a series of spaces
one
two
 three

AWK Script:

#!/usr/bin/awk -f

{ $1 = $1; print }

Output:

hello
charles

one
two
three

Now this looks correct, until you get AWK to report on its field count using the "NF" variable:

New AWK Script:

#!/usr/bin/awk -f

{ $1 = $1; print NF }

Output:

1
1
1
1
1
1

Where the heck is this field coming from? When I run the output through the cat utility to check for line endings it reports nothing there:

dev@pop-os:~/Scripts/awk$ ./space_test.awk spaces.txt | cat -e
hello$
charles$
$
one$
two$
three$

As you can see, nothing is there.

What's extra odd is that AWK also reports that the length of this record is zero:

Modified AWK Script:

#!/usr/bin/awk -f

{ $1 = $1; print length($0) }

Output:

5
7
0
3
3
5

What is going on here?

Answer 1

Score: 3

When you assign to a field number, that field and all the fields before it are created. This increases NF if the field number is greater than the original number of fields.

It doesn't matter that the fields are empty; they're still counted.

$ awk '{print NF; $2 = ""; print NF}' <<< ""
0
2
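
Applied to the question's case, the same rule seems to explain the phantom field. The following is a minimal sketch, assuming a POSIX-compatible awk is on the PATH; the function names nf_before_after and trim_guarded are made up for illustration and are not part of the original answer. A whitespace-only record starts with NF equal to 0, and $1 = $1 creates an empty first field, so NF becomes 1 while $0 is rebuilt as that single empty field (length 0). Guarding the assignment with NF is one possible way to leave such records alone:

```shell
#!/bin/sh
# Hypothetical helper: show NF and length($0) before and after the rebuild.
nf_before_after() {
    awk '{
        printf "before: NF=%d len=%d\n", NF, length($0)
        $1 = $1    # creates $1 if it does not exist, then rebuilds $0 using OFS
        printf "after:  NF=%d len=%d\n", NF, length($0)
    }'
}

# Hypothetical helper: rebuild only the records that already have a field.
trim_guarded() {
    awk 'NF { $1 = $1 } { print NF ": [" $0 "]" }'
}

# A record made of three spaces: NF goes from 0 to 1, length from 3 to 0.
printf '   \n' | nf_before_after

# The guarded version reports NF=0 for the all-space record.
printf '  hello  \n   \n' | trim_guarded
```

Note the trade-off in the guarded version: the all-space record keeps NF at 0, but its spaces are also left untouched rather than trimmed.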

Answer 2

Score: 0

> AWK's default "FS" variable is a single space " "

Yes, but a single-space FS is a special case: it prompts GNU AWK (like other POSIX awk implementations) to treat a run of one or more whitespace characters as a single field separator. Consider a file.txt whose fields are delimited by three spaces:

1   2   3
4   5   6
7   8   9

then

awk '{print NF,$3}' file.txt

gives output

3 3
3 6
3 9

Therefore $1=$1 not only trims leading and trailing whitespace characters but also replaces every run of whitespace between fields with a single space, because the record is rebuilt with the default OFS. For example, consider a TAB-separated file file.tsv like

1	2	3
4	5	6
7	8	9

then

awk '{$1=$1;print}' file.tsv

gives output

1 2 3
4 5 6
7 8 9

Observe that there is no TAB character in the output.
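
Both halves of this can be checked directly. The following is a small sketch, assuming a POSIX-compatible awk; the function names count_fields and rebuild_keep_tabs are invented for illustration. The first function contrasts the default FS with the bracket expression [ ], which matches exactly one literal space; the second shows that the rebuild joins fields with OFS, so setting both FS and OFS to a TAB should keep the TABs:

```shell
#!/bin/sh
# Hypothetical helper: count fields in "$1" under the default FS and under
# a regex that matches exactly one space character.
count_fields() {
    printf '%s\n' "$1" | awk '{ print "default FS: NF=" NF }'
    printf '%s\n' "$1" | awk -F'[ ]' '{ print "FS=[ ]: NF=" NF }'
}

# Hypothetical helper: force the rebuild, but split and join on TAB.
rebuild_keep_tabs() {
    awk 'BEGIN { FS = OFS = "\t" } { $1 = $1; print }'
}

count_fields '1   2   3'   # default FS sees 3 fields, [ ] sees 7
count_fields '   '         # default FS sees 0 fields, [ ] sees 4
printf '1\t2\t3\n' | rebuild_keep_tabs
```

The second call is the case from the question: under the default FS an all-space record has no fields at all, whereas a literal single-space separator would produce four empty ones.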

Answer 3

Score: 0

As a consequence of

> When you assign to a field number, that field and all the fields before it are created...

(as mentioned by @Bramar), one of the less-discussed ways to use awk is to leverage those empty fields to repeat things with ease:

> seq 17 | mawk '(OFS = ((__=$_) + 9) % 10)^!(NF += __)' FS='^$'

10
211
3222
43333
544444
6555555
76666666
877777777
9888888888
109999999999
1100000000000
12111111111111
132222222222222
1433333333333333
15444444444444444
165555555555555555
1766666666666666666

Or just repeat emojis:

> jot 17 | mawk '(OFS = "\360\237\244\241")^!(NF += $!_)' FS='^$'

1🤡
2🤡🤡
3🤡🤡🤡
4🤡🤡🤡🤡
5🤡🤡🤡🤡🤡
6🤡🤡🤡🤡🤡🤡
7🤡🤡🤡🤡🤡🤡🤡
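
The one-liners above are deliberately terse. Here is a more readable sketch of the same idea, assuming a POSIX-compatible awk; the function name repeat and its two arguments are hypothetical and not taken from the answer. Assigning to field n + 1 of an empty record creates n + 1 empty fields, and the rebuilt record joins them with n copies of OFS:

```shell
#!/bin/sh
# Hypothetical helper: print string "$1" repeated "$2" times.
repeat() {
    awk -v s="$1" -v n="$2" 'BEGIN {
        $0 = ""          # start from an empty record so NF is 0
        OFS = s          # the string to repeat becomes the output separator
        $(n + 1) = ""    # creates fields 1..n+1; $0 is rebuilt with n separators
        print
    }'
}

repeat ab 3    # prints ababab
repeat '*' 5   # prints *****
```

Because the string is passed with -v, awk interprets backslash escapes in it, which is how an octal-escaped emoji like the one above could be passed as well.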

huangapple
  • Published on March 8, 2023 at 16:41:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75670902.html