AWK - Recompiling $0 where the record is just spaces results in unexpected behavior
Question
AWK's default "FS" variable is a single space " ".
A common trick for removing unnecessary whitespace from a record is to force AWK to rebuild the record using:
{ $1 = $1 }
This does indeed trim the whitespace.
However, if a line consists ENTIRELY of whitespace, AWK trims it down to nothing (which is odd, as you'd expect a single ' ' to be left), and for some reason AWK then reports that a field exists when there's literally nothing there.
Input File:
hello
charles
<--- This is a series of spaces
one
two
three
AWK Script:
#!/usr/bin/awk -f
{ $1 = $1; print }
Output:
hello
charles
one
two
three
This looks correct until you get AWK to report its field count using the "NF" variable:
New AWK Script:
#!/usr/bin/awk -f
{ $1 = $1; print NF }
Output:
1
1
1
1
1
1
Where the heck is this field coming from? When I run the output through the cat utility to check for line endings it reports nothing there:
dev@pop-os:~/Scripts/awk$ ./space_test.awk spaces.txt | cat -e
hello$
charles$
$
one$
two$
three$
As you can see, nothing is there.
What's extra odd is that AWK also reports that the length of this record is zero:
Modified AWK Script:
#!/usr/bin/awk -f
{ $1 = $1; print length($0) }
Output:
5
7
0
3
3
5
What is going on here?
Answer 1
Score: 3
When you assign to a field number, that field and all the fields before it are created. This will increase NF if the field number is greater than the original number of fields.
It doesn't matter that the fields are empty; they're still counted.
$ awk '{print NF; $2 = ""; print NF}' <<< ""
0
2
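To tie this back to the question, here is a minimal sketch, assuming a POSIX-compatible awk and a shell with printf (the variable names before, out and guarded are just for illustration). A line of three spaces has no fields, so NF starts at 0. Assigning to $1 creates an empty first field, which makes NF 1, and the record is then rebuilt from that single empty field, which is why its length is 0. The second command shows one possible workaround: only rebuild records that already have at least one field.

```shell
# Whitespace-only record: NF is 0 until $1 is assigned to.
out=$(printf '   \n' | awk '{ before = NF; $1 = $1; print before, NF, length($0) }')
echo "$out"       # 0 1 0

# Guarding on NF skips the rebuild for whitespace-only records, so NF stays 0.
guarded=$(printf '   \n' | awk 'NF { $1 = $1 } { print NF }')
echo "$guarded"   # 0
```

Note that with the NF guard the whitespace-only line is left untouched, so printing $0 would still emit the original spaces.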
Answer 2
Score: 0
> AWK's default "FS" variable is a single space " "
Yes, but that value prompts GNU AWK (as in any POSIX-compliant AWK) to treat any run of one or more whitespace characters as a single field separator. For example, if you have file.txt with fields delimited by triple spaces
1   2   3
4   5   6
7   8   9
then
awk '{print NF,$3}' file.txt
gives output
3 3
3 6
3 9
Therefore $1=$1 not only trims leading and trailing whitespace but also replaces every run of whitespace between fields with a single space. For example, if you have a TAB-separated file file.tsv like
1 2 3
4 5 6
7 8 9
then
awk '{$1=$1;print}' file.tsv
gives output
1 2 3
4 5 6
7 8 9
Observe that there is no TAB character in the output.
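The same effect can be reproduced without any input files. This is a small sketch, assuming a POSIX-compatible awk and a printf that understands \t; the variable names rebuilt and csv are made up for the example. The colons make leading and trailing whitespace visible, and the second command suggests that the rebuilt record is joined with OFS rather than with a hard-coded space.

```shell
# Leading space, a TAB, a run of spaces, and a trailing space all disappear.
rebuilt=$(printf ' 1\t2   3 \n' | awk '{ $1 = $1; print NF ":" $0 ":" }')
echo "$rebuilt"   # 3:1 2 3:

# The fields are rejoined with OFS, so changing OFS changes the separator.
csv=$(printf ' 1\t2   3 \n' | awk -v OFS=, '{ $1 = $1; print }')
echo "$csv"       # 1,2,3
```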
Answer 3
Score: 0
As a consequence of
> When you assign to a field number, that field and all the fields before it are created...
as mentioned by @Bramar, one of the less-discussed ways to use awk is to leverage those empty fields to repeat things with ease:
> seq 17 | mawk '(OFS = ((__=$_) + 9) % 10)^!(NF += __)' FS='^$'
10
211
3222
43333
544444
6555555
76666666
877777777
9888888888
109999999999
1100000000000
12111111111111
132222222222222
1433333333333333
15444444444444444
165555555555555555
1766666666666666666
or just to repeat emojis:
> jot 17 | mawk '(OFS = "\360\237\244\241")^!(NF += $!_)' FS='^$'
1🤡
2🤡🤡
3🤡🤡🤡
4🤡🤡🤡🤡
5🤡🤡🤡🤡🤡
6🤡🤡🤡🤡🤡🤡
7🤡🤡🤡🤡🤡🤡🤡
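For readers who find the golfed one-liners hard to follow, here is a plainer sketch of the same idea, assuming a POSIX-compatible awk (the names n and repeated are illustrative, not from the answer). Clearing the record and then assigning to field n + 1 creates n + 1 empty fields, and rebuilding the record joins them with n copies of OFS.

```shell
# Repeat "ab" three times: 4 empty fields are joined by 3 copies of OFS.
repeated=$(echo 3 | awk '{ n = $1; $0 = ""; OFS = "ab"; $(n + 1) = ""; print }')
echo "$repeated"   # ababab
```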