英文:
I need to sum all the values in a column across multiple files
问题
我有一个包含多个 CSV 文本文件的目录,每个文件都是单行格式为:
'field1,field2,field3,560'
我需要输出目录中所有文件第四个字段的总和(可能有数百或数千个文件)。举例来说:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
输出应该是:
1647
我尝试了几种不同的方法,最有希望的是我在这里找到的一个 awk 命令,是对另一个用户问题的回答。它并不能完全满足我的需求,而且我是 awk 的新手,不太确定如何修改才能让它适用于我的目的:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt
这条命令能正确输出 '975'。
然而,如果我尝试传递第三个文件,它不是将三个文件的第四个字段相加,而是先将文件1加到文件2,然后将文件1加到文件3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt file3.txt
975
1232
是否有人能展示我如何修改这个 awk 语句以接受超过两个文件,或者更理想的是,因为有成千上万个文件需要求和,如何使用 * 来输出目录中所有文件第四个字段的总和?
谢谢你的时间和帮助。
英文:
I have a directory with multiple csv text files, each with a single line in the format:
field1,field2,field3,560
I need to output the sum of the fourth field across all files in a directory (can be hundreds or thousands of files). So for an example of:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
The output would simply be:
1647
I've been trying a few different things, with the most promising being an awk
command that I found here in response to another user's question. It doesn't quite do what I need it to do, and I am an awk newb so I'm unsure how to modify it to work for my purpose:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt
This correctly outputs 975
.
However if I try pass it a 3rd file, rather than add field 4 from all 3 files, it adds file1 to file2, then file1 to file3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt file3.txt
975
1232
Can anyone show me how I can modify this awk statement to accept more than two files or, ideally because there are thousands of files to sum up, an * to output the sum of the fourth field of all files in the directory?
Thank you for your time and assistance.
答案1
得分: 2
这是翻译好的代码部分:
这里我们开始(在连接中不需要测试`NR==FNR`):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}'
1647
或者同样的 🇹🇭(不浪费一些管道(s)):
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
英文:
Here we go (no need to test NR==FNR
in a concatenation):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}'
1647
Or same-same 🇹🇭 (without wasting some pipe(s)):
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
答案2
得分: 2
关于当前代码的一些问题:
NR==FNR
用于指示对第一个文件进行特殊处理;在这种情况下,对于第一个文件没有什么需要特殊处理的(即,所有文件都要进行相同的处理)- 使用数组(例如,
a[NR]
)来维护一组值;在这种情况下,您只需要维护一个全局值,所以不需要数组
由于您只寻找一个全局总和,更简单的代码应该足够:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
注意:
- 在(不太可能的情况下?)所有文件都为空的情况下,
sum
将未定义,因此print sum
将显示一个空链接;sum+0
确保如果sum
保持未定义(即,所有文件都为空),我们会打印0
- 对于不定数量的文件,
file{1..3}.txt
可以替换为匹配所需文件集的任何模式,例如file*.txt
,*.txt
等。
英文:
A couple issues with the current code:
NR==FNR
is used to indicate special processing for the 1st file; in this case there is no processing that is 'special' for just the 1st file (ie, all files are to be processed the same)- an array (eg,
a[NR]
) is used to maintain a set of values; in this case you only have one global value to maintain so there is no need for an array
Since you're only looking for one global sum, a bit more simpler code should suffice:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
NOTES:
- in the (unlikely?) case all files are empty,
sum
will be undefined soprint sum
will display a blank link;sum+0
insures we print0
ifsum
remains undefined (ie, all files are empty) - for a variable number of files
file{1..3}.txt
can be replaced with whatever pattern will match on the desired set of files, eg,file*.txt
,*.txt
, etc
答案3
得分: 0
$ perl -MList::Util=sum0 -F, -lane'push @a,$F[3];END{print sum0 @a}' file{1..3}.txt
1647
$ perl -F, -lane'push @a,$F[3];END{foreach(@a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
英文:
$ perl -MList::Util=sum0 -F, -lane'push @a,$F[3];END{print sum0 @a}' file{1..3}.txt
1647
$ perl -F, -lane'push @a,$F[3];END{foreach(@a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
答案4
得分: 0
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647
英文:
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647
答案5
得分: 0
代码部分已被排除,以下是您要翻译的内容:
"The paste method is only viable for smaller inputs. When row count gets somewhat larger, like the example below of 34,999,999
, compared to awk
,
gnu-paste
is ~2.83x slowerbsd-paste
is ~3.72x slower
** adding perl
for completeness ::: perl 5.36.0
is ~1.54x slower
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|"
"在处理较小的输入时,paste
方法才是可行的。当行数变得相对较大,比如下面的示例中的 34,999,999
,与 awk
相比,
gnu-paste
较慢,大约2.83倍的速度bsd-paste
较慢,大约3.72倍的速度
** 为完整性添加 perl
::: perl 5.36.0
较慢,大约1.54倍的速度
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
等待 1 秒
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
等待 1 秒
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|"
英文:
The paste method is only viable for smaller inputs. When row count gets somewhat larger, like the example below of 34,999,999
, compared to awk
,
gnu-paste
is ~2.83x slowerbsd-paste
is ~3.72x slower
** adding perl
for completeness ::: perl 5.36.0
is ~1.54x slower
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|
in0: 289MiB 0:00:06 [47.9MiB/s] [47.9MiB/s] [<=> ]
1 612499982500000
( jot "$__" | pvE 0.1 in0 | mawk2 'END { print +_ }{ _+=$__ }'; )
8.51s user 0.22s system 143% cpu 6.072 total
in0: 289MiB 0:00:11 [25.5MiB/s] [25.5MiB/s] [ <=> ]
1 612499982500000
( jot "$__" | pvE 0.1 in0 | LC_ALL=C paste -sd+ - | bc; )
27.87s user 1.15s system 128% cpu 22.594 total
in0: 289MiB 0:00:06 [48.3MiB/s] [48.3MiB/s] [ <=> ]
1 612499982500000
( jot "$__" | pvE 0.1 in0 | LC_ALL=C gpaste -sd+ - | bc; )
17.14s user 1.10s system 106% cpu 17.185 total
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论