英文:
I need to sum all the values in a column across multiple files
问题
我有一个包含多个 CSV 文本文件的目录,每个文件都是单行格式为:
'field1,field2,field3,560'
我需要输出目录中所有文件第四个字段的总和(可能有数百或数千个文件)。举例来说:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
输出应该是:
1647
我尝试了几种不同的方法,最有希望的是我在这里找到的一个 awk 命令,是对另一个用户问题的回答。它并不能完全满足我的需求,而且我是 awk 的新手,不太确定如何修改才能让它适用于我的目的:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt
这条命令能正确输出 '975'。
然而,如果我尝试传递第三个文件,它不是将三个文件的第四个字段相加,而是先将文件1加到文件2,然后将文件1加到文件3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt file3.txt
975
1232
是否有人能展示我如何修改这个 awk 语句以接受超过两个文件,或者更理想的是,因为有成千上万个文件需要求和,如何使用 * 来输出目录中所有文件第四个字段的总和?
谢谢你的时间和帮助。
英文:
I have a directory with multiple csv text files, each with a single line in the format:
field1,field2,field3,560
I need to output the sum of the fourth field across all files in a directory (can be hundreds or thousands of files). So for an example of:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
The output would simply be:
1647
I've been trying a few different things, with the most promising being an awk command that I found here in response to another user's question.  It doesn't quite do what I need it to do, and I am an awk newb so I'm unsure how to modify it to work for my purpose:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt
This correctly outputs 975.
However if I try pass it a 3rd file, rather than add field 4 from all 3 files, it adds file1 to file2, then file1 to file3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt file3.txt
975
1232
Can anyone show me how I can modify this awk statement to accept more than two files or, ideally because there are thousands of files to sum up, an * to output the sum of the fourth field of all files in the directory?
Thank you for your time and assistance.
答案1
得分: 2
这是翻译好的代码部分:
这里我们开始(在连接中不需要测试`NR==FNR`):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}'
1647
或者同样的 🇹🇭(不浪费一些管道(s)):
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
英文:
Here we go (no need to test NR==FNR in a concatenation):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}' 
1647
Or same-same 🇹🇭 (without wasting some pipe(s)):
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
答案2
得分: 2
关于当前代码的一些问题:
NR==FNR用于指示对第一个文件进行特殊处理;在这种情况下,对于第一个文件没有什么需要特殊处理的(即,所有文件都要进行相同的处理)- 使用数组(例如,
a[NR])来维护一组值;在这种情况下,您只需要维护一个全局值,所以不需要数组 
由于您只寻找一个全局总和,更简单的代码应该足够:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
注意:
- 在(不太可能的情况下?)所有文件都为空的情况下,
sum将未定义,因此print sum将显示一个空链接;sum+0确保如果sum保持未定义(即,所有文件都为空),我们会打印0 - 对于不定数量的文件,
file{1..3}.txt可以替换为匹配所需文件集的任何模式,例如file*.txt,*.txt等。 
英文:
A couple issues with the current code:
NR==FNRis used to indicate special processing for the 1st file; in this case there is no processing that is 'special' for just the 1st file (ie, all files are to be processed the same)- an array (eg, 
a[NR]) is used to maintain a set of values; in this case you only have one global value to maintain so there is no need for an array 
Since you're only looking for one global sum, a bit more simpler code should suffice:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
NOTES:
- in the (unlikely?) case all files are empty, 
sumwill be undefined soprint sumwill display a blank link;sum+0insures we print0ifsumremains undefined (ie, all files are empty) - for a variable number of files 
file{1..3}.txtcan be replaced with whatever pattern will match on the desired set of files, eg,file*.txt,*.txt, etc 
答案3
得分: 0
$ perl -MList::Util=sum0 -F, -lane'push @a,$F[3];END{print sum0 @a}' file{1..3}.txt
1647
$ perl -F, -lane'push @a,$F[3];END{foreach(@a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
英文:
$ perl -MList::Util=sum0 -F, -lane'push @a,$F[3];END{print sum0 @a}' file{1..3}.txt
1647
$ perl -F, -lane'push @a,$F[3];END{foreach(@a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
答案4
得分: 0
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647
英文:
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647
答案5
得分: 0
代码部分已被排除,以下是您要翻译的内容:
"The paste method is only viable for smaller inputs. When row count gets somewhat larger, like the example below of 34,999,999, compared to awk,
gnu-pasteis ~2.83x slowerbsd-pasteis ~3.72x slower
** adding perl for completeness :::  perl 5.36.0 is ~1.54x slower
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|"
"在处理较小的输入时,paste 方法才是可行的。当行数变得相对较大,比如下面的示例中的 34,999,999,与 awk 相比,
gnu-paste较慢,大约2.83倍的速度bsd-paste较慢,大约3.72倍的速度
** 为完整性添加 perl :::  perl 5.36.0 较慢,大约1.54倍的速度
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
等待 1 秒
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
等待 1 秒
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|"
英文:
The paste method is only viable for smaller inputs. When row count gets somewhat larger, like the example below of 34,999,999, compared to awk,
gnu-pasteis ~2.83x slowerbsd-pasteis ~3.72x slower
** adding perl for completeness :::  perl 5.36.0 is ~1.54x slower
perl -MList::Util=sum0 -F, -lane'push @a,$_;END{print sum0 @a}'
__='34999999'
( time ( jot "$__" | pvE0 | mawk2 'END { print +_ }{ _+=$__ }' ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C paste -sd+ - | bc ) )
sleep 1
( time ( jot "$__" | pvE0 | LC_ALL=C gpaste -sd+ - | bc ) )
|
      in0:  289MiB 0:00:06 [47.9MiB/s] [47.9MiB/s] [<=> ]
     1	612499982500000
( jot "$__" | pvE 0.1 in0 | mawk2 'END { print +_ }{ _+=$__ }'; )
  8.51s user 0.22s system 143% cpu 6.072 total
      in0:  289MiB 0:00:11 [25.5MiB/s] [25.5MiB/s] [ <=> ]
     1	612499982500000
( jot "$__" | pvE 0.1 in0 | LC_ALL=C paste -sd+ - | bc; ) 
27.87s user 1.15s system 128% cpu 22.594 total
      in0:  289MiB 0:00:06 [48.3MiB/s] [48.3MiB/s] [ <=> ]
     1	612499982500000
( jot "$__" | pvE 0.1 in0 | LC_ALL=C gpaste -sd+ - | bc; )
17.14s user 1.10s system 106% cpu 17.185 total
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论