How to validate whether all csv files have the same first line?
Question
I have a directory containing many csv files. I want a shell script to check whether the first line of each file is the same. For example, these files have the same header, so the check should return True.
❯ cat file1.csv
column1,column2,column3
3,1,3
4,3,9
❯ cat file2.csv
column1,column2,column3
5,4,1
1,8,2
I thought this would be an already asked question, but I haven't found it on Stack Overflow.
A similar question checks whether all lines in one file are the same: https://unix.stackexchange.com/questions/533915/check-if-all-lines-in-a-file-are-same
What I've tried:
- `echo "$(ls -AU | head -1)"` gets me the first line of one file.
- I thought about asserting that all files have this value as their first line (preferably as a concise pipe rather than a `for` loop), but couldn't figure out how to do this.
- I've tried to use the answer here, https://unix.stackexchange.com/a/533917, which uses `uniq` and `wc`, but both commands seem specific to iterating through the lines of a single file (rather than iterating over a generic list output).
Answer 1
Score: 6
With GNU `sed` and `bash`:
rows=$(sed -s '1!d' *.csv | sort -u | wc -l)
if [[ "$rows" -eq 1 ]]; then echo "true"; else echo "false"; fi
`sed -s '1!d' *.csv` writes to stdout the first line of every file in the current directory with the suffix `.csv`. `sort -u | wc -l` then counts how many distinct first lines there are.
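For use in a script, the same pipeline can be wrapped in a function that reports its result through the exit status. This is only a sketch: the function name `same_first_line` is made up for illustration, and it still assumes GNU `sed` for the `-s` option.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the answer above): succeeds when every
# file passed as an argument has the same first line.
# Assumes GNU sed, whose -s option treats each input file separately.
same_first_line() {
    local rows
    # With no files there is nothing to compare.
    [ "$#" -gt 0 ] || return 2
    # Print line 1 of each file, drop duplicates, count what is left.
    rows=$(sed -s '1!d' -- "$@" | sort -u | wc -l)
    [ "$rows" -eq 1 ]
}
```

It could then be called as `same_first_line *.csv && echo true || echo false`.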
Answer 2
Score: 4
One `awk` idea:
```bash
awk '
    { headers[$0]; nextfile }           # use 1st line as array index; skip to next file
END { if ( length(headers)==1 )         # if array only has one entry then ...
         print "true"                   # all files have the same header line
      else                              # else ...
         print "false"                  # there is more than one unique header
    }
' *.csv
```
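If the `awk` at hand lacks `nextfile` or `length()` on arrays, the same idea can be written with a plain counter. A minimal sketch, assuming a hypothetical wrapper function named `headers_match` that signals the result through its exit status instead of printing it:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (the name is an assumption): succeeds when all given
# files share one first line. Uses only basic awk features.
headers_match() {
    # Without file arguments awk would wait on stdin, so bail out early.
    [ "$#" -gt 0 ] || return 2
    awk '
        # On the first line of each file, count the header if it is new.
        FNR == 1 && !($0 in seen) { seen[$0]; n++ }
        # Exit 0 only when exactly one distinct header was seen.
        END { exit (n == 1 ? 0 : 1) }
    ' "$@"
}
```

Unlike the version with `nextfile`, this reads every file to the end, so it trades speed for portability.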
Answer 3
Score: 2
It is helpful to know WHICH file is different.
Given:
head -n 2 *.csv
==> file1.csv <==
column1,column2,column3
3,1,3
==> file2.csv <==
column1,column2,column3
5,4,1
==> file3.csv <==
column_diff,column2,column3
5,4,1
==> file4.csv <==
column1,column2,column3
5,4,1
You can use this Ruby to determine the offender:
ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h|
h[File.open(fn).readline.chomp]<<fn
}
keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
' *.csv
Prints:
column1,column2,column3
	file1.csv
	file2.csv
	file4.csv
column_diff,column2,column3
	file3.csv
If you want to be able to test the result from Bash:
ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h|
h[File.open(fn).readline.chomp]<<fn
}
if keys.length>1 then
keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
exit(false)
else
puts "All files equal"
end
' *.csv
Then just test the exit code: `exit(false)` makes Ruby exit with status 1, and the script otherwise exits with 0.
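The same "report the offender, then test the exit code" pattern can be sketched in plain Bash. This is not the Ruby above translated line by line: the function name `report_header_mismatches` is hypothetical, and it treats the first file's header as the expected one rather than grouping files by header.

```bash
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption): prints "file: header" for
# every file whose first line differs from the first file's first line.
# Returns 0 when all headers match, 1 otherwise.
report_header_mismatches() {
    local first="$1" expected="" header f status=0
    # read returns non-zero on an empty file; keep going with an empty header.
    IFS= read -r expected < "$first" || true
    for f in "$@"; do
        header=""
        IFS= read -r header < "$f" || true
        if [ "$header" != "$expected" ]; then
            printf '%s: %s\n' "$f" "$header"
            status=1
        fi
    done
    return "$status"
}
```

A caller can then branch on the exit code, for example `if report_header_mismatches *.csv; then echo "All files equal"; fi`.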
Answer 4
Score: 1
Using GNU `head` (for the `-q` option), this one-liner should do the trick:
if [ -z "$(head -q -n1 *.csv | uniq -u)" ]; then echo true; else echo false; fi
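The `-q` option of GNU `head` suppresses the printing of file names. `uniq -u` prints only lines that are not repeated, so it gives no output when all lines are the same. `sort` isn't necessary for this task. Note that `uniq -u` also prints nothing when every header occurs in a run of two or more adjacent copies (for example A, A, B, B), so that case is reported as true as well, and a single file is reported as false.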
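A variant that still avoids `sort` is to count the runs that plain `uniq` leaves behind: exactly one run means every header is identical. A minimal sketch, assuming GNU `head` as above and a hypothetical function name `same_header_uniq`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption): succeeds when all given
# files have the same first line. Assumes GNU head for the -q option.
same_header_uniq() {
    local runs
    [ "$#" -gt 0 ] || return 2
    # uniq collapses adjacent duplicates; identical headers leave one line,
    # while any change of header between neighbouring files leaves more.
    runs=$(head -q -n 1 -- "$@" | uniq | wc -l)
    [ "$runs" -eq 1 ]
}
```

Because the check is on the number of runs rather than on unrepeated lines, headers such as A, A, B, B give two runs and the function fails, and a single file passes.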
Answer 5
Score: 0
Using any awk (untested):
awk '
FNR == 1 {
    # compare each header after the first with the previous one
    if ( NR > 1 && $0 != prev ) {
        status = 1
        exit
    }
    prev = $0
    nextfile
}
END {
    print ( status ? "false" : "true" )
    exit status
}
' *.csv
It'll run faster if your awk supports `nextfile`, but it'll work either way.
I'm assuming above that you want your script to both print "true"/"false" and exit with a success/fail status.