How to validate whether all CSV files have the same first line?

Question

I have a directory containing many csv files. I want a shell script to check whether the first line of each file is the same. For example, these files have the same header, so the check should return True.

❯ cat file1.csv
column1,column2,column3
3,1,3
4,3,9
❯ cat file2.csv
column1,column2,column3
5,4,1
1,8,2

I thought this would be an already-asked question, but I haven't seen it on Stack Overflow.

A similar question checks whether all lines in one file are the same: https://unix.stackexchange.com/questions/533915/check-if-all-lines-in-a-file-are-same.

What I've tried:

  • echo "$(ls -AU | head -1)" gets me the first line of one file
  • I thought about trying to assert that all files have this value as a first line (preferably as a concise pipe rather than for loop), but couldn't figure out how to do this
  • I've tried to use the answer here https://unix.stackexchange.com/a/533917, which uses uniq and wc, but both commands seem specific to iterating through lines of a single file (rather than iterating over a generic list output)

Answer 1

Score: 6

With GNU sed and bash:

rows=$(sed -s '1!d' *.csv | sort -u | wc -l)
if [[ "$rows" -eq 1 ]]; then echo "true"; else echo "false"; fi

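sed -s '1!d' *.csv prints the first line of every file in the current directory whose name ends in .csv. GNU sed's -s option treats the files separately rather than as one continuous stream, so the address 1 matches the first line of each file.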
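
The same pipeline can be wrapped in a function that reports its result through the exit status, which is easier to use in an if than comparing printed text. A minimal sketch, assuming GNU sed; the function name same_first_line is hypothetical and not part of the answer:

```bash
# Hypothetical helper: succeeds (status 0) when all given files share one first line.
# Assumes GNU sed: -s handles each file separately, so the address 1 matches
# the first line of every file instead of only the first line overall.
same_first_line() {
    [ "$(sed -s -e '1!d' -- "$@" | sort -u | wc -l)" -eq 1 ]
}
```

It could then be called as: if same_first_line *.csv; then echo true; else echo false; fi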

Answer 2

Score: 4

One `awk` idea:

    awk '
        { headers[$0]; nextfile }             # use 1st line as array index; skip to next file
    END { if ( length(headers)==1 )           # if array only has one entry then ...
             print "true"                     # all files have the same header line
          else                                # else ...
             print "false"                    # there is more than one unique header
        }
    ' *.csv
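
The awk approach can also report through its exit status instead of printed text. A minimal sketch under a few assumptions: the function name headers_match is hypothetical, the FNR == 1 guard is added so the result does not depend on nextfile being supported, and a counter stands in for length(headers) because some older awk implementations may not accept length on an array:

```bash
# Hypothetical helper: exit status 0 when exactly one distinct first line is seen.
headers_match() {
    awk '
        FNR == 1 {
            if (!($0 in seen)) {   # first time this header text appears
                seen[$0]
                count++
            }
            nextfile               # skip the rest of the file where supported
        }
        END { exit (count == 1 ? 0 : 1) }
    ' "$@"
}
```

A caller could write: headers_match *.csv && echo true || echo false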



Answer 3

Score: 2

It is helpful to know WHICH file is different.

Given:

head -n 2 *.csv
==> file1.csv <==
column1,column2,column3
3,1,3

==> file2.csv <==
column1,column2,column3
5,4,1

==> file3.csv <==
column_diff,column2,column3
5,4,1

==> file4.csv <==
column1,column2,column3
5,4,1

You can use this Ruby to determine the offender:

ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h| 
	h[File.open(fn).readline.chomp]<<fn
}
keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
' *.csv

Prints:

column1,column2,column3
	file1.csv
	file2.csv
	file4.csv
column_diff,column2,column3
	file3.csv

If you want to be able to test the result from Bash:

ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h| 
	h[File.open(fn).readline.chomp]<<fn
}
if keys.length>1 then
	keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
	exit(false)
else
	puts "All files equal"
end
' *.csv

Then just test the exit code.
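
For comparison, the same grouping can be sketched without Ruby. This is a shell and awk variant written for illustration, not part of the original answer; the function name group_by_header is hypothetical, it always prints the groups, and the order in which headers appear is unspecified because awk does not define the iteration order of for-in:

```bash
# Hypothetical helper: print each distinct first line followed by the files
# that start with it; exit status 1 when more than one distinct line exists.
group_by_header() {
    awk '
        FNR == 1 {
            if (!($0 in files)) count++               # new distinct header
            files[$0] = files[$0] "\t" FILENAME "\n"  # append this file to its group
            nextfile
        }
        END {
            for (h in files) printf "%s\n%s", h, files[h]
            exit (count > 1)                          # status 1 when headers differ
        }
    ' "$@"
}
```

The exit status can then be tested directly, for example: group_by_header *.csv || echo "headers differ" >&2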

Answer 4

Score: 1

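Using GNU head (for the -q option), this one-liner should do the trick:

if [ "$(head -q -n1 *.csv | uniq | wc -l)" -eq 1 ]; then echo true; else echo false; fi

The -q option of GNU head suppresses the filename headers. uniq collapses runs of identical adjacent lines, so exactly one line remains only if every first line is the same; uniq -u would be unreliable here, since it prints only lines that are not repeated and therefore reports a lone file as different and misses headers that differ in pairs. sort isn't necessary for this task.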
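
Where GNU head is not available, a per-file loop yields the same stream of first lines, since head prints no filename banner when it is given a single file. A minimal sketch with hypothetical function names first_lines and same_header; it counts the lines left after uniq collapses adjacent duplicates:

```bash
# Hypothetical helper: print the first line of each file, one head call per file,
# so no -q option is needed to suppress filename banners.
first_lines() {
    for f in "$@"; do
        head -n 1 -- "$f"
    done
}

# Hypothetical helper: succeeds when uniq reduces all first lines to one line.
same_header() {
    [ "$(first_lines "$@" | uniq | wc -l)" -eq 1 ]
}
```

Usage would look like: if same_header *.csv; then echo true; else echo false; fi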

Answer 5

Score: 0

Using any awk (untested):

awk '
    FNR == 1 {
        # NR > 1 skips the comparison for the very first file, where prev is still unset
        if ( NR > 1 && $0 != prev ) {
            status = 1
            exit
        }
        prev = $0
        nextfile
    }
    END {
        print ( status ? "false" : "true" )
        exit status
    }
' *.csv

It'll run faster if your awk supports nextfile, but it'll work either way.

I'm assuming above that you want your script to both print "true"/"false" and exit with a success/fail status.
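
One way a calling script might consume both the printed word and the exit status is sketched below. The names check_headers and report are hypothetical; check_headers wraps an awk program along the lines of the one above, with an NR > 1 guard so that the first file is not compared against an unset prev:

```bash
# Hypothetical wrapper: prints "true"/"false" and exits 0/1 accordingly.
check_headers() {
    awk '
        FNR == 1 {
            if (NR > 1 && $0 != prev) { status = 1; exit }
            prev = $0
            nextfile
        }
        END {
            print (status ? "false" : "true")
            exit (status + 0)
        }
    ' "$@"
}

# Capture the printed word and the exit status separately, then pass both on.
report() {
    result=$(check_headers "$@")
    rc=$?
    echo "same header: $result (exit status $rc)"
    return "$rc"
}
```

The assignment and the $? read are kept on separate lines so that the status seen is the one from check_headers.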

huangapple
  • This article was published on 2023-04-04 05:58:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75924055.html