April 4, 2023 05:58:40 (go)

How to validate whether all csv files have the same first line?

Question


I have a directory containing many CSV files. I want a shell script that checks whether the first line of every file is the same. For example, these files have the same header, so the check should return True:

```
❯ cat file1.csv
column1,column2,column3
3,1,3
4,3,9
❯ cat file2.csv
column1,column2,column3
5,4,1
1,8,2
```

I thought this would already have been asked, but I haven't found it on Stack Overflow.

This similar question checks whether all lines in a single file are the same: https://unix.stackexchange.com/questions/533915/check-if-all-lines-in-a-file-are-same.

What I've tried:

- `echo "$(ls -AU | head -1)"` gets me the name of one file (passing that name to `head -n 1` gives its first line).
- I thought about asserting that every file has this value as its first line (preferably with a concise pipe rather than a for loop), but couldn't figure out how to do it.
- I tried to use the answer at https://unix.stackexchange.com/a/533917, which uses `uniq` and `wc`, but both commands seem to work on the lines of a single file rather than on the combined output of several files.
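
The idea in the second bullet, taking one file's first line and asserting that every other file starts with the same line, can be sketched as a small Bash function. This is a minimal sketch rather than an answer from the thread: the name `all_first_lines_equal` is hypothetical, and it does use a `for` loop, which at least lets it stop at the first mismatch.

```bash
# all_first_lines_equal FILE...  (hypothetical helper name)
# Exit status: 0 if every FILE starts with the same line as the first FILE,
# 1 at the first mismatch, 2 if the first FILE cannot be read.
all_first_lines_equal() {
    local first f
    # Reference header from the first file; the assignment is kept separate
    # from `local` so that a failure of head shows up in the exit status.
    first=$(head -n 1 "$1") || return 2
    shift
    for f in "$@"; do
        # Assert that this file's first line equals the reference header.
        if [ "$(head -n 1 "$f")" != "$first" ]; then
            return 1
        fi
    done
    return 0
}
```

Usage in the example directory: `if all_first_lines_equal *.csv; then echo true; else echo false; fi`.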

Answer 1

Score: 6


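With GNU sed and bash:

```bash
rows=$(sed -s '1!d' *.csv | sort -u | wc -l)
if [[ "$rows" -eq 1 ]]; then echo "true"; else echo "false"; fi
```

`sed -s '1!d' *.csv` prints the first line of every file with a `.csv` suffix in the current directory to stdout: `-s` makes GNU sed treat each file as a separate input, so `1!d` deletes every line except line 1 of each file. `sort -u` then collapses identical headers, so `rows` is 1 exactly when all headers match.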
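
For use inside a larger script, the same pipeline can report its result through an exit status instead of printing true/false. A minimal sketch, assuming GNU sed for `-s`; the function name `same_headers_sed` is hypothetical.

```bash
# same_headers_sed FILE...  (hypothetical helper name; requires GNU sed for -s)
# Exit status 0 if all FILEs share one first line, 1 otherwise, 2 if no FILE given.
same_headers_sed() {
    # Without arguments sed would wait on stdin, so refuse that case.
    [ "$#" -gt 0 ] || return 2
    # -s: treat each file separately, so '1!d' keeps line 1 of every file.
    # sort -u: collapse identical headers; one remaining line means they all match.
    [ "$(sed -s '1!d' "$@" | sort -u | wc -l)" -eq 1 ]
}
```

Usage: `if same_headers_sed *.csv; then echo true; else echo false; fi`.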

Answer 2

Score: 4

One `awk` idea:

```bash
awk '
    { headers[$0]; nextfile }             # use 1st line as array index; skip to next file
END { if ( length(headers)==1 )           # if array only has one entry then ...
         print "true"                     # all files have the same header line
      else                                # else ...
         print "false"                    # there is more than one unique header
    }
' *.csv
```
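
`length()` applied to an array and `nextfile` are widely supported (gawk has both, for example) but are not guaranteed in every awk. A hedged variant that counts the distinct headers itself and reports the result only through its exit status could look like this; the function name `same_headers_awk` is hypothetical.

```bash
# same_headers_awk FILE...  (hypothetical helper name)
# Exit status 0 if exactly one distinct first line occurs across all FILEs.
same_headers_awk() {
    # Without arguments awk would read stdin, so refuse that case.
    [ "$#" -gt 0 ] || return 2
    awk '
        FNR == 1 && !($0 in seen) {   # first line of a file, header not seen before
            seen[$0]                  # remember this header
            n++                       # count distinct headers
        }
        END { exit (n == 1 ? 0 : 1) } # success only if there is exactly one
    ' "$@"
}
```

Because it avoids `nextfile`, this version reads every line of every file, so where `nextfile` is available the answer's version will be faster on large files.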



Answer 3

Score: 2


It is helpful to know WHICH file is different.

Given:

```
$ head -n 2 *.csv
==> file1.csv <==
column1,column2,column3
3,1,3

==> file2.csv <==
column1,column2,column3
5,4,1

==> file3.csv <==
column_diff,column2,column3
5,4,1

==> file4.csv <==
column1,column2,column3
5,4,1
```

You can use this Ruby to determine the offender:

```bash
ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h|
	h[File.open(fn, &:gets).to_s.chomp]<<fn   # block form closes the file; an empty file gives ""
}
keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
' *.csv
```

Prints:

```
column1,column2,column3
	file1.csv
	file2.csv
	file4.csv
column_diff,column2,column3
	file3.csv
```

If you want to test the result from Bash:

```bash
ruby -e 'keys=ARGV.each_with_object(Hash.new {|h,k| h[k] = []}){|fn,h|
	h[File.open(fn, &:gets).to_s.chomp]<<fn
}
if keys.length>1 then
	keys.each{|k,v| puts "#{k}\n\t#{v.join("\n\t")}"}
	exit(false)
else
	puts "All files equal"
end
' *.csv
```

Then just test the exit code.
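
If Ruby isn't available, a similar report can be sketched in awk, with an exit status that can be tested the same way. This is a hedged variant, not the answer's method: it compares each file's first line against the first non-empty file's first line instead of grouping by header, so if that reference file is the odd one out, every other file gets listed. The name `list_header_mismatches` is hypothetical.

```bash
# list_header_mismatches FILE...  (hypothetical helper name)
# Prints each FILE whose first line differs from the reference header and
# exits 1 if there was at least one such file, 0 otherwise.
list_header_mismatches() {
    [ "$#" -gt 0 ] || return 2
    awk '
        FNR == 1 {
            if (NR == 1) ref = $0          # first line read overall is the reference
            else if ($0 != ref) {          # any later header that differs
                print FILENAME
                bad = 1
            }
        }
        END { exit bad }                   # uninitialized bad means exit 0
    ' "$@"
}
```

Usage: `if list_header_mismatches *.csv; then echo "All files equal"; fi`. With the four example files it prints `file3.csv` and returns 1.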

Answer 4

Score: 1


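Using GNU head (for the `-q` option), this one-liner should do the trick:

```bash
if [ "$(head -q -n1 *.csv | uniq | wc -l)" -eq 1 ]; then echo true; else echo false; fi
```

The `-q` option of GNU head suppresses the `==> file <==` headers it would otherwise print between files. `uniq` collapses each run of adjacent identical lines into one, so the count is 1 only if every first line is the same. Since any difference between neighbouring lines already starts a new run, `sort` isn't necessary for this task.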
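
A tempting shortcut is to test whether `uniq -u` prints nothing, but that is not equivalent to "all lines are the same": `uniq -u` prints only lines that have no identical neighbour, so first lines A, A, B, B yield no output even though there are two different headers, while a single file yields one line even though nothing differs. Counting runs with `uniq | wc -l` handles both cases. A small demonstration with hypothetical header values `A` and `B`:

```bash
# Four hypothetical first lines: two distinct headers, each appearing twice in a row.
headers='A
A
B
B'

# uniq -u keeps only lines with no adjacent duplicate, so it prints nothing here.
printf '%s\n' "$headers" | uniq -u

# uniq collapses adjacent duplicates into one line per run ("A", "B"),
# so wc -l counts 2 runs, correctly signalling that the headers differ.
printf '%s\n' "$headers" | uniq | wc -l
```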

Answer 5

Score: 0


Using any awk (untested):

```bash
awk '
    FNR == 1 {
        if ( NR > 1 && $0 != prev ) {   # compare every header after the very first one
            status = 1
            exit
        }
        prev = $0
        nextfile
    }
    END {
        print ( status ? "false" : "true" )
        exit status
    }
' *.csv
```

It'll run faster if your awk supports `nextfile`, but it'll work either way.

I'm assuming you want the script both to print "true"/"false" and to exit with a success/failure status.
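
All of the answers pass `*.csv` straight to the command; when the directory contains no CSV files, Bash leaves the pattern unexpanded and the tool fails trying to open a file literally named `*.csv`. Below is a hedged sketch of a wrapper that handles that case with Bash's `nullglob` option, reusing this answer's `FNR == 1` comparison with an `NR > 1` guard so the first header is not compared against the still-empty `prev`. The function name `check_csv_headers`, its optional directory argument, and treating an empty directory as "true" are all assumptions.

```bash
# check_csv_headers [DIR]  (hypothetical helper name; Bash only)
# Prints true/false and returns 0/1 for the .csv files in DIR (default: .).
# Assumption: a directory with no CSV files counts as "true".
check_csv_headers() {
    local dir=${1:-.}
    local restore
    local -a files
    restore=$(shopt -p nullglob)    # remember the caller's nullglob setting
    shopt -s nullglob               # an unmatched glob now expands to nothing
    files=("$dir"/*.csv)
    eval "$restore"                 # put nullglob back the way it was
    if [ "${#files[@]}" -eq 0 ]; then
        echo true
        return 0
    fi
    awk '
        FNR == 1 {
            if ( NR > 1 && $0 != prev ) {   # skip the comparison for the first header
                status = 1
                exit
            }
            prev = $0
            nextfile
        }
        END {
            print ( status ? "false" : "true" )
            exit status
        }
    ' "${files[@]}"
}
```

For the two example files, `check_csv_headers` prints `true` and returns 0; after adding a file with a different header it prints `false` and returns 1.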
