June 19, 2023 19:58:00

Generalize an awk program that concatenates csv files and adds a column

Question

I have these two csv files:

$ cat numbers.csv
h1,h2
1, 2
3, 4

$ cat letters.csv
h1,h2
a, b
c, d

I want to concatenate both CSV files and add a new column whose value is letters or numbers, depending on whether the row comes from letters.csv or numbers.csv. I wrote this awk command (I'm trying out awk), which does the job:

$ awk 'NR==1 {print new_column","$0} fname != FILENAME { fname = FILENAME; idx++ } FNR > 1 && idx == 1 { print group1","$0 } FNR > 1 && idx == 2 { print group2","$0 }' new_column=type group1=letters group2=numbers letters.csv numbers.csv

new_column is the name of the column that is going to be added

group1 is the value of new_column if the row comes from the letters.csv file

group2 is the value of new_column if the row comes from the numbers.csv file

The output I get is:

type,h1,h2
letters,a, b
letters,c, d
numbers,1, 2
numbers,3, 4

The problem is that I don't know whether there is a way to make the awk program work with n arguments and n files without having to rewrite it every time, like so:

$ awk '....' new_column=group_name group1=letters group2=numbers ... groupN=whatever  letters.csv numbers.csv .... morefiles_to_N.csv

Answer 1

Score: 2

Maybe like this

% awk 'NR==1{print "type,"$0}
       FNR==1{nm=FILENAME; sub(/\..*/,"",nm); next}
             {print nm","$0}' num.csv let.csv
type,h1,h2
num,1, 2
num,3, 4
let,a, b
let,c, d
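
Because the label is derived from FILENAME, this approach already works for any number of files. A minimal sketch of the same idea wrapped in a shell function follows. The function name concat_by_filename is hypothetical, every file is assumed to share the same header, and the extra sub() that strips a leading directory is an addition that the command above does not have:

```shell
# Hypothetical helper: concatenate CSV files and label each row
# with the base name of the file it came from.
concat_by_filename() {
    awk '
        NR == 1 { print "type," $0 }   # emit the header once, from the first file
        FNR == 1 {                     # first line of every file
            nm = FILENAME
            sub(/.*\//, "", nm)        # drop any leading directory (assumes "/" separators)
            sub(/\..*/, "", nm)        # drop everything from the first dot onwards
            next                       # skip the header line of this file
        }
        { print nm "," $0 }            # prefix each data row with the label
    ' "$@"
}
```

It could be called as concat_by_filename ./data/num.csv ./data/let.csv. Without the directory-stripping step, a path such as ./data/num.csv would yield an empty label, because the first dot in the path is the leading one.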

Answer 2

Score: 2

I suggest packing all the needed names into a single variable and then deserializing it with the split function. Consider a simple example: let's say I have file1.txt with the following content

1
2
3

file10.txt with the following content

10
20

and file100.txt with the following content

100

and want to fuse them, giving the names ABLE, BAKER, CHARLIE to the data from the respective files, then I could do

awk -v names=ABLE:BAKER:CHARLIE 'BEGIN{split(names,arr,/:/)}FNR==1{i+=1}{print arr[i] "," $0}' file1.txt file10.txt file100.txt

which gives the output

ABLE,1
ABLE,2
ABLE,3
BAKER,10
BAKER,20
CHARLIE,100

Explanation: I chose : to separate the names; when picking a separator, keep in mind that it MUST NOT be present in any name. In BEGIN I split names at : into the array arr, so arr[1] becomes ABLE, arr[2] becomes BAKER, and arr[3] becomes CHARLIE. On the first row of each file (FNR==1) I increase i by 1. For each row I print the name concatenated with the whole line ($0), joined by the , character.

(tested in GNU Awk 5.1.0)
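
The split-based approach above can be made a little more defensive by checking that the number of names matches the number of files. The following is only a sketch under some assumptions: the function name label_files is hypothetical, every operand after the name list is assumed to be a file (no var=value operands), and the error message goes to /dev/stderr, which gawk handles itself and other awks rely on the operating system to provide:

```shell
# Hypothetical helper: the first argument is a colon-separated list of names,
# the remaining arguments are the files to label, in the same order.
label_files() {
    names_list=$1
    shift
    awk -v names="$names_list" '
        BEGIN {
            n = split(names, arr, ":")
            # ARGC - 1 is the number of operands; assumed here to be files only
            if (n != ARGC - 1) {
                print "expected " (ARGC - 1) " names, got " n > "/dev/stderr"
                exit 1
            }
        }
        FNR == 1 { i++ }              # a new file starts: move to the next name
        { print arr[i] "," $0 }       # prefix every row with the current name
    ' "$@"
}
```

It could be called as label_files ABLE:BAKER:CHARLIE file1.txt file10.txt file100.txt. Note that an empty input file never triggers FNR == 1, so the names of the files after it would be shifted by one.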

Answer 3

Score: 1

awk -v header=type -v groups=letters,numbers '
BEGIN{ OFS=","; split(groups, grps,",") }
FNR==1{i++}
NR==1{ print header,$0; next }
FNR!=1{ print grps[i],$0 }
' letters.csv numbers.csv

type,h1,h2
letters,a, b
letters,c, d
numbers,1, 2
numbers,3, 4
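
A different technique, closer to the var=value style used in the question, relies on the fact that awk processes var=value operands at the point where they appear among the file names, so the value can be changed between files. A minimal sketch, assuming a hypothetical wrapper named concat_labelled and a hypothetical variable name grp:

```shell
# Hypothetical helper: operands alternate between grp=LABEL and a file name.
concat_labelled() {
    awk -v header=type '
        BEGIN { OFS = "," }
        FNR == 1 {                              # header line of each file
            if (!printed++) print header, $0    # print it only for the first file
            next
        }
        { print grp, $0 }                       # grp holds the most recent assignment
    ' "$@"
}
```

It could be called as concat_labelled grp=letters letters.csv grp=numbers numbers.csv, with as many grp=LABEL file pairs as needed. One caveat: a file whose name itself looks like an assignment (for example a=b.csv) would be treated as one, so such a name would need a ./ prefix.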

Answer 4

Score: 1

I'm adding a non-awk answer because I know you are also looking for solutions that use other tools (such as qsv).

The tool I use for this kind of task is Miller. Running

mlr --csv put '$type=sub(FILENAME,"\..+","")' letters.csv numbers.csv

you get

h1,h2,type
a, b,letters
c, d,letters
1, 2,numbers
3, 4,numbers

put is the verb used to evaluate Miller expressions. Here I create the new field type and set it to the FILENAME built-in variable, using the sub function to remove the file extension.

If you want type as the first field, simply add reorder -f type:

mlr --csv put '$type=sub(FILENAME,"\..+","")' then reorder -f type letters.csv numbers.csv
