Generalize an awk program that concatenates csv files and adds a column
Question
I have these two csv files:
$ cat numbers.csv
h1,h2
1, 2
3, 4
$ cat letters.csv
h1,h2
a, b
c, d
I want to concatenate both CSV files and add a new column whose value is letters or numbers, depending on whether the row comes from letters.csv or numbers.csv. So I wrote this awk command (because I'm trying out awk), which does the job:
$ awk 'NR==1 {print new_column","$0} fname != FILENAME { fname = FILENAME; idx++ } FNR > 1 && idx == 1 { print group1","$0 } FNR > 1 && idx == 2 { print group2","$0 }' new_column=type group1=letters group2=numbers letters.csv numbers.csv
where:
new_column is the name of the column that is going to be added
group1 is the value of new_column if the row comes from the letters.csv file
group2 is the value of new_column if the row comes from the numbers.csv file
The output I get is:
type,h1,h2
letters,a, b
letters,c, d
numbers,1, 2
numbers,3, 4
The problem is that I don't know whether there is a way to make the awk program work with n arguments and n files without having to rewrite it every time, like so:
$ awk '....' new_column=group_name group1=letters group2=numbers ... groupN=whatever letters.csv numbers.csv .... morefiles_to_N.csv
Answer 1
Score: 2
Maybe something like this:
% awk 'NR==1{print "type,"$0}
FNR==1{nm=FILENAME; sub(/\..*/,"",nm); next}
{print nm","$0}' num.csv let.csv
type,h1,h2
num,1, 2
num,3, 4
let,a, b
let,c, d
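One caveat with the command above: sub(/\..*/,"",nm) deletes everything from the first dot onward, so a path such as ./data/num.csv would yield an empty label. Below is a minimal sketch of one way around that, assuming the label should be the file's base name without its last extension. The data/ directory and the file names are made up for illustration.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
mkdir -p "$dir/data"
printf 'h1,h2\n1,2\n3,4\n' > "$dir/data/num.csv"
printf 'h1,h2\na,b\nc,d\n' > "$dir/data/let.v2.csv"

out=$(awk '
  NR == 1 { print "type," $0 }      # header taken from the first file only
  FNR == 1 {                        # first line of every file
    nm = FILENAME
    sub(/.*\//, "", nm)             # drop any leading directory part
    sub(/\.[^.]*$/, "", nm)         # drop only the last extension
    next                            # skip the header line itself
  }
  { print nm "," $0 }               # prefix every data row with the label
' "$dir/data/num.csv" "$dir/data/let.v2.csv")

printf '%s\n' "$out"
rm -rf "$dir"
```

With these inputs the labels come out as num and let.v2, regardless of the directory the files live in.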
Answer 2
Score: 2
I suggest packing all the needed names into a single variable and then deserializing it with the split function. Consider a simple example: let's say I have file1.txt with the following content
1
2
3
file10.txt with the following content
10
20
and file100.txt with the following content
100
and I want to merge them, giving the names ABLE, BAKER, and CHARLIE to the data from the respective files. Then I could do
awk -v names=ABLE:BAKER:CHARLIE 'BEGIN{split(names,arr,/:/)}FNR==1{i+=1}{print arr[i] "," $0}' file1.txt file10.txt file100.txt
which gives the output
ABLE,1
ABLE,2
ABLE,3
BAKER,10
BAKER,20
CHARLIE,100
Explanation: I chose : as the separator for the names; when making your choice, keep in mind that the separator MUST NOT be present in any name. In BEGIN I split names at : into the array arr, so arr[1] becomes ABLE, arr[2] becomes BAKER, and arr[3] becomes CHARLIE. On the first row of each file (FNR==1) I increase i by 1. For each row I print the name concatenated with the whole line ($0), joined by the , character.
(tested in GNU Awk 5.1.0)
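If fewer names than files are supplied, arr[i] is silently empty for the extra files. The following is a minimal sketch of a guard against that, not part of the original answer: it keeps the count returned by split in a variable n (a name chosen here for illustration) and stops when a file has no matching name. The sample files are created in a scratch directory.

```shell
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/file1.txt"
printf '10\n'   > "$dir/file10.txt"

# Two files but only one name: the guard reports the mismatch and stops.
status=0
out=$(awk -v names=ABLE '
  BEGIN    { n = split(names, arr, ":") }   # n = how many names were given
  FNR == 1 { i++ }                          # advance to the next name per file
  i > n    { print "no name for " FILENAME > "/dev/stderr"; exit 1 }
           { print arr[i] "," $0 }
' "$dir/file1.txt" "$dir/file10.txt") || status=$?

printf '%s\n' "$out"
echo "exit status: $status"
rm -rf "$dir"
```

Here the rows of file1.txt are labeled ABLE, and awk exits with status 1 on reaching file10.txt. Note that FNR==1 never fires for an empty file, so an empty input file would shift the labels of the files after it.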
Answer 3
Score: 1
awk -v header=type -v groups=letters,numbers '
BEGIN{ OFS=","; split(groups, grps,",") }
FNR==1{i++}
NR==1{ print header,$0; next }
FNR!=1{ print grps[i],$0 }
' letters.csv numbers.csv
type,h1,h2
letters,a, b
letters,c, d
numbers,1, 2
numbers,3, 4
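If the labels should stay as separate arguments, as in the question, another option is to rely on awk's command-line assignments: an operand of the form var=value placed between file names is applied just before the next file is read. Here is a minimal sketch, assuming the variable names header and group (picked for illustration) and sample files shaped like the ones in the question:

```shell
dir=$(mktemp -d)
printf 'h1,h2\na,b\nc,d\n' > "$dir/letters.csv"
printf 'h1,h2\n1,2\n3,4\n' > "$dir/numbers.csv"

out=$(awk '
  BEGIN   { OFS = "," }
  NR == 1 { print header, $0 }   # emit the header line once
  FNR > 1 { print group, $0 }    # group was assigned just before this file
' header=type \
  group=letters "$dir/letters.csv" \
  group=numbers "$dir/numbers.csv")

printf '%s\n' "$out"
rm -rf "$dir"
```

Adding another file only means appending one more group=label file pair; the awk program itself does not change. One thing to watch: a file name that itself looks like name=value would be read as an assignment, so such a name should be given with a path prefix such as ./.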
Answer 4
Score: 1
I'm adding a non-awk answer because I know you are also open to solutions using other tools (such as qsv).
The tool I use for this kind of task is Miller. Running
mlr --csv put '$type=sub(FILENAME,"\..+","")' letters.csv numbers.csv
you get
h1,h2,type
a, b,letters
c, d,letters
1, 2,numbers
3, 4,numbers
put is the verb for building Miller expressions. Here I create the new field type and set it to the FILENAME built-in variable, using the sub function to remove the file extension.
If you want type as the first field, simply add reorder -f type:
mlr --csv put '$type=sub(FILENAME,"\..+","")' then reorder -f type letters.csv numbers.csv