英文:
awk or sed command for columns and rows selection from multiple files
问题
以下是翻译好的部分:
寻找以下任务的命令:
我有三个文件,每个文件都有两列,如下所示。
我想创建`file4`,其中有四列。
输出应该类似于`file1`,`file2`和`file3`的合并排序版本,其中第一列已排序,第二列是`file1`的第二列,第三列是`file2`的第二列,第四列是`file3`的第二列。
第2到3列中的条目不应排序,但应与原始文件的第一列的*键*-值匹配。
我尝试在Linux中使用交集,但没有得到所需的输出。
任何帮助将不胜感激。提前感谢!!
英文:
Looking for a command for the following task:
I have three files, each with two columns, as seen below.
I would like to create file4
with four columns.
The output should resemble a merge-sorted version of file1
, file2
and file3
such that the first column is sorted, the second column is the second column of file1
the third column is the second column of file2
and the fourth column is the second column of file3
.
The entries in column 2 to 3 should not be sorted but should match the key-value in the first column of the original files.
I tried intersection in Linux, but not giving the desired outputs.
Any help will be appreciated. Thanks in advance!!
$ cat -- file1
A1 B5
A10 B2
A3 B15
A15 B6
A2 B10
A6 B19
$ cat -- file2
A10 C4
A4 C8
A6 C5
A3 C10
A12 C14
A15 C18
$ cat -- file 3
A3 D1
A22 D9
A20 D3
A10 D5
A6 D10
A21 D11
$ cat -- file 4
col1 col2 col3 col4
A1 B5
A2 B10
A3 B15 C10 D1
A4 C8
A6 B19 C5 D10
A10 B2 C4 D5
A12 C14
A15 B6 C18
A20 D3
A21 D11
A22 D9
答案1
得分: 3
Awk + Bash version:
( echo "col1, col2, col3, col4" &&
awk 'ARGIND==1 { a[$1]=$2; allkeys[$1]=1 } ARGIND==2 { b[$1]=$2; allkeys[$1]=1 } ARGIND==3 { c[$1]=$2; allkeys[$1]=1 }
END{
for (k in allkeys) {
print k", "a[k]", "b[k]", "c[k]
}
}' file1 file2 file3 | sort -V -k1,1 ) | column -t -s ','
Pure Bash version:
declare -A a
while read key value; do a[$key]="${a[$key]:-}${a[$key]:+, }$value"; done < file1
while read key value; do a[$key]="${a[$key]:-, }${a[$key]:+, }$value"; done < file2
while read key value; do a[$key]="${a[$key]:-, , }${a[$key]:+, }$value"; done < file3
(echo "col1, col2, col3, col4" &&
for i in ${!a[@]}; do
echo $i, ${a[$i]}
done | sort -V -k1,1) | column -t -s ','
Explanation for ${a[$key]:-, , }${a[$key]:+, }$value
please check Shell-Parameter-Expansion
英文:
Awk + Bash version:
( echo "col1, col2, col3, col4" &&
awk 'ARGIND==1 { a[$1]=$2; allkeys[$1]=1 } ARGIND==2 { b[$1]=$2; allkeys[$1]=1 } ARGIND==3 { c[$1]=$2; allkeys[$1]=1 }
END{
for (k in allkeys) {
print k", "a[k]", "b[k]", "c[k]
}
}' file1 file2 file3 | sort -V -k1,1 ) | column -t -s ','
Pure Bash version:
declare -A a
while read key value; do a[$key]="${a[$key]:-}${a[$key]:+, }$value"; done < file1
while read key value; do a[$key]="${a[$key]:-, }${a[$key]:+, }$value"; done < file2
while read key value; do a[$key]="${a[$key]:-, , }${a[$key]:+, }$value"; done < file3
(echo "col1, col2, col3, col4" &&
for i in ${!a[@]}; do
echo $i, ${a[$i]}
done | sort -V -k1,1) | column -t -s ','
Explanation for "${a[$key]:-, , }${a[$key]:+, }$value"
please check Shell-Parameter-Expansion
答案2
得分: 1
使用GNU Awk:
gawk '{ a[$1] = substr($1, 1); b[$1, ARGIND] = $2 }
END {
PROCINFO["sorted_in"] = "@val_num_asc"
for (i in a) {
t = i
for (j = 1; j <= ARGIND; ++j)
t = t OFS b[i, j]
print t
}
}' file{1..3} | column -t
英文:
Using GNU Awk:
gawk '{ a[$1] = substr($1, 1); b[$1, ARGIND] = $2 }
END {
PROCINFO["sorted_in"] = "@val_num_asc"
for (i in a) {
t = i
for (j = 1; j <= ARGIND; ++j)
t = t OFS b[i, j]
print t
}
}' file{1..3} | column -t
答案3
得分: 1
这是一个名为join
的简单工具,允许您执行这个操作:
#!/usr/bin/env bash
cut -d ' ' -f1 file{1,2,3} | sort -k1,1 -u > ftmp
for f in file1 file2 file3; do
mv -- ftmp file4
join -a1 -e "---" -o auto file4 <(sort -k1,1 "$f") > ftmp
done
sort -k1,1V ftmp > file4
cat file4
这将输出:
A1 B5 --- ---
A2 B10 --- ---
A3 B15 C10 D1
A4 --- C8 ---
A6 B19 C5 D10
A10 B2 C4 D5
A12 --- C14 ---
A15 B6 C18 ---
A20 --- --- D3
A21 --- --- D11
A22 --- --- D9
我使用---
表示空字段。如果您想要进行漂亮的打印,您需要使用awk或其他工具重新解析它。
英文:
There is a simple tool called join
that allows you to perform this operation:
#!/usr/bin/env bash
cut -d ' ' -f1 file{1,2,3} | sort -k1,1 -u > ftmp
for f in file1 file2 file3; do
mv -- ftmp file4
join -a1 -e "---" -o auto file4 <(sort -k1,1 "$f") > ftmp
done
sort -k1,1V ftmp > file4
cat file4
This outputs
A1 B5 --- ---
A2 B10 --- ---
A3 B15 C10 D1
A4 --- C8 ---
A6 B19 C5 D10
A10 B2 C4 D5
A12 --- C14 ---
A15 B6 C18 ---
A20 --- --- D3
A21 --- --- D11
A22 --- --- D9
I used ---
to indicate an empty field. If you want to pretty print this, you have to re-parse it with awk or anything else.
答案4
得分: 0
这可能适用于您(使用GNU sed和sort):
s='';
for f in file{1,2,3}; do
s="$s\t";
sed -E "s/\s+/$s/" $f;
done |
sort -V |
sed -Ee '1i\col1\tcol2\tcol3\tcol4' -e ':a;N;s/^((\S+\t).*\S).*\n\t+/\t/;ta;P;D'
将空格替换为制表符,并根据正在处理的文件插入键和值之间的制表符数量。
按键列顺序对输出进行排序。
将每行与其键合并并打印结果。
英文:
This might work for you (GNU sed and sort):
s=''; for f in file{1,2,3}; do s="$s\t"; sed -E "s/\s+/$s/" $f; done |
sort -V |
sed -Ee '1i\col1\tcol2\tcol3\tcol4' -e ':a;N;s/^((\S+\t).*\S).*\n\t+/\t/;ta;P;D'
Replace spaces by tabs and insert the number of tabs between the key and value depending on which file is being processed.
Sort the output by key column order.
Coalesce each line with its key and print the result.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论