awk or sed command for columns and rows selection from multiple files

Question

I am looking for a command for the following task.

I have three files, each with two columns, as shown below.
I would like to create `file4` with four columns.

The output should resemble a merge-sorted version of `file1`, `file2` and `file3`: the first column holds the sorted keys, the second column is the second column of `file1`, the third column is the second column of `file2`, and the fourth column is the second column of `file3`.

The entries in columns 2 to 4 should not be sorted themselves; each must stay on the row of the key it was paired with in the first column of its original file.

I tried intersection in Linux, but it did not give the desired output.

Any help will be appreciated. Thanks in advance!

$ cat -- file1                
A1     B5
A10    B2
A3     B15
A15    B6
A2     B10
A6     B19
$ cat -- file2
A10 C4
A4  C8
A6  C5
A3  C10
A12 C14
A15 C18
$ cat -- file3
A3  D1
A22 D9
A20 D3
A10 D5
A6  D10
A21 D11

$ cat -- file4
col1 col2    col3    col4
A1   B5
A2   B10
A3   B15      C10     D1
A4            C8 
A6   B19      C5      D10
A10  B2       C4      D5
A12           C14
A15  B6       C18
A20                   D3
A21                   D11
A22                   D9

Answer 1

Score: 3

Awk + Bash version (`ARGIND` is a GNU Awk extension, so this calls gawk):
( echo "col1, col2, col3, col4" &&
gawk '{ allkeys[$1] = 1 }            # remember every key seen in any file
     ARGIND == 1 { a[$1] = $2 }      # values from file1
     ARGIND == 2 { b[$1] = $2 }      # values from file2
     ARGIND == 3 { c[$1] = $2 }      # values from file3
    END {
        for (k in allkeys) {
            print k", "a[k]", "b[k]", "c[k]
        }
    }' file1 file2 file3 | sort -V -k1,1 ) | column -t -s ','
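Pure Bash version:
declare -A a
# file1 values come first; file2 and file3 values are appended after ", ".
# The :- defaults supply the empty leading cells for keys missing from earlier files.
while read -r key value; do a[$key]="${a[$key]:-}${a[$key]:+, }$value"; done < file1
while read -r key value; do a[$key]="${a[$key]:-, }${a[$key]:+, }$value"; done < file2
# Keys that still hold a single cell were absent from file2: add an empty col3 cell.
for k in "${!a[@]}"; do [[ ${a[$k]} == *", "* ]] || a[$k]+=", "; done
while read -r key value; do a[$key]="${a[$key]:-, , }${a[$key]:+, }$value"; done < file3

(echo "col1, col2, col3, col4" &&
for i in "${!a[@]}"; do
    echo "$i, ${a[$i]}"
done | sort -V -k1,1) | column -t -s ','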

For an explanation of `${a[$key]:-, , }${a[$key]:+, }$value`, see Shell Parameter Expansion in the Bash manual.
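
As a small illustration of the two expansions used in the pure Bash version: `${v:-X}` expands to `X` when `v` is unset or empty, and `${v:+X}` expands to `X` only when `v` is set and non-empty. The variable names below are made up for this demo and do not appear in the answer.

```shell
#!/usr/bin/env bash
# Demo of the two parameter expansions used in the pure Bash version.
# `v` and the result variables are throwaway names for this illustration.

v=''
empty_default="${v:-, , }"   # v is empty, so the default ", , " is used
empty_alt="${v:+, }"         # v is empty, so this expands to nothing

v='B15'
set_default="${v:-, , }"     # v is non-empty, so its own value is used
set_alt="${v:+, }"           # v is non-empty, so the alternate ", " is used

printf '[%s] [%s] [%s] [%s]\n' \
    "$empty_default" "$empty_alt" "$set_default" "$set_alt"
```

Read this way, the file3 loop either starts a new row with two empty cells (key not seen before) or appends a separator to an existing row before adding the new value.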

Answer 2

Score: 1

Using GNU Awk:

gawk '{ a[$1] = substr($1, 2); b[$1, ARGIND] = $2 }   # a[key] = numeric part of the key
	END {
		PROCINFO["sorted_in"] = "@val_num_asc"   # traverse a[] by numeric value, ascending
		for (i in a) {
			t = i
			for (j = 1; j <= ARGIND; ++j)
				t = t OFS b[i, j]
			print t
		}
	}' file{1..3} | column -t
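
For comparison, here is a minimal sketch of the same merge that avoids the gawk-only features (`ARGIND`, `PROCINFO`). It is a swapped-in technique, not this answer's method: files are counted with `FNR == 1`, ordering is left to `sort -V` (a GNU sort option), and missing cells are filled with a `---` placeholder so the columns stay in position. The function name `merge_portable` and the tiny sample files are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: merge two-column files by key without gawk extensions.
# `merge_portable` and the "---" placeholder are choices made for this example.
merge_portable() {
    awk 'FNR == 1 { nfile++ }                    # a new input file starts
         { keys[$1] = 1; val[$1, nfile] = $2 }   # value of this key in file number nfile
         END {
             for (k in keys) {
                 line = k
                 for (j = 1; j <= nfile; j++)
                     line = line " " (((k, j) in val) ? val[k, j] : "---")
                 print line
             }
         }' "$@" | sort -V -k1,1                 # version sort puts A2 before A10
}

# Tiny made-up sample files, created in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A10 B2\nA2 B10\n' > "$tmp/file1"
printf 'A10 C4\nA4 C8\n'  > "$tmp/file2"
printf 'A2 D7\n'          > "$tmp/file3"

merge_portable "$tmp"/file{1,2,3}
```

Because every row carries a value or a placeholder in each position, the result can be piped through `column -t` without cells shifting to the left.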

Answer 3

Score: 1

A simple tool called `join` can perform this operation:

#!/usr/bin/env bash
cut -d ' ' -f1 file{1,2,3} | sort -k1,1 -u > ftmp
for f in file1 file2 file3; do
   mv -- ftmp file4
   join -a1 -e "---" -o auto file4 <(sort -k1,1 "$f") > ftmp
done
sort -k1,1V ftmp > file4
cat file4

This outputs:

A1 B5 --- ---
A2 B10 --- ---
A3 B15 C10 D1
A4 --- C8 ---
A6 B19 C5 D10
A10 B2 C4 D5
A12 --- C14 ---
A15 B6 C18 ---
A20 --- --- D3
A21 --- --- D11
A22 --- --- D9

I used `---` to indicate an empty field. If you want to pretty-print the result, you have to re-parse it with awk or another tool.
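
One possible way to do that re-parsing, as a rough sketch: blank out the `---` placeholders and print each row with fixed-width `printf` fields. The helper name `pretty_print`, the column width of 8, and the `col1`..`col4` header are assumptions for this example; the input is assumed to be in the four-field form shown above.

```shell
#!/usr/bin/env bash
# Sketch: turn the "---" placeholders back into blank cells and align the
# columns with fixed-width printf fields. The width of 8 is an arbitrary
# choice for this example.
pretty_print() {
    awk 'BEGIN { printf "%-8s%-8s%-8s%s\n", "col1", "col2", "col3", "col4" }
         {
             for (i = 2; i <= 4; i++)        # blank out the placeholders
                 if ($i == "---") $i = ""
             line = sprintf("%-8s%-8s%-8s%s", $1, $2, $3, $4)
             sub(/ +$/, "", line)            # drop trailing padding
             print line
         }'
}

# A few rows in the format produced by the join loop.
pretty_print <<'EOF'
A1 B5 --- ---
A4 --- C8 ---
A20 --- --- D3
EOF
```

With real data this would be run as `pretty_print < file4`. Fixed widths are used instead of `column -t` because blank cells must keep their position.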

Answer 4

Score: 0

This might work for you (GNU sed and sort):

s=''
for f in file{1,2,3}; do
    s="$s\t"
    sed -E "s/\s+/$s/" "$f"
done |
sort -V |
sed -Ee '1i\col1\tcol2\tcol3\tcol4' -e ':a;N;s/^((\S+\t).*\S).*\n\2\t+/\1\t/;ta;P;D'

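Replace the whitespace between key and value with tabs, inserting a number of tabs that depends on which file is being processed (one for file1, two for file2, three for file3).

Sort the output by key column order.

Coalesce the lines that share a key into one line and print the result.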
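
To make the first step concrete, here is a small sketch of that stage alone, run on one made-up line per file. It needs GNU sed for `\s` and for `\t` in the replacement; the helper name `stage1` and the sample files are assumptions for this demo. Tabs are shown as `|` so the column position of each value is visible.

```shell
#!/usr/bin/env bash
# Sketch of the first stage only (requires GNU sed for \s and \t).
# The one-line sample files and the helper name `stage1` are made up.
stage1() {
    local s='' f
    for f in "$@"; do
        s="$s\t"                     # one more tab for each successive file
        sed -E "s/\s+/$s/" "$f"      # first run of whitespace becomes that many tabs
    done
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A3     B15\n' > "$tmp/file1"
printf 'A3  C10\n'    > "$tmp/file2"
printf 'A3  D1\n'     > "$tmp/file3"

# Show each tab as "|" so the column position is visible.
stage1 "$tmp"/file{1,2,3} | tr '\t' '|'
```

Each value ends up behind as many tabs as the number of its source file, which is what lets the later stages place it in the right column.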

huangapple
  • Posted on February 8, 2023, 16:52:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75383295.html