Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in unix files is common)

huangapple go评论99阅读模式
英文:

Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in unix files is common)

问题

以下是您要翻译的内容:

"Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)

1st File - app.txt

  1. 5.13.0.12S Yes
  2. 5.13.0.11S Yes
  3. 5.13.0.24S Yes
  4. 5.13.0.23S Yes
  5. 5.13.0.22S Yes

1st File - db.txt

  1. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
  2. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous

Tried below AWK command. Did not work well (expecting result like - full outer join)

  1. awk 'FNR==NR {data[$1]=$0; next} {$1=data[$1]}1' db.txt app.txt

Output of AWK -

  1. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  2. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  3. Yes
  4. Yes
  5. Yes

for last 3 records, expecting Nulls then values (Yes) from app.txt

Please suggest ..
Thanks"

英文:

Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)

1st File - app.txt

  1. 5.13.0.12S Yes
  2. 5.13.0.11S Yes
  3. 5.13.0.24S Yes
  4. 5.13.0.23S Yes
  5. 5.13.0.22S Yes

1st File - db.txt

  1. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
  2. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous

Tried below AWK command. Did not work well (expecting result like - full outer join)

  1. awk 'FNR==NR {data[$1]=$0; next} {$1=data[$1]}1' db.txt app.txt

Output of AWK -

  1. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  2. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  3. Yes
  4. Yes
  5. Yes

for last 3 records, expecting Nulls then values (Yes) from app.txt

Please suggest ..
Thanks

答案1

得分: 3

以下是您要翻译的内容:

  1. 不确定我是否误解了,但这里有一个使用AWK的潜在选项:
  2. 或者使用GNU join
英文:

Not sure if I've misunderstood, but here is one potential option using AWK:

  1. awk 'FNR == NR {
  2. num_fields_db = NF
  3. data[$1] = $0
  4. next
  5. }
  6. {
  7. if ($1 in data) {
  8. print data[$1], $2
  9. } else {
  10. for (i = 1; i < num_fields_db; i++) {
  11. output = output "NULL" "\t"
  12. }
  13. print $1 "\t" output $2
  14. output = ""
  15. }
  16. }' db.txt app.txt
  17. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  18. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  19. 5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
  20. 5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
  21. 5.13.0.22S NULL NULL NULL NULL NULL NULL Yes

Or with GNU join:

  1. join -e "NULL" -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 <(sort app.txt) <(sort db.txt)
  2. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  3. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  4. 5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
  5. 5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
  6. 5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
  7. # tab delimited
  8. join -e "NULL" -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 <(sort app.txt) <(sort db.txt) | tr -s " " "\t"
  9. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  10. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  11. 5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
  12. 5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
  13. 5.13.0.24S NULL NULL NULL NULL NULL NULL Yes

答案2

得分: 3

使用您提供的示例,请尝试以下awk解决方案。使用column来使输出更清晰。

  1. FNR==1 && NR==1{
  2. for(i=1;i<=NF;i++){
  3. null=(null?null "\t":"") "NULL"
  4. }
  5. }
  6. FNR==NR{
  7. arr[$1]=$0
  8. next
  9. }
  10. {
  11. print ($1 in arr)?arr[$1] "\t" $2:$1 "\t" null
  12. }
  13. ' db.txt app.txt | column -t

请注意,这是您提供的awk脚本的翻译部分,不包括代码部分。

英文:

With your shown samples please try following awk solution. Using column to get the output more clear form.

  1. awk &#39;
  2. FNR==1 &amp;&amp; NR==1{
  3. for(i=1;i&lt;=NF;i++){
  4. null=(null?null &quot;\t&quot;:&quot;&quot;) &quot;NULL&quot;
  5. }
  6. }
  7. FNR==NR{
  8. arr[$1]=$0
  9. next
  10. }
  11. {
  12. print ($1 in arr)?arr[$1] &quot;\t&quot; $2:$1 &quot;\t&quot; null
  13. }
  14. &#39; db.txt app.txt | column -t

答案3

得分: 3

你可以使用以下的awk命令:

  1. awk '
  2. FNR == NR {
  3. if (FNR == 1) {
  4. nulls = sprintf("%*s", NF-1, "");
  5. gsub(/ /, " NULL", nulls)
  6. }
  7. data[$1] = $0
  8. next
  9. }
  10. {
  11. print ($1 in data ? data[$1] : $1 nulls), $2
  12. }' db.txt app.txt | column -t

在这里:

  • 对于第一个文件的第一行,在变量nulls中使用sprintf准备一个具有NF-1个空格的字符串。
  • 然后使用gsub在变量nulls中将每个空格替换为" NULL"
  • db.txt的每一行存储在关联数组data中,键为$1
  • 最后,在处理app.txt时,检查关联数组data中是否存在$1
  • 如果在数组中找到值,则打印$1和映射的值,否则打印变量$1nulls
  • 使用column -t以表格格式显示输出。
英文:

You may use this awk:

  1. awk &#39;
  2. FNR == NR {
  3. if (FNR == 1) {
  4. nulls = sprintf(&quot;%*s&quot;, NF-1, &quot;&quot;);
  5. gsub(/ /, &quot; NULL&quot;, nulls)
  6. }
  7. data[$1] = $0
  8. next
  9. }
  10. {
  11. print ($1 in data ? data[$1] : $1 nulls), $2
  12. }&#39; db.txt app.txt | column -t
  13. 5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
  14. 5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
  15. 5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
  16. 5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
  17. 5.13.0.22S NULL NULL NULL NULL NULL NULL Yes

Here:

  • For the first row in first file, using sprintf, we prepare a string with NF-1 number of spaces in variable nulls first.
  • Then using gsub we replace each space with &quot; NULL&quot; in variable nulls
  • Store each row of db.txt in associative array data with key as $1
  • Finally while processing app.txt we check presence of $1 in associative array data
  • If value is found in array then print $1 and mapped value otherwise print variable $1 and nulls
  • Use column -t to display output in tabular format.

答案4

得分: 2

执行全外连接操作使用任何 POSIX awk 均可:

  1. $ cat tst.awk
  2. {
  3. key = $1
  4. vals = $0
  5. sub(/[^[:space:]]+[[:space:]]+/,"",vals)
  6. if ( FNR == 1 ) {
  7. nomatch[++fileNr] = vals
  8. gsub(/[^[:space:]]+/,"Null",nomatch[fileNr])
  9. }
  10. }
  11. NR==FNR {
  12. file1[key] = vals
  13. next
  14. }
  15. {
  16. print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
  17. delete file1[$1]
  18. }
  19. END {
  20. for ( key in file1 ) {
  21. print key, file1[key], nomatch[2]
  22. }
  23. }

要测试它,请修改您的示例,向 db.txt 添加不在 app.txt 中具有共享键的行,并向 app.txt 添加额外的字段。

它假定每个文件中的每一行将具有与该文件的第一行相同数量的字段。如果不是这样,请添加任何您想要处理的逻辑。如果任一输入文件可能为空,那么您需要添加逻辑来定义输出中应该存在多少字段。它还假定键值在每个输入文件中是唯一的。

英文:

To do a full outer join using any POSIX awk would be:

  1. $ cat tst.awk
  2. {
  3. key = $1
  4. vals = $0
  5. sub(/[^[:space:]]+[[:space:]]+/,&quot;&quot;,vals)
  6. if ( FNR == 1 ) {
  7. nomatch[++fileNr] = vals
  8. gsub(/[^[:space:]]+/,&quot;Null&quot;,nomatch[fileNr])
  9. }
  10. }
  11. NR==FNR {
  12. file1[key] = vals
  13. next
  14. }
  15. {
  16. print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
  17. delete file1[$1]
  18. }
  19. END {
  20. for ( key in file1 ) {
  21. print key, file1[key], nomatch[2]
  22. }
  23. }

To test it modify your example to add lines to db.txt that don't have a shared key in app.txt and add additional fields to app.txt.

It assumes that every line from each file will have the same number of fields as the first line from that file. If that's not true then add whatever logic you want to handle that. If either input file could be empty then you'd need to add logic to define how many fields should be present in the output. It also assumes that the key value is unique in each input file.

答案5

得分: 0

  1. 你可以使用 `join`
  2. ```bash
  3. join -a1 app.txt db.txt

输出

  1. 5.13.0.12S 是的 WLPFO5_13_0_12S WLPFO 增量 2023-07-10_11:57:51 已激活 当前
  2. 5.13.0.11S 是的 WLPFO5_13_0_11S WLPFO 完整 2023-07-10_06:37:32 已激活 之前
  3. 5.13.0.24S 是的
  4. 5.13.0.23S 是的
  5. 5.13.0.22S 是的
英文:

You can use join

  1. join -a1 app.txt db.txt

output

  1. 5.13.0.12S Yes WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
  2. 5.13.0.11S Yes WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous
  3. 5.13.0.24S Yes
  4. 5.13.0.23S Yes
  5. 5.13.0.22S Yes

huangapple
  • 本文由 发表于 2023年7月11日 14:10:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76659118.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定