2023年7月11日 14:10:56go评论99阅读模式

英文:

Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in unix files is common)

问题

以下是您要翻译的内容：

"Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)

1st File - app.txt

5.13.0.12S   Yes
5.13.0.11S   Yes
5.13.0.24S   Yes
5.13.0.23S   Yes
5.13.0.22S   Yes

1st File - db.txt

5.13.0.12S   WLPFO5_13_0_12S   WLPFO   incremental   2023-07-10_11:57:51   Activated   Current
5.13.0.11S   WLPFO5_13_0_11S   WLPFO   full          2023-07-10_06:37:32   Activated   Previous

Tried below AWK command. Did not work well (expecting result like - full outer join)

awk 'FNR==NR {data[$1]=$0; next} {$1=data[$1]}1' db.txt app.txt

Output of AWK -

5.13.0.12S   WLPFO5_13_0_12S   WLPFO   incremental   2023-07-10_11:57:51   Activated   Current Yes
5.13.0.11S   WLPFO5_13_0_11S   WLPFO   full          2023-07-10_06:37:32   Activated   Previous Yes
 Yes
 Yes
 Yes

for last 3 records, expecting Nulls then values (Yes) from app.txt

Please suggest ..
Thanks"

英文:

Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)

1st File - app.txt

5.13.0.12S   Yes
5.13.0.11S   Yes
5.13.0.24S   Yes
5.13.0.23S   Yes
5.13.0.22S   Yes

1st File - db.txt

5.13.0.12S   WLPFO5_13_0_12S   WLPFO   incremental   2023-07-10_11:57:51   Activated   Current
5.13.0.11S   WLPFO5_13_0_11S   WLPFO   full          2023-07-10_06:37:32   Activated   Previous

Tried below AWK command. Did not work well (expecting result like - full outer join)

awk &#39;FNR==NR {data[$1]=$0; next} {$1=data[$1]}1&#39; db.txt app.txt

Output of AWK -

5.13.0.12S   WLPFO5_13_0_12S   WLPFO   incremental   2023-07-10_11:57:51   Activated   Current Yes
5.13.0.11S   WLPFO5_13_0_11S   WLPFO   full          2023-07-10_06:37:32   Activated   Previous Yes
 Yes
 Yes
 Yes

for last 3 records, expecting Nulls then values (Yes) from app.txt

Please suggest ..
Thanks

答案1

得分: 3

以下是您要翻译的内容：

不确定我是否误解了，但这里有一个使用AWK的潜在选项：
或者使用GNU join：

英文:

Not sure if I've misunderstood, but here is one potential option using AWK:

awk &#39;FNR == NR {
	num_fields_db = NF
	data[$1] = $0
	next
}
{
	if ($1 in data) {
		print data[$1], $2
	} else {
		for (i = 1; i &lt; num_fields_db; i++) {
			output = output &quot;NULL&quot; &quot;\t&quot;
		}
		print $1 &quot;\t&quot; output $2
		output = &quot;&quot;
	}
}&#39; db.txt app.txt
5.13.0.12S  WLPFO5_13_0_12S  WLPFO  incremental  2023-07-10_11:57:51  Activated  Current   Yes
5.13.0.11S  WLPFO5_13_0_11S  WLPFO  full         2023-07-10_06:37:32  Activated  Previous  Yes
5.13.0.24S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes
5.13.0.23S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes
5.13.0.22S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes

Or with GNU join:

join -e &quot;NULL&quot; -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 &lt;(sort app.txt) &lt;(sort db.txt)
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated  Previous  Yes
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated  Current   Yes
5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
# tab delimited
join -e &quot;NULL&quot; -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 &lt;(sort app.txt) &lt;(sort db.txt) | tr -s &quot; &quot; &quot;\t&quot;
5.13.0.11S  WLPFO5_13_0_11S  WLPFO  full         2023-07-10_06:37:32  Activated  Previous  Yes
5.13.0.12S  WLPFO5_13_0_12S  WLPFO  incremental  2023-07-10_11:57:51  Activated  Current   Yes
5.13.0.22S  NULL             NULL   NULL         NULL                 NULL       NULL       Yes
5.13.0.23S  NULL             NULL   NULL         NULL                 NULL       NULL       Yes
5.13.0.24S  NULL             NULL   NULL         NULL                 NULL       NULL       Yes

答案2

得分: 3

使用您提供的示例，请尝试以下awk解决方案。使用column来使输出更清晰。

FNR==1 && NR==1{
  for(i=1;i<=NF;i++){
    null=(null?null "\t":"") "NULL"
  }
}
FNR==NR{
  arr[$1]=$0
  next
}
{
  print ($1 in arr)?arr[$1] "\t" $2:$1 "\t" null
}
' db.txt app.txt | column -t

请注意，这是您提供的awk脚本的翻译部分，不包括代码部分。

英文:

With your shown samples please try following awk solution. Using column to get the output more clear form.

awk &#39;
FNR==1 &amp;&amp; NR==1{
  for(i=1;i&lt;=NF;i++){
    null=(null?null &quot;\t&quot;:&quot;&quot;) &quot;NULL&quot;
  }
}
FNR==NR{
  arr[$1]=$0
  next
}
{
  print ($1 in arr)?arr[$1] &quot;\t&quot; $2:$1 &quot;\t&quot; null
}
&#39; db.txt app.txt | column -t

答案3

得分: 3

你可以使用以下的awk命令：

awk '
FNR == NR {
   if (FNR == 1) {
      nulls = sprintf("%*s", NF-1, "");
      gsub(/ /, " NULL", nulls)
   }
   data[$1] = $0
   next
}
{
   print ($1 in data ? data[$1] : $1 nulls), $2
}' db.txt app.txt | column -t

在这里：

对于第一个文件的第一行，在变量nulls中使用sprintf准备一个具有NF-1个空格的字符串。
然后使用gsub在变量nulls中将每个空格替换为" NULL"。
将db.txt的每一行存储在关联数组data中，键为$1。
最后，在处理app.txt时，检查关联数组data中是否存在$1。
如果在数组中找到值，则打印$1和映射的值，否则打印变量$1和nulls。
使用column -t以表格格式显示输出。

英文:

You may use this awk:

awk &#39;
FNR == NR {
   if (FNR == 1) {
      nulls = sprintf(&quot;%*s&quot;, NF-1, &quot;&quot;);
      gsub(/ /, &quot; NULL&quot;, nulls)
   }
   data[$1] = $0
   next
}
{
   print ($1 in data ? data[$1] : $1 nulls), $2
}&#39; db.txt app.txt | column -t
5.13.0.12S  WLPFO5_13_0_12S  WLPFO  incremental  2023-07-10_11:57:51  Activated  Current   Yes
5.13.0.11S  WLPFO5_13_0_11S  WLPFO  full         2023-07-10_06:37:32  Activated  Previous  Yes
5.13.0.24S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes
5.13.0.23S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes
5.13.0.22S  NULL             NULL   NULL         NULL                 NULL       NULL      Yes

Here:

For the first row in first file, using sprintf, we prepare a string with NF-1 number of spaces in variable nulls first.
Then using gsub we replace each space with " NULL" in variable nulls
Store each row of db.txt in associative array data with key as $1
Finally while processing app.txt we check presence of $1 in associative array data
If value is found in array then print $1 and mapped value otherwise print variable $1 and nulls
Use column -t to display output in tabular format.

答案4

得分: 2

执行全外连接操作使用任何 POSIX awk 均可：

$ cat tst.awk
{
    key = $1
    vals = $0
    sub(/[^[:space:]]+[[:space:]]+/,"",vals)
    if ( FNR == 1 ) {
        nomatch[++fileNr] = vals
        gsub(/[^[:space:]]+/,"Null",nomatch[fileNr])
    }
}
NR==FNR {
    file1[key] = vals
    next
}
{
    print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
    delete file1[$1]
}
END {
    for ( key in file1 ) {
        print key, file1[key], nomatch[2]
    }
}

要测试它，请修改您的示例，向 db.txt 添加不在 app.txt 中具有共享键的行，并向 app.txt 添加额外的字段。

它假定每个文件中的每一行将具有与该文件的第一行相同数量的字段。如果不是这样，请添加任何您想要处理的逻辑。如果任一输入文件可能为空，那么您需要添加逻辑来定义输出中应该存在多少字段。它还假定键值在每个输入文件中是唯一的。

英文:

To do a full outer join using any POSIX awk would be:

$ cat tst.awk
{
    key = $1
    vals = $0
    sub(/[^[:space:]]+[[:space:]]+/,&quot;&quot;,vals)
    if ( FNR == 1 ) {
        nomatch[++fileNr] = vals
        gsub(/[^[:space:]]+/,&quot;Null&quot;,nomatch[fileNr])
    }
}
NR==FNR {
    file1[key] = vals
    next
}
{
    print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
    delete file1[$1]
}
END {
    for ( key in file1 ) {
        print key, file1[key], nomatch[2]
    }
}

To test it modify your example to add lines to db.txt that don't have a shared key in app.txt and add additional fields to app.txt.

It assumes that every line from each file will have the same number of fields as the first line from that file. If that's not true then add whatever logic you want to handle that. If either input file could be empty then you'd need to add logic to define how many fields should be present in the output. It also assumes that the key value is unique in each input file.

答案5

得分: 0

你可以使用 `join`
```bash
join -a1 app.txt db.txt

输出

5.13.0.12S 是的 WLPFO5_13_0_12S WLPFO 增量 2023-07-10_11:57:51 已激活 当前
5.13.0.11S 是的 WLPFO5_13_0_11S WLPFO 完整 2023-07-10_06:37:32 已激活 之前
5.13.0.24S 是的
5.13.0.23S 是的
5.13.0.22S 是的

英文:

You can use join

join -a1 app.txt db.txt

output

5.13.0.12S Yes WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
5.13.0.11S Yes WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous
5.13.0.24S Yes
5.13.0.23S Yes
5.13.0.22S Yes

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in unix files is common)

问题

答案1

答案2

答案3

答案4

答案5

匹配用户序列在awk中

为什么在UNIX中，如果没有使用wait()，父进程总是在子进程之后执行？

为什么两个终端显示不同的OpenGL版本？

嵌套的for循环在Shell脚本编码中执行Hive时出错。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论