英文:
Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in unix files is common)
问题
以下是您要翻译的内容:
"Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)
1st File - app.txt
5.13.0.12S Yes
5.13.0.11S Yes
5.13.0.24S Yes
5.13.0.23S Yes
5.13.0.22S Yes
1st File - db.txt
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous
Tried below AWK command. Did not work well (expecting result like - full outer join)
awk 'FNR==NR {data[$1]=$0; next} {$1=data[$1]}1' db.txt app.txt
Output of AWK -
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
Yes
Yes
Yes
for last 3 records, expecting Nulls then values (Yes) from app.txt
Please suggest ..
Thanks"
英文:
Looking for Unix AWK command to perform full outer join of 2 files based on a common column (1st column in Unix files is common)
1st File - app.txt
5.13.0.12S Yes
5.13.0.11S Yes
5.13.0.24S Yes
5.13.0.23S Yes
5.13.0.22S Yes
1st File - db.txt
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous
Tried below AWK command. Did not work well (expecting result like - full outer join)
awk 'FNR==NR {data[$1]=$0; next} {$1=data[$1]}1' db.txt app.txt
Output of AWK -
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
Yes
Yes
Yes
for last 3 records, expecting Nulls then values (Yes) from app.txt
Please suggest ..
Thanks
答案1
得分: 3
以下是您要翻译的内容:
不确定我是否误解了,但这里有一个使用AWK的潜在选项:
或者使用GNU join:
英文:
Not sure if I've misunderstood, but here is one potential option using AWK:
awk 'FNR == NR {
num_fields_db = NF
data[$1] = $0
next
}
{
if ($1 in data) {
print data[$1], $2
} else {
for (i = 1; i < num_fields_db; i++) {
output = output "NULL" "\t"
}
print $1 "\t" output $2
output = ""
}
}' db.txt app.txt
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
Or with GNU join:
join -e "NULL" -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 <(sort app.txt) <(sort db.txt)
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
# tab delimited
join -e "NULL" -j1 -a1 -o 1.1,2.2,2.3,2.4,2.5,2.6,2.7,1.2 <(sort app.txt) <(sort db.txt) | tr -s " " "\t"
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
答案2
得分: 3
使用您提供的示例,请尝试以下awk
解决方案。使用column
来使输出更清晰。
FNR==1 && NR==1{
for(i=1;i<=NF;i++){
null=(null?null "\t":"") "NULL"
}
}
FNR==NR{
arr[$1]=$0
next
}
{
print ($1 in arr)?arr[$1] "\t" $2:$1 "\t" null
}
' db.txt app.txt | column -t
请注意,这是您提供的awk
脚本的翻译部分,不包括代码部分。
英文:
With your shown samples please try following awk
solution. Using column
to get the output more clear form.
awk '
FNR==1 && NR==1{
for(i=1;i<=NF;i++){
null=(null?null "\t":"") "NULL"
}
}
FNR==NR{
arr[$1]=$0
next
}
{
print ($1 in arr)?arr[$1] "\t" $2:$1 "\t" null
}
' db.txt app.txt | column -t
答案3
得分: 3
你可以使用以下的awk
命令:
awk '
FNR == NR {
if (FNR == 1) {
nulls = sprintf("%*s", NF-1, "");
gsub(/ /, " NULL", nulls)
}
data[$1] = $0
next
}
{
print ($1 in data ? data[$1] : $1 nulls), $2
}' db.txt app.txt | column -t
在这里:
- 对于第一个文件的第一行,在变量
nulls
中使用sprintf
准备一个具有NF-1
个空格的字符串。 - 然后使用
gsub
在变量nulls
中将每个空格替换为" NULL"
。 - 将
db.txt
的每一行存储在关联数组data
中,键为$1
。 - 最后,在处理
app.txt
时,检查关联数组data
中是否存在$1
。 - 如果在数组中找到值,则打印
$1
和映射的值,否则打印变量$1
和nulls
。 - 使用
column -t
以表格格式显示输出。
英文:
You may use this awk
:
awk '
FNR == NR {
if (FNR == 1) {
nulls = sprintf("%*s", NF-1, "");
gsub(/ /, " NULL", nulls)
}
data[$1] = $0
next
}
{
print ($1 in data ? data[$1] : $1 nulls), $2
}' db.txt app.txt | column -t
5.13.0.12S WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current Yes
5.13.0.11S WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous Yes
5.13.0.24S NULL NULL NULL NULL NULL NULL Yes
5.13.0.23S NULL NULL NULL NULL NULL NULL Yes
5.13.0.22S NULL NULL NULL NULL NULL NULL Yes
Here:
- For the first row in first file, using
sprintf
, we prepare a string withNF-1
number of spaces in variablenulls
first. - Then using
gsub
we replace each space with" NULL"
in variablenulls
- Store each row of
db.txt
in associative arraydata
with key as$1
- Finally while processing
app.txt
we check presence of$1
in associative arraydata
- If value is found in array then print
$1
and mapped value otherwise print variable $1 andnulls
- Use
column -t
to display output in tabular format.
答案4
得分: 2
执行全外连接操作使用任何 POSIX awk 均可:
$ cat tst.awk
{
key = $1
vals = $0
sub(/[^[:space:]]+[[:space:]]+/,"",vals)
if ( FNR == 1 ) {
nomatch[++fileNr] = vals
gsub(/[^[:space:]]+/,"Null",nomatch[fileNr])
}
}
NR==FNR {
file1[key] = vals
next
}
{
print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
delete file1[$1]
}
END {
for ( key in file1 ) {
print key, file1[key], nomatch[2]
}
}
要测试它,请修改您的示例,向 db.txt
添加不在 app.txt
中具有共享键的行,并向 app.txt
添加额外的字段。
它假定每个文件中的每一行将具有与该文件的第一行相同数量的字段。如果不是这样,请添加任何您想要处理的逻辑。如果任一输入文件可能为空,那么您需要添加逻辑来定义输出中应该存在多少字段。它还假定键值在每个输入文件中是唯一的。
英文:
To do a full outer join using any POSIX awk would be:
$ cat tst.awk
{
key = $1
vals = $0
sub(/[^[:space:]]+[[:space:]]+/,"",vals)
if ( FNR == 1 ) {
nomatch[++fileNr] = vals
gsub(/[^[:space:]]+/,"Null",nomatch[fileNr])
}
}
NR==FNR {
file1[key] = vals
next
}
{
print $1, ($1 in file1 ? file1[$1] : nomatch[1]), vals
delete file1[$1]
}
END {
for ( key in file1 ) {
print key, file1[key], nomatch[2]
}
}
To test it modify your example to add lines to db.txt
that don't have a shared key in app.txt
and add additional fields to app.txt
.
It assumes that every line from each file will have the same number of fields as the first line from that file. If that's not true then add whatever logic you want to handle that. If either input file could be empty then you'd need to add logic to define how many fields should be present in the output. It also assumes that the key value is unique in each input file.
答案5
得分: 0
你可以使用 `join`
```bash
join -a1 app.txt db.txt
输出
5.13.0.12S 是的 WLPFO5_13_0_12S WLPFO 增量 2023-07-10_11:57:51 已激活 当前
5.13.0.11S 是的 WLPFO5_13_0_11S WLPFO 完整 2023-07-10_06:37:32 已激活 之前
5.13.0.24S 是的
5.13.0.23S 是的
5.13.0.22S 是的
英文:
You can use join
join -a1 app.txt db.txt
output
5.13.0.12S Yes WLPFO5_13_0_12S WLPFO incremental 2023-07-10_11:57:51 Activated Current
5.13.0.11S Yes WLPFO5_13_0_11S WLPFO full 2023-07-10_06:37:32 Activated Previous
5.13.0.24S Yes
5.13.0.23S Yes
5.13.0.22S Yes
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论