英文:
Use awk to compare the differences between two files when a column matches
问题
Sure, here's the code to achieve the desired outcome:
awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $1,$2,$3,a[$1,$2]-$4}' file1.txt file2.txt > file3.txt
This awk
command does the following:
NR==FNR
is a condition that is true only for the first file (file1.txt
in this case).- For the first file, it creates an associative array
a
where the key is a combination of the first and second columns (FirstName and LastName) and the value is the BudgetSpent from that row. next
is used to skip processing the rest of the code for the first file and move on to the second file (file2.txt
).- For the second file, it checks if the combination of FirstName and LastName exists in the
a
array (i.e., it matches between the two files). If it does, it calculates the BudgetSpent difference and prints the desired columns tofile3.txt
.
The resulting file3.txt
will contain the FirstName, LastName, Age, and BudgetSpentDifferences as specified in your example.
英文:
I have two txt files that I want to compare. I want to calculate the BudgetSpent between the two files if and only if the first column (FirstName) and second column (LastName) in the first txt file matches the second file's first and second column.
So basically if file1.txt
's Firstname
== file2.txt
's Firstname
&& file1.txt
's LastName
== file2.txt
's LastName
, I want it to generate a third file whereby it generates all the columns that matched, and one extra column which would be the BudgetSpentDifferences
.
file1.txt
FirstName LastName Age BudgetSpent
Taylor Swift 23 1000
Joe Smith 30 5000
file2.txt
FirstName LastName Age BudgetSpent
Joe Smith 30 5200
Taylor Swift 23 800
Joe Johnson 22 100
This is how I want the third file generated to be:
file3.txt
FirstName LastName Age BudgetSpentDifferences
Taylor Swift 23 200
Joe Smith 30 -200
I am really confused as I am quite new to awk
. Please help me out I've been scrambling my head thinking about how to generate this. Not sure if there's an if condition that can input two files in awk
.
Thanks,
答案1
得分: 1
以下是您要的代码翻译部分:
awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -= a[$1 " " $2]; print}
' file1.txt file2.txt
输出结果如下:
FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200
解释:
NR == FNR
对于第一个文件为真。NR > 1
对于第二个及之后的行为真。FNR == 1
对于每个文件的第一行为真。a[$1 " " $2] = $4
创建一个关联数组a
,将组合FirstName LastName
(用单个空格分隔)作为键,BudgetSpent
作为值。($1 " " $2) in a
如果组合FirstName LastName
是a
中的有效键,则为真。$4 -= a[$1 " " $2]
从字段4(BudgetSpent
)中减去从a
中取得的与键FirstName LastName
相关的值。
请注意,如果FirstName
或LastName
包含空格,则此脚本将无法正常工作。除了使用其他字符(例如TAB)作为字段分隔符外,此情况下数组键可能会引起歧义。
英文:
awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -= a[$1 " " $2]; print}
' file1.txt file2.txt
prints
FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200
Explanation:
NR == FNR
is true for the first fileNR > 1
is true for the second and later lines totalFNR == 1
is true for the first line of every filea[$1 " " $2] = $4
creates an entry in an associative arraya
with the combinationFirstName LastName
(separated by a single space) as key andBudgetSpent
as value($1 " " $2) in a
is true if the combinationFirstName LastName
is a valid key ina
$4 -= a[$1 " " $2]
subtracts the value retrieved froma
with keyFirstName LastName
from the value of field 4 (BudgetSpent
)
Note that the script will not work if FirstName
or LastName
contains spaces. Apart from using some other character (e.g. TAB) as field separator, the array key might be ambiguous in this case.
There are nearly always different / simpler / shorter ways to solve a problem.
Citing jhnc's comment:
> NR>1
and NR!=FNR
conditions are unnecessary.
If the condition NR>1
is omitted, the array a
will also contain an entry based on the heading which should not be a problem. As the first condition then covers all lines of the first file and the action contains next
, the second and third line of the script will be executed only for the second file (and later), so the condition NR!=FNR
can be removed.
> You can reduce typing by assigning composite key to a variable: {u=$1" "$2} NR==FNR{a[u]=$4;next} u in a{ $4 = FNR==1?"Diff":$4-a[u]; print }
答案2
得分: 1
For GNU awk use a multi-demensional array for the name index:
awk 'NR==1{print $0"Differences"; next}
NR==FNR{a[$1][$2]=$4; next}
$1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2
For POSIX awk, use spaces to index "FirstName LastName":
awk 'NR==1{print $0"Differences";next}
NR==FNR{a[$1" "$2]=$4;next}
$1" "$2 in a{$4-=a[$1" "$2];print}' file1 file2
For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t'
to set input field separator to tab and change " "
to "\t"
:
NR==FNR{a[$1"\t"$2]=$4;next}
$1"\t"$2 in a{$4-=a[$1"\t"$2];print}' file1 file2
Result:
FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
- Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
- Line 2: Create a two-dimensional index in the form
array[FirstName][LastName]=BudgetSpent
from file1. - Line 3: In case of an index match in file2, subtract the Budgets and print the results.
英文:
For GNU awk use a multi-demensional array for the name index:
awk 'NR==1{print $0"Differences"; next}
NR==FNR{a[$1][$2]=$4; next}
$1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2
For POSIX awk, use spaces to index "FirstName LastName":
awk 'NR==1{print $0"Differences";next}
NR==FNR{a[$1" "$2]=$4;next}
$1" "$2 in a{$4-=a[$1 " " $2]; print}' file1 file2
For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t'
to set input field separator to tab and change " "
to "\t"
:
awk -F'\t' 'NR==1{print $0"Differences";next}
NR==FNR{a[$1"\t"$2]=$4;next}
$1"\t"$2 in a{$4-=a[$1"\t"$2]; print}' file1 file2
Result:
FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
- Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
- Line 2: Create a two-dimensional index in the form
array[FirstName][LastName]=BudgetSpent
from file1. - Line 3: In case of an index match in file2, subtract the Budgets and print the results.
答案3
得分: 0
to build upon @steffen
's, solution, here's one for all awk
s :
awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_
} __ == NR && sub("$", "差异") ||
FNR == NR ? (___[$__, $_] = $____)&&"" : \
($__, $_) in ___ && ($____ -= ___[$__, $_])^!_ ' f1 f2
英文:
to build upon @steffen
's, solution, here's one for all awk
s :
awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_
} __ == NR && sub("$", "Differences") ||
FNR == NR ? (___[$__, $_] = $____)<"" : \
($__, $_) in ___ && ($____ -= ___[$__, $_])^!_' f1 f2
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论