英文:
Use awk to compare the differences between two files when a column matches
问题
Sure, here's the code to achieve the desired outcome:
awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $1,$2,$3,a[$1,$2]-$4}' file1.txt file2.txt > file3.txt
This awk command does the following:
NR==FNRis a condition that is true only for the first file (file1.txtin this case).- For the first file, it creates an associative array
awhere the key is a combination of the first and second columns (FirstName and LastName) and the value is the BudgetSpent from that row. nextis used to skip processing the rest of the code for the first file and move on to the second file (file2.txt).- For the second file, it checks if the combination of FirstName and LastName exists in the
aarray (i.e., it matches between the two files). If it does, it calculates the BudgetSpent difference and prints the desired columns tofile3.txt.
The resulting file3.txt will contain the FirstName, LastName, Age, and BudgetSpentDifferences as specified in your example.
英文:
I have two txt files that I want to compare. I want to calculate the BudgetSpent between the two files if and only if the first column (FirstName) and second column (LastName) in the first txt file matches the second file's first and second column.
So basically if file1.txt's Firstname == file2.txt's Firstname && file1.txt's LastName== file2.txt's LastName, I want it to generate a third file whereby it generates all the columns that matched, and one extra column which would be the BudgetSpentDifferences.
file1.txt
FirstName LastName Age BudgetSpent
Taylor Swift 23 1000
Joe Smith 30 5000
file2.txt
FirstName LastName Age BudgetSpent
Joe Smith 30 5200
Taylor Swift 23 800
Joe Johnson 22 100
This is how I want the third file generated to be:
file3.txt
FirstName LastName Age BudgetSpentDifferences
Taylor Swift 23 200
Joe Smith 30 -200
I am really confused as I am quite new to awk. Please help me out
I've been scrambling my head thinking about how to generate this. Not sure if there's an if condition that can input two files in awk.
Thanks,
答案1
得分: 1
以下是您要的代码翻译部分:
awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -= a[$1 " " $2]; print}
' file1.txt file2.txt
输出结果如下:
FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200
解释:
NR == FNR对于第一个文件为真。NR > 1对于第二个及之后的行为真。FNR == 1对于每个文件的第一行为真。a[$1 " " $2] = $4创建一个关联数组a,将组合FirstName LastName(用单个空格分隔)作为键,BudgetSpent作为值。($1 " " $2) in a如果组合FirstName LastName是a中的有效键,则为真。$4 -= a[$1 " " $2]从字段4(BudgetSpent)中减去从a中取得的与键FirstName LastName相关的值。
请注意,如果FirstName或LastName包含空格,则此脚本将无法正常工作。除了使用其他字符(例如TAB)作为字段分隔符外,此情况下数组键可能会引起歧义。
英文:
awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -= a[$1 " " $2]; print}
' file1.txt file2.txt
prints
FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200
Explanation:
NR == FNRis true for the first fileNR > 1is true for the second and later lines totalFNR == 1is true for the first line of every filea[$1 " " $2] = $4creates an entry in an associative arrayawith the combinationFirstName LastName(separated by a single space) as key andBudgetSpentas value($1 " " $2) in ais true if the combinationFirstName LastNameis a valid key ina$4 -= a[$1 " " $2]subtracts the value retrieved fromawith keyFirstName LastNamefrom the value of field 4 (BudgetSpent)
Note that the script will not work if FirstName or LastName contains spaces. Apart from using some other character (e.g. TAB) as field separator, the array key might be ambiguous in this case.
There are nearly always different / simpler / shorter ways to solve a problem.
Citing jhnc's comment:
> NR>1 and NR!=FNR conditions are unnecessary.
If the condition NR>1 is omitted, the array a will also contain an entry based on the heading which should not be a problem. As the first condition then covers all lines of the first file and the action contains next, the second and third line of the script will be executed only for the second file (and later), so the condition NR!=FNR can be removed.
> You can reduce typing by assigning composite key to a variable: {u=$1" "$2} NR==FNR{a[u]=$4;next} u in a{ $4 = FNR==1?"Diff":$4-a[u]; print }
答案2
得分: 1
For GNU awk use a multi-demensional array for the name index:
awk 'NR==1{print $0"Differences"; next}
NR==FNR{a[$1][$2]=$4; next}
$1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2
For POSIX awk, use spaces to index "FirstName LastName":
awk 'NR==1{print $0"Differences";next}
NR==FNR{a[$1" "$2]=$4;next}
$1" "$2 in a{$4-=a[$1" "$2];print}' file1 file2
For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":
NR==FNR{a[$1"\t"$2]=$4;next}
$1"\t"$2 in a{$4-=a[$1"\t"$2];print}' file1 file2
Result:
FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
- Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
- Line 2: Create a two-dimensional index in the form
array[FirstName][LastName]=BudgetSpentfrom file1. - Line 3: In case of an index match in file2, subtract the Budgets and print the results.
英文:
For GNU awk use a multi-demensional array for the name index:
awk 'NR==1{print $0"Differences"; next}
NR==FNR{a[$1][$2]=$4; next}
$1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2
For POSIX awk, use spaces to index "FirstName LastName":
awk 'NR==1{print $0"Differences";next}
NR==FNR{a[$1" "$2]=$4;next}
$1" "$2 in a{$4-=a[$1 " " $2]; print}' file1 file2
For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":
awk -F'\t' 'NR==1{print $0"Differences";next}
NR==FNR{a[$1"\t"$2]=$4;next}
$1"\t"$2 in a{$4-=a[$1"\t"$2]; print}' file1 file2
Result:
FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
- Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
- Line 2: Create a two-dimensional index in the form
array[FirstName][LastName]=BudgetSpentfrom file1. - Line 3: In case of an index match in file2, subtract the Budgets and print the results.
答案3
得分: 0
to build upon @steffen's, solution, here's one for all awks :
awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_
} __ == NR && sub("$", "差异") ||
FNR == NR ? (___[$__, $_] = $____)&&"" : \
($__, $_) in ___ && ($____ -= ___[$__, $_])^!_ ' f1 f2
英文:
to build upon @steffen's, solution, here's one for all awks :
awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_
} __ == NR && sub("$", "Differences") ||
FNR == NR ? (___[$__, $_] = $____)<"" : \
($__, $_) in ___ && ($____ -= ___[$__, $_])^!_' f1 f2
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论