使用awk比较两个文件的差异,当某一列匹配时

huangapple go评论56阅读模式
英文:

Use awk to compare the differences between two files when a column matches

问题

Sure, here's the code to achieve the desired outcome:

awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $1,$2,$3,a[$1,$2]-$4}' file1.txt file2.txt > file3.txt

This awk command does the following:

  1. NR==FNR is a condition that is true only for the first file (file1.txt in this case).
  2. For the first file, it creates an associative array a where the key is a combination of the first and second columns (FirstName and LastName) and the value is the BudgetSpent from that row.
  3. next is used to skip processing the rest of the code for the first file and move on to the second file (file2.txt).
  4. For the second file, it checks if the combination of FirstName and LastName exists in the a array (i.e., it matches between the two files). If it does, it calculates the BudgetSpent difference and prints the desired columns to file3.txt.

The resulting file3.txt will contain the FirstName, LastName, Age, and BudgetSpentDifferences as specified in your example.

英文:

I have two txt files that I want to compare. I want to calculate the BudgetSpent between the two files if and only if the first column (FirstName) and second column (LastName) in the first txt file matches the second file's first and second column.

So basically if file1.txt's Firstname == file2.txt's Firstname && file1.txt's LastName== file2.txt's LastName, I want it to generate a third file whereby it generates all the columns that matched, and one extra column which would be the BudgetSpentDifferences.

file1.txt

FirstName LastName Age BudgetSpent
Taylor Swift 23 1000
Joe Smith 30 5000

file2.txt

FirstName LastName Age BudgetSpent
Joe Smith 30 5200
Taylor Swift 23 800
Joe Johnson 22 100

This is how I want the third file generated to be:
file3.txt

FirstName LastName Age BudgetSpentDifferences
Taylor Swift 23 200
Joe Smith 30 -200

I am really confused as I am quite new to awk. Please help me out 使用awk比较两个文件的差异,当某一列匹配时 I've been scrambling my head thinking about how to generate this. Not sure if there's an if condition that can input two files in awk.

Thanks,

答案1

得分: 1

以下是您要的代码翻译部分:

awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -=  a[$1 " " $2]; print}
' file1.txt file2.txt

输出结果如下:

FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200

解释:

  • NR == FNR 对于第一个文件为真。
  • NR > 1 对于第二个及之后的行为真。
  • FNR == 1 对于每个文件的第一行为真。
  • a[$1 " " $2] = $4 创建一个关联数组 a,将组合 FirstName LastName(用单个空格分隔)作为键,BudgetSpent 作为值。
  • ($1 " " $2) in a 如果组合 FirstName LastNamea 中的有效键,则为真。
  • $4 -= a[$1 " " $2] 从字段4(BudgetSpent)中减去从 a 中取得的与键 FirstName LastName 相关的值。

请注意,如果FirstNameLastName包含空格,则此脚本将无法正常工作。除了使用其他字符(例如TAB)作为字段分隔符外,此情况下数组键可能会引起歧义。

英文:
awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -=  a[$1 " " $2]; print}
' file1.txt file2.txt

prints

FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200

Explanation:

  • NR == FNR is true for the first file
  • NR > 1 is true for the second and later lines total
  • FNR == 1 is true for the first line of every file
  • a[$1 " " $2] = $4 creates an entry in an associative array a with the combination FirstName LastName (separated by a single space) as key and BudgetSpent as value
  • ($1 " " $2) in a is true if the combination FirstName LastName is a valid key in a
  • $4 -= a[$1 " " $2] subtracts the value retrieved from a with key FirstName LastName from the value of field 4 (BudgetSpent)

Note that the script will not work if FirstName or LastName contains spaces. Apart from using some other character (e.g. TAB) as field separator, the array key might be ambiguous in this case.


There are nearly always different / simpler / shorter ways to solve a problem.

Citing jhnc's comment:

> NR>1 and NR!=FNR conditions are unnecessary.

If the condition NR>1 is omitted, the array a will also contain an entry based on the heading which should not be a problem. As the first condition then covers all lines of the first file and the action contains next, the second and third line of the script will be executed only for the second file (and later), so the condition NR!=FNR can be removed.

> You can reduce typing by assigning composite key to a variable: {u=$1" "$2} NR==FNR{a[u]=$4;next} u in a{ $4 = FNR==1?"Diff":$4-a[u]; print }

答案2

得分: 1

For GNU awk use a multi-demensional array for the name index:

awk 'NR==1{print $0"Differences"; next}
     NR==FNR{a[$1][$2]=$4; next}
     $1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2

For POSIX awk, use spaces to index "FirstName LastName":

awk 'NR==1{print $0"Differences";next}
     NR==FNR{a[$1" "$2]=$4;next}
     $1" "$2 in a{$4-=a[$1" "$2];print}' file1 file2

For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":

   NR==FNR{a[$1"\t"$2]=$4;next}
   $1"\t"$2 in a{$4-=a[$1"\t"$2];print}' file1 file2

Result:

FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
  • Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
  • Line 2: Create a two-dimensional index in the form array[FirstName][LastName]=BudgetSpent from file1.
  • Line 3: In case of an index match in file2, subtract the Budgets and print the results.
英文:

For GNU awk use a multi-demensional array for the name index:

awk 'NR==1{print $0"Differences"; next}
     NR==FNR{a[$1][$2]=$4; next}
     $1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2

For POSIX awk, use spaces to index "FirstName LastName":

awk 'NR==1{print $0"Differences";next}
     NR==FNR{a[$1" "$2]=$4;next}
     $1" "$2 in a{$4-=a[$1 " " $2]; print}' file1 file2

For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":

awk -F'\t' 'NR==1{print $0"Differences";next}
   NR==FNR{a[$1"\t"$2]=$4;next}
   $1"\t"$2 in a{$4-=a[$1"\t"$2]; print}' file1 file2

Result:

FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200
  • Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
  • Line 2: Create a two-dimensional index in the form array[FirstName][LastName]=BudgetSpent from file1.
  • Line 3: In case of an index match in file2, subtract the Budgets and print the results.

答案3

得分: 0

to build upon @steffen's, solution, here's one for all awks :

awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_ 
     } __ == NR && sub("$", "差异")  ||
      FNR == NR ? (___[$__, $_] = $____)&&"" : \
     ($__, $_) in ___ && ($____ -= ___[$__, $_])^!_ ' f1 f2
英文:

to build upon @steffen's, solution, here's one for all awks :

awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_ 
     } __ == NR && sub("$", "Differences")  ||
      FNR == NR ? (___[$__, $_] = $____)<"" : \
     ($__, $_) in ___ && ($____ -= ___[$__, $_])^!_' f1 f2

huangapple
  • 本文由 发表于 2023年5月31日 23:41:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76375223.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定