2023年5月31日 23:41:25go评论88阅读模式

英文:

Use awk to compare the differences between two files when a column matches

问题

Sure, here's the code to achieve the desired outcome:

awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $1,$2,$3,a[$1,$2]-$4}' file1.txt file2.txt > file3.txt

This awk command does the following:

NR==FNR is a condition that is true only for the first file (file1.txt in this case).
For the first file, it creates an associative array a where the key is a combination of the first and second columns (FirstName and LastName) and the value is the BudgetSpent from that row.
next is used to skip processing the rest of the code for the first file and move on to the second file (file2.txt).
For the second file, it checks if the combination of FirstName and LastName exists in the a array (i.e., it matches between the two files). If it does, it calculates the BudgetSpent difference and prints the desired columns to file3.txt.

The resulting file3.txt will contain the FirstName, LastName, Age, and BudgetSpentDifferences as specified in your example.

英文:

I have two txt files that I want to compare. I want to calculate the BudgetSpent between the two files if and only if the first column (FirstName) and second column (LastName) in the first txt file matches the second file's first and second column.

So basically if file1.txt's Firstname == file2.txt's Firstname && file1.txt's LastName== file2.txt's LastName, I want it to generate a third file whereby it generates all the columns that matched, and one extra column which would be the BudgetSpentDifferences.

file1.txt

FirstName LastName Age BudgetSpent
Taylor Swift 23 1000
Joe Smith 30 5000

file2.txt

FirstName LastName Age BudgetSpent
Joe Smith 30 5200
Taylor Swift 23 800
Joe Johnson 22 100

This is how I want the third file generated to be:
file3.txt

FirstName LastName Age BudgetSpentDifferences
Taylor Swift 23 200
Joe Smith 30 -200

I am really confused as I am quite new to awk. Please help me out I've been scrambling my head thinking about how to generate this. Not sure if there's an if condition that can input two files in awk.

Thanks,

答案1

得分: 1

以下是您要的代码翻译部分：

awk 'NR==FNR && NR>1 { a[$1 " " $2] = $4; next}
NR!=FNR && FNR==1 { $4 = "BudgetSpent"; print }
NR!=FNR && FNR>1 && ($1 " " $2) in a { $4 -=  a[$1 " " $2]; print}
' file1.txt file2.txt

输出结果如下：

FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200

解释：

NR == FNR 对于第一个文件为真。
NR > 1 对于第二个及之后的行为真。
FNR == 1 对于每个文件的第一行为真。
a[$1 " " $2] = $4 创建一个关联数组 a，将组合 FirstName LastName（用单个空格分隔）作为键，BudgetSpent 作为值。
($1 " " $2) in a 如果组合 FirstName LastName 是 a 中的有效键，则为真。
$4 -= a[$1 " " $2] 从字段4（BudgetSpent）中减去从 a 中取得的与键 FirstName LastName 相关的值。

请注意，如果FirstName或LastName包含空格，则此脚本将无法正常工作。除了使用其他字符（例如TAB）作为字段分隔符外，此情况下数组键可能会引起歧义。

英文:

awk &#39;NR==FNR &amp;&amp; NR&gt;1 { a[$1 &quot; &quot; $2] = $4; next}
NR!=FNR &amp;&amp; FNR==1 { $4 = &quot;BudgetSpent&quot;; print }
NR!=FNR &amp;&amp; FNR&gt;1 &amp;&amp; ($1 &quot; &quot; $2) in a { $4 -=  a[$1 &quot; &quot; $2]; print}
&#39; file1.txt file2.txt

prints

FirstName LastName Age BudgetSpent
Joe Smith 30 200
Taylor Swift 23 -200

Explanation:

NR == FNR is true for the first file
NR > 1 is true for the second and later lines total
FNR == 1 is true for the first line of every file
a[$1 " " $2] = $4 creates an entry in an associative array a with the combination FirstName LastName (separated by a single space) as key and BudgetSpent as value
($1 " " $2) in a is true if the combination FirstName LastName is a valid key in a
$4 -= a[$1 " " $2] subtracts the value retrieved from a with key FirstName LastName from the value of field 4 (BudgetSpent)

Note that the script will not work if FirstName or LastName contains spaces. Apart from using some other character (e.g. TAB) as field separator, the array key might be ambiguous in this case.

There are nearly always different / simpler / shorter ways to solve a problem.

Citing jhnc's comment:

> NR>1 and NR!=FNR conditions are unnecessary.

If the condition NR>1 is omitted, the array a will also contain an entry based on the heading which should not be a problem. As the first condition then covers all lines of the first file and the action contains next, the second and third line of the script will be executed only for the second file (and later), so the condition NR!=FNR can be removed.

> You can reduce typing by assigning composite key to a variable: {u=$1" "$2} NR==FNR{a[u]=$4;next} u in a{ $4 = FNR==1?"Diff":$4-a[u]; print }

答案2

得分: 1

For GNU awk use a multi-demensional array for the name index:

awk 'NR==1{print $0"Differences"; next}
     NR==FNR{a[$1][$2]=$4; next}
     $1 in a && $2 in a[$1]{$4-=a[$1][$2]; print}' file1 file2

For POSIX awk, use spaces to index "FirstName LastName":

awk 'NR==1{print $0"Differences";next}
     NR==FNR{a[$1" "$2]=$4;next}
     $1" "$2 in a{$4-=a[$1" "$2];print}' file1 file2

For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":

   NR==FNR{a[$1"\t"$2]=$4;next}
   $1"\t"$2 in a{$4-=a[$1"\t"$2];print}' file1 file2

Result:

FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200

Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
Line 2: Create a two-dimensional index in the form array[FirstName][LastName]=BudgetSpent from file1.
Line 3: In case of an index match in file2, subtract the Budgets and print the results.

英文:

For GNU awk use a multi-demensional array for the name index:

awk &#39;NR==1{print $0&quot;Differences&quot;; next}
     NR==FNR{a[$1][$2]=$4; next}
     $1 in a &amp;&amp; $2 in a[$1]{$4-=a[$1][$2]; print}&#39; file1 file2

For POSIX awk, use spaces to index "FirstName LastName":

awk &#39;NR==1{print $0&quot;Differences&quot;;next}
     NR==FNR{a[$1&quot; &quot;$2]=$4;next}
     $1&quot; &quot;$2 in a{$4-=a[$1 &quot; &quot; $2]; print}&#39; file1 file2

For POSIX awk: If input names may contain spaces (but no tabs), use -F'\t' to set input field separator to tab and change " " to "\t":

awk -F&#39;\t&#39; &#39;NR==1{print $0&quot;Differences&quot;;next}
   NR==FNR{a[$1&quot;\t&quot;$2]=$4;next}
   $1&quot;\t&quot;$2 in a{$4-=a[$1&quot;\t&quot;$2]; print}&#39; file1 file2

Result:

FirstName LastName Age BudgetSpentDifferences
Joe Smith 30 200
Taylor Swift 23 -200

Line 1: Print "FirstName LastName Age BudgetSpentDifferences".
Line 2: Create a two-dimensional index in the form array[FirstName][LastName]=BudgetSpent from file1.
Line 3: In case of an index match in file2, subtract the Budgets and print the results.

答案3

得分: 0

to build upon @steffen's, solution, here's one for all awks :

awk 'BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_ 
     } __ == NR && sub("$", "差异")  ||
      FNR == NR ? (___[$__, $_] = $____)&&"" : \
     ($__, $_) in ___ && ($____ -= ___[$__, $_])^!_ ' f1 f2

英文:

to build upon @steffen's, solution, here's one for all awks :

awk &#39;BEGIN { ____ = (_+= __ = _^= SUBSEP = FS )^_ 
     } __ == NR &amp;&amp; sub(&quot;$&quot;, &quot;Differences&quot;)  ||
      FNR == NR ? (___[$__, $_] = $____)&lt;&quot;&quot; : \
     ($__, $_) in ___ &amp;&amp; ($____ -= ___[$__, $_])^!_&#39; f1 f2

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用awk比较两个文件的差异，当某一列匹配时

问题

答案1

答案2

答案3

在使用sed添加新行时，在单词前插入制表符和空格。

获取表中某列上的行的迭代次数和总计数。

如何对多级索引（列）的数据框进行排序？

将”fzf to return combination of input fields”翻译为中文： “fzf 返回输入字段的组合”

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。