January 6, 2020 23:35:54

How to ignore specific column when comparing two files using awk

Question

File1:

1|footbal|play1
2|cricket|play2
3|tennis|play3
5|golf|play5

File2:

1|footbal|play1
2|cricket|play2
3|tennis1|play3
4|soccer|play4
5|golf|play6

Output File:

4|soccer|play4
5|golf|play6

I am comparing all columns of file1 and file2, but I need to ignore the second column while comparing.

awk 'NR==FNR {exclude[$0];next} !($0 in exclude)' file1 file2 > file3

Answer 1

Score: 4

To ignore:

$ awk -F\| '
NR==FNR {           # first file
    $2=""           # empty the unwanted field; to empty several, chain them: $2=$3=...=$n=""
    a[$0]           # hash on the rebuilt $0
    next            # next record
}
{                   # second file
    b=$0            # back up the record to b
    $2=""           # empty the same field
    if(!($0 in a))  # rebuilt record not seen in the first file
        print b     # output the backup
}' file1 file2

Output:

4|soccer|play4
5|golf|play6

Of course, this only makes sense when the number of fields (NF) is much larger than the number of fields to blank out. Otherwise, use one of the other solutions.
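
For a quick end-to-end check, the approach above can be run against the sample data from the question. This is a minimal sketch, not the answer author's exact command: the scratch directory and the names seen, orig and result are assumptions made for illustration.

```shell
# Recreate the sample files from the question in a scratch directory (assumed location).
dir=$(mktemp -d)
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|tennis|play3' '5|golf|play5' > "$dir/file1"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|tennis1|play3' '4|soccer|play4' '5|golf|play6' > "$dir/file2"

# Blank field 2 in both files, compare the rebuilt records, print the untouched line.
result=$(awk -F'|' '
NR==FNR { $2=""; seen[$0]; next }   # first file: remember each record minus field 2
{ orig=$0; $2="" }                  # second file: keep the original, blank field 2
!($0 in seen) { print orig }        # combination not in the first file: print it
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

Assigning to $2 makes awk rebuild $0 with the default output separator (a space). That is harmless here because both files are rebuilt the same way before the lookup, and the printed line is the saved original.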

Answer 2

Score: 2

The comma operator in an array subscript is handy here because it lets you key on individual fields. If you only want to compare the first and third fields, you can keep the same pattern you already have:

$ awk -F\| 'NR==FNR {exclude[$1,$3];next} !(($1,$3) in exclude)' file1 file2
4|soccer|play4
5|golf|play6
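
To illustrate what the comma subscript does: awk stores ($1,$3) as one string, the two values joined by the built-in SUBSEP variable. The sketch below feeds in a single sample line and splits the stored key back apart; the names seen, part and key_parts are chosen here for illustration only.

```shell
# The subscript ($1,$3) is stored as the single string $1 SUBSEP $3.
key_parts=$(printf '%s\n' '5|golf|play6' | awk -F'|' '
{ seen[$1,$3] }                      # composite key built from fields 1 and 3
END {
    for (k in seen) {
        n = split(k, part, SUBSEP)   # split the stored key back into its pieces
        print n, part[1], part[2]    # number of pieces, then each piece
    }
}')
printf '%s\n' "$key_parts"
```

This prints "2 5 play6": two pieces, the first and third fields, with the second field never entering the key.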

Answer 3

Score: 2

EDIT2 (generic solution): To blank out more than one column in both input files, you could try the following. The fields to ignore are passed in as variables, so you don't need to hard-code them in the program. List the field numbers, separated by commas, in the -v file1_ignore and -v file2_ignore variables.

awk -v file1_ignore="2,3" -v file2_ignore="2,3" '
BEGIN{
  FS=OFS="|"
  num1=split(file1_ignore,array1,",")
  num2=split(file2_ignore,array2,",")
}
FNR==NR{
  for(i=1;i<=num1;i++){
    $array1[i]=""
  }
  a[$0]
  next
}
{
  val=$0
  for(i=1;i<=num2;i++){
    $array2[i]=""
  }
}
!($0 in a){
  print val
  val=""
}
' file1 file2

Explanation: a detailed, line-by-line explanation of the code above.

awk -v file1_ignore="2,3" -v file2_ignore="2,3" '    ##Start the awk program and set file1_ignore (fields to ignore in file1) and file2_ignore (fields to ignore in file2).
BEGIN{                                               ##Start of the BEGIN section.
  FS=OFS="|"                                         ##Set the field separator and the output field separator to |.
  num1=split(file1_ignore,array1,",")                ##Split file1_ignore on commas into array1; num1 is the number of entries.
  num2=split(file2_ignore,array2,",")                ##Split file2_ignore on commas into array2; num2 is the number of entries.
}                                                    ##End of the BEGIN section.
FNR==NR{                                             ##This condition is TRUE only while the first file (file1) is being read.
  for(i=1;i<=num1;i++){                              ##Loop over the num1 field numbers to ignore.
    $array1[i]=""                                    ##Blank out the field whose number is stored in array1[i].
  }                                                  ##End of the for loop.
  a[$0]                                              ##Create an entry in array a indexed by the rebuilt line.
  next                                               ##Skip the remaining rules and read the next line.
}                                                    ##End of the FNR==NR block.
{
  val=$0                                             ##Save the original line in val.
  for(i=1;i<=num2;i++){                              ##Loop over the num2 field numbers to ignore.
    $array2[i]=""                                    ##Blank out the field whose number is stored in array2[i].
  }                                                  ##End of the for loop.
}
!($0 in a){                                          ##If the rebuilt line is NOT an index of array a, run the statements below.
  print val                                          ##Print the saved original line.
  val=""                                             ##Reset val.
}
' file1 file2                                        ##The input file names.
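
To see how the choice of ignore list changes the result, here is a rough sketch that wraps the same idea in a shell function and runs it on the question's sample data. The function name diff_ignoring, the single shared ignore list, and the scratch directory are assumptions made for this illustration and are not part of the answer.

```shell
# diff_ignoring is a hypothetical helper: $1 = comma-separated fields to ignore,
# $2 and $3 = the two files. The same ignore list is applied to both files.
diff_ignoring() {
    awk -v ignore="$1" '
    BEGIN { FS = OFS = "|"; n = split(ignore, ign, ",") }
    { orig = $0; for (i = 1; i <= n; i++) $(ign[i]) = "" }   # blank the ignored fields
    NR == FNR { seen[$0]; next }                              # first file: store the key
    !($0 in seen) { print orig }                              # second file: print new lines
    ' "$2" "$3"
}

# Sample data from the question, written to a scratch directory (assumed location).
dir=$(mktemp -d)
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|tennis|play3' '5|golf|play5' > "$dir/file1"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|tennis1|play3' '4|soccer|play4' '5|golf|play6' > "$dir/file2"

only2=$(diff_ignoring 2 "$dir/file1" "$dir/file2")
both23=$(diff_ignoring 2,3 "$dir/file1" "$dir/file2")
printf 'ignore 2:\n%s\n' "$only2"
printf 'ignore 2,3:\n%s\n' "$both23"
rm -rf "$dir"
```

Ignoring only field 2 reproduces the output requested in the question (lines 4 and 5). Ignoring fields 2 and 3 leaves only field 1 as the key, so only 4|soccer|play4 is reported.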

EDIT1: To ignore one column in each file, possibly a different column per file, try the following. The example blanks column 2 in both files; change the variables to suit your needs.

awk -v file1_ignore="2" -v file2_ignore="2" '
BEGIN{
  FS=OFS="|"
}
FNR==NR{
  $file1_ignore=""
  a[$0]
  next
}
{
  val=$0
  $file2_ignore=""
}
!($0 in a){
  print val
  val=""
}
' file1 file2

You could also try the following.

awk 'BEGIN{FS="|"}FNR==NR{a[$1,$3];next} !(($1,$3) in a)' file1 file2
