How can I identify lines from a delimited file, based on a lookup file in Unix?

Question

Assume there are two files. The first is lookup.txt:

CAN
USD
INR
EUR

The second is input.txt:

1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL

Both files may be very large, hypothetically several thousand records each. I'm trying to find the records in input.txt whose third field matches one of the values in the lookup file.

The expected output should be

1~Canada~CAN
2~United States of America~USD

I tried something like this:

#!/bin/sh
 lookupFile=$1  #lookup.txt
 inputFile=$2   #input.txt
 outputFile=$3  #output.txt
 while IFS= read -r line
  do
  awk -F'~' '{if ($3==$line) print >> $outputFile}'  $inputFile
 done < "$lookupFile"

But I get an error like this:

awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to

How can I fix this? Also, if the files are really large, with several thousand records to search, is this an efficient approach?

Answer 1

Score: 3

With your shown samples, please try the following awk code. This can be done in a single awk invocation; the only care needed is to set the field separator to ~ before input.txt is read.

awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt

Explanation:

awk '                          ##starting awk program from here.
FNR==NR{                       ##Checking condition which will be TRUE when lookup.txt is being read.
  arr[$0]                      ##Creating array arr with $0 as index.
  next                         ##next to skip all further statements from here.
}
($3 in arr)                    ##If $3 is present in arr then print that line.
' lookup.txt FS="~" input.txt  ##Mentioning Input_files and setting FS to ~ before input.txt
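
As a rough sketch of how this one-liner could replace the loop in the question's script: shell variables such as $line and $outputFile are not expanded inside a single-quoted awk program, so awk treats them as its own (unset) variables. One way around that is to pass the file names to awk as ordinary arguments and let the shell handle the output redirection. The scratch directory and sample data below are assumptions for illustration; in the real script the three paths would come from $1, $2 and $3.

```shell
#!/bin/sh
# Illustration only: build the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$workdir/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' '3~Brazil~BRL' \
  > "$workdir/input.txt"

# In the question's script these would be "$1", "$2" and "$3".
lookupFile=$workdir/lookup.txt
inputFile=$workdir/input.txt
outputFile=$workdir/output.txt

# One awk pass: store every lookup line as an array key, then print the
# input lines whose third ~-separated field is one of those keys.
# The shell performs the redirection, so awk never sees the output name.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' "$lookupFile" FS="~" "$inputFile" > "$outputFile"

cat "$outputFile"
```

This reads each file once, whereas the loop in the question starts one awk process per lookup line and rereads input.txt every time.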

Answer 2

Score: 0

A non-awk solution whose performance you could compare against the awk one:

$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD

Warning: this matches a lookup value anywhere on the line, not only in the last field. If some values in lookup.txt can also occur elsewhere in input.txt, prefer another solution. Alternatively, if lookup.txt contains nothing that could be interpreted as a regular expression operator, preprocess it into anchored patterns before running grep. Example with bash, sed and grep:

$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD
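
To make the warning concrete, here is a small sketch with made-up data (the "USD exchange desk" record is hypothetical and not part of the question's sample). It compares the word match with the anchored patterns, and writes the patterns to a temporary file instead of using bash process substitution so that it also runs under plain sh.

```shell
#!/bin/sh
# Illustration only: sample data where a lookup value also appears in field 2.
workdir=$(mktemp -d)
printf '%s\n' CAN USD > "$workdir/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~USD exchange desk~XXX' '3~Brazil~BRL' \
  > "$workdir/input.txt"

# Fixed-string word match anywhere on the line: record 2 is selected
# because USD appears in its second field.
loose=$(grep -wFf "$workdir/lookup.txt" "$workdir/input.txt")

# Anchored patterns (~CAN$ and ~USD$): only the last field can match.
sed 's/.*/~&$/' "$workdir/lookup.txt" > "$workdir/patterns.txt"
strict=$(grep -f "$workdir/patterns.txt" "$workdir/input.txt")

printf 'word match:\n%s\nanchored match:\n%s\n' "$loose" "$strict"
```

With this data the word match returns records 1 and 2, while the anchored match returns only record 1.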

huangapple
  • Published on 2023-02-10 11:11:58
  • Source link: https://go.coder-hub.com/75406581.html