How can I identify lines in a delimited file based on a lookup file in Unix?
Question
Assume there are two files.
File1 - lookup.txt
CAN
USD
INR
EUR
File2 - input.txt
1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL
Both files may be very large, hypothetically several thousand records each. I am trying to pick out the records in input.txt whose third field matches a value in the lookup file.
The expected output should be
1~Canada~CAN
2~United States of America~USD
I tried something like the following:
#!/bin/sh
 lookupFile=$1  #lookup.txt
 inputFile=$2   #input.txt
 outputFile=$3  #output.txt
 while IFS= read -r line
  do
  awk -F'~' '{if ($3==$line) print >> $outputFile}'  $inputFile
 done < "$lookupFile"
But I get an error like this:
awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to
How can I fix this? Also, if the files are really large, with several thousand records to search, is this an efficient approach?
Answer 1
Score: 3
With your shown samples, please try the following awk code. This can be done in a single awk invocation; the field separator just has to be set to ~ before input.txt is read.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt
Explanation:
awk '                          ##Start of the awk program.
FNR==NR{                       ##TRUE only while the first file, lookup.txt, is being read.
  arr[$0]                      ##Create an entry in array arr with the whole line ($0) as index.
  next                         ##Skip the remaining statements for this line.
}
($3 in arr)                    ##If $3 is an index in arr, print the line.
' lookup.txt FS="~" input.txt  ##Name the input files and set FS to ~ before input.txt is read.
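As a rough sketch, the one-liner can be dropped into the three-argument script from the question. In the original loop, `$line` and `$outputFile` sit inside single quotes, so the shell never expands them and awk treats `line` and `outputFile` as its own unset variables; that is very likely what triggers the redirect error. Here the shell does the redirection instead, and each file is read only once rather than once per lookup value. The function name `filter_by_lookup` and the scratch directory are assumptions made for illustration, not part of the original post.

```shell
#!/bin/sh
# filter_by_lookup is a hypothetical helper name used only for this sketch.
# Arguments: lookup file, ~-delimited input file, output file.
filter_by_lookup() {
  lookupFile=$1
  inputFile=$2
  outputFile=$3
  # Load every lookup line as an array index, then print input lines whose
  # third ~-separated field is one of those indexes. The shell handles the
  # output redirection, so no shell variable is needed inside the awk program.
  awk 'FNR==NR{arr[$0];next} ($3 in arr)' "$lookupFile" FS="~" "$inputFile" > "$outputFile"
}

# Demo using the sample data from the question, in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$tmp/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' '3~Brazil~BRL' > "$tmp/input.txt"

filter_by_lookup "$tmp/lookup.txt" "$tmp/input.txt" "$tmp/output.txt"
cat "$tmp/output.txt"
```

Saved as a script, the function could be called with `"$1" "$2" "$3"` in place of the demo files.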
Answer 2
Score: 0
A non-awk solution whose performance you could compare against:
$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD
Warning: this does not match on the last field only. If some values in lookup.txt can also occur elsewhere in a line of input.txt, prefer another solution. Alternatively, if lookup.txt contains nothing that could be interpreted as a regular expression operator, preprocess it before running grep. Example with bash, sed and grep:
$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD
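To make the warning concrete, here is a small sketch comparing the two grep variants on the question's sample data plus one extra row. The row `4~INR Exchange~XYZ` is invented for illustration: its second field contains the lookup value INR. The anchored patterns are written to a temporary file rather than passed through bash process substitution, so the sketch also runs under plain sh. It assumes, as the sed expression does, that the code is the last field on the line.

```shell
#!/bin/sh
# Scratch data: the question's samples plus one hypothetical extra row
# whose second field happens to contain the lookup value INR.
tmp=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$tmp/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' \
  '3~Brazil~BRL' '4~INR Exchange~XYZ' > "$tmp/input.txt"

# Fixed-string word match anywhere on the line: row 4 is reported as well.
loose=$(grep -wFf "$tmp/lookup.txt" "$tmp/input.txt")

# Turn each lookup value into an anchored pattern (~CAN$, ~USD$, ...) so that
# only a match in the last field counts.
sed 's/.*/~&$/' "$tmp/lookup.txt" > "$tmp/patterns.txt"
strict=$(grep -f "$tmp/patterns.txt" "$tmp/input.txt")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

The loose variant reports rows 1, 2 and 4, while the anchored variant reports only rows 1 and 2.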