How can I identify lines from a delimited file, based on a lookup file in unix
Question
Assume that there are two files
File1 - lookup.txt
CAN
USD
INR
EUR
Another file Input.txt
1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL
Both files may be very large, hypothetically several thousand records each. I'm trying to pick out the records in Input.txt whose third field matches a value in the lookup file.
The expected output should be
1~Canada~CAN
2~United States of America~USD
I tried to do something like below
#!/bin/sh
lookupFile=$1 #lookup.txt
inputFile=$2 #input.txt
outputFile=$3 #output.txt
while IFS= read -r line
do
awk -F'~' '{if ($3==$line) print >> $outputFile}' $inputFile
done < "$lookupFile"
But I'm getting error like
awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to
How can I fix this issue? Also, if the files are really large, with several thousand records to search, is this an efficient approach?
Answer 1
Score: 3
With your shown samples, please try the following awk code. This can be done in a single awk invocation; we just need to set the field separator to ~ before input.txt is read.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt
Explanation:
awk ' ##starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when lookup.txt is being read.
arr[$0] ##Creating array arr with $0 as index.
next ##next to skip all further statements from here.
}
($3 in arr) ##If $3 is present in arr then print that line.
' lookup.txt FS="~" input.txt ##Mentioning Input_files and setting FS to ~ before input.txt
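To tie this back to the script in the question, here is a minimal sketch that wraps the same one-liner in a shell function taking the three arguments the question uses (lookup file, input file, output file). The function name filter_by_lookup and the temporary sample files are hypothetical and only there for illustration. Note that the shell never expands $line or $outputFile inside a single-quoted awk program, which is why the original loop misbehaves; here the output goes through an ordinary shell redirection, and each file is read only once instead of rescanning the input for every lookup value.

```shell
#!/bin/sh
# Hypothetical helper: keep the rows of a ~-delimited file whose third
# field appears in a lookup file (one value per line).
filter_by_lookup() {
    lookupFile=$1
    inputFile=$2
    outputFile=$3
    # FNR==NR is true only while the first file (the lookup) is being read.
    # FS="~" on the command line takes effect just before the input file is read.
    awk 'FNR==NR { arr[$0]; next } ($3 in arr)' \
        "$lookupFile" FS="~" "$inputFile" > "$outputFile"
}

# Demo with the sample data from the question, in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$tmpdir/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' '3~Brazil~BRL' \
    > "$tmpdir/input.txt"

filter_by_lookup "$tmpdir/lookup.txt" "$tmpdir/input.txt" "$tmpdir/output.txt"
result=$(cat "$tmpdir/output.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

One caveat with the FNR==NR idiom: if the lookup file is empty, the condition is also true for the input file, so nothing is printed from it.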
Answer 2
Score: 0
A non-awk solution whose performance you could compare against:
$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD
Warning: this does not match only on the last field. If some values in lookup.txt can also appear elsewhere in input.txt, prefer another solution. Alternatively, if lookup.txt contains nothing that could be interpreted as a regular expression operator, preprocess it before passing it to grep. Example with bash, sed and grep:
$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD
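The difference between the two grep variants can be seen with a small sketch. It assumes an extra, made-up row (4~USD Exchange Desk~XYZ) that is not in the original sample, where a lookup value appears in the second field rather than the last. To stay portable to plain sh, the patterns are written to a temporary file instead of using bash process substitution.

```shell
#!/bin/sh
# Sample data from the question plus one hypothetical row (line 4) where
# the lookup value USD appears in the middle field, not the last one.
tmpdir=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$tmpdir/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' '3~Brazil~BRL' \
    '4~USD Exchange Desk~XYZ' > "$tmpdir/input.txt"

# Fixed-string, whole-word match anywhere on the line: also picks up line 4.
loose=$(grep -wFf "$tmpdir/lookup.txt" "$tmpdir/input.txt")

# Turn each lookup value V into the regex ~V$ so it must be the last field.
sed 's/.*/~&$/' "$tmpdir/lookup.txt" > "$tmpdir/patterns.txt"
strict=$(grep -f "$tmpdir/patterns.txt" "$tmpdir/input.txt")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmpdir"
```

The loose match returns rows 1, 2 and 4, while the anchored patterns return only rows 1 and 2.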