February 10, 2023 11:11:58 · go

How can I identify lines in a delimited file based on a lookup file in Unix?

Question

Assume there are two files.

File 1, lookup.txt:

CAN
USD
INR
EUR

File 2, input.txt:

1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL

Both files may be very large, potentially several thousand records each. I want to find the records in input.txt whose third field matches one of the values in lookup.txt.

The expected output is:

1~Canada~CAN
2~United States of America~USD

I tried something like this:

#!/bin/sh
lookupFile=$1  # lookup.txt
inputFile=$2   # input.txt
outputFile=$3  # output.txt
while IFS= read -r line
do
  awk -F'~' '{if ($3==$line) print >> $outputFile}' $inputFile
done < "$lookupFile"

But I get an error like this:

awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to

How can I fix this? Also, if the files are really huge, with several thousand records to search, is this an efficient approach?
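A likely cause of the error: the awk program is in single quotes, so the shell never expands `$line` or `$outputFile`. awk instead reads them as field references on its own uninitialized variables `line` and `outputFile` (value 0), so both mean `$0`, the whole record. `$3==$0` is then true only on an unusual record such as an empty line, where `print >> $0` tries to redirect to an empty file name, which would fit the truncated message. Below is a minimal sketch of the same loop with the value passed in through `awk -v`; the function name `lookup_loop` is hypothetical, and note that `-v` interprets backslash escapes in the value, which is harmless for plain codes like CAN. It still re-reads input.txt once per lookup value, so the work grows roughly with (lookup lines × input lines), which is why a single-pass approach scales better for large files.

```sh
#!/bin/sh
# Sketch of the question's loop with the quoting fixed.
# lookup_loop is a hypothetical name; arguments mirror the original script.
lookup_loop() {
  lookupFile=$1   # lookup.txt: one value per line
  inputFile=$2    # input.txt: '~'-delimited records
  outputFile=$3   # output.txt: receives the matching records
  while IFS= read -r line
  do
    # Pass the shell value into awk as the awk variable "key";
    # awk prints every record whose third field equals it.
    awk -F'~' -v key="$line" '$3 == key' "$inputFile"
  done < "$lookupFile" > "$outputFile"   # output file is opened once, by the shell
}
```

Usage: `lookup_loop lookup.txt input.txt output.txt`. Because the loop is driven by lookup.txt, matches come out grouped in lookup order rather than input order.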

Answer 1

Score: 3

With your shown samples, please try the following awk code. This can be done in a single awk call; we only need to take care to set the field separator to ~ before input.txt is read.

awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt

Explanation:

awk '                          ## Start the awk program.
FNR==NR{                       ## TRUE while the first file, lookup.txt, is being read.
  arr[$0]                      ## Create an element of array arr indexed by the whole line $0.
  next                         ## Skip all remaining statements for this line.
}
($3 in arr)                    ## If $3 exists as an index in arr, print the line (default action).
' lookup.txt FS="~" input.txt  ## Pass the input files, setting FS to ~ before input.txt is read.
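To try the one-liner end to end, here is a small self-contained sketch that wraps it in a hypothetical `match_lookup` function and recreates the question's sample files in a temporary directory. Unlike the loop in the question, it reads each file once, keeps the order of input.txt, and prints each input line at most once even if lookup.txt contains duplicate values.

```sh
#!/bin/sh
# match_lookup is a hypothetical wrapper around the answer's one-liner.
# $1: lookup file (one value per line); $2: '~'-delimited input file.
match_lookup() {
  # lookup.txt is read with the default FS (only $0 is used there);
  # the FS="~" operand takes effect before the second file is read.
  awk 'FNR==NR{arr[$0];next} ($3 in arr)' "$1" FS="~" "$2"
}

# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
printf 'CAN\nUSD\nINR\nEUR\n' > "$dir/lookup.txt"
printf '1~Canada~CAN\n2~United States of America~USD\n3~Brazil~BRL\n' > "$dir/input.txt"

# Single pass over each file; matching lines keep their input.txt order.
match_lookup "$dir/lookup.txt" "$dir/input.txt" > "$dir/output.txt"
cat "$dir/output.txt"
```

With the function defined, the real files can be processed with `match_lookup lookup.txt input.txt > output.txt`.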

Answer 2

Score: 0

A non-awk solution that you could compare against from a performance point of view:

$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD

Warning: this matches a lookup value as a whole word anywhere on the line, not only in the last field. So if some values in lookup.txt can also appear elsewhere in input.txt, prefer another solution. Alternatively, if lookup.txt contains nothing that could be interpreted as a regular-expression operator, preprocess it before running grep. Example with bash, sed and grep:

$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD
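To see the difference the warning describes, the following sketch runs both grep variants on the question's sample data plus one hypothetical extra record, `4~USD Island~XYZ`, where a lookup value appears outside the last field. It writes the sed-generated patterns to a file instead of using bash process substitution, so it also runs under plain `sh`. The anchored `~VALUE$` pattern only matches the last field, which equals the third field here because every record has exactly three fields.

```sh
#!/bin/sh
# Compare the two grep variants on the sample data plus one hypothetical
# extra record ("4~USD Island~XYZ") where a lookup value appears outside field 3.
dir=$(mktemp -d)
printf 'CAN\nUSD\nINR\nEUR\n' > "$dir/lookup.txt"
printf '1~Canada~CAN\n2~United States of America~USD\n3~Brazil~BRL\n4~USD Island~XYZ\n' > "$dir/input.txt"

# Variant 1: fixed-string, whole-word match anywhere on the line.
grep -wFf "$dir/lookup.txt" "$dir/input.txt" > "$dir/word.txt"

# Variant 2: turn each lookup value into the regex ~VALUE$ so it only
# matches the last field; patterns go to a file instead of <( ... ).
sed 's/.*/~&$/' "$dir/lookup.txt" > "$dir/patterns.txt"
grep -f "$dir/patterns.txt" "$dir/input.txt" > "$dir/anchored.txt"

echo "grep -wFf:"; cat "$dir/word.txt"
echo "anchored:";  cat "$dir/anchored.txt"
```

With this data, `grep -wFf` also returns the `4~USD Island~XYZ` line, while the anchored patterns return only the two expected lines.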
