Using awk to calculate a csv but with uneven columns

Question

I need to write a script using the .awk extension to do calculations within a file. Each row contains numbers and a keyword in the last spot dictating whether to add or subtract the numbers within a row. However, since the keywords are not in the same column, I cannot figure out how to parse through this correctly. The table looks like:

1,2,5,7,14,11,51,ADD
1,3,5,SUB
15,13,11,19,ADD
19,13,12,22,SUB
1,5,8,2,0,13,18,22,6,4,7,ADD
11,3,SUB

Any help would be appreciated.

Answer 1

Score: 3

You say you need to write an awk script with the awk extension -- which I take to mean yourscript.awk. awk is fairly simple, as long as you understand Records (lines), Fields (columns) and Rules (the awk commands applied to each line of input -- in the order you write them).

A Rule is contained between outer { ... } within the awk script. There are two special rules: BEGIN (runs before the 1st record is processed, for setup, header output, etc.) and END (runs after the last record is processed, for example to output total sums of all record values, averages, or any other summary output).
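
To make the rule order concrete, here is a minimal sketch (not part of the script in this answer) that combines BEGIN, a main rule, and END. The three input numbers and the variable name total are made up for illustration:

```shell
# BEGIN runs once before any input, the middle rule runs once per record,
# and END runs once after the last record has been read.
out=$(printf '%s\n' 3 4 5 | awk '
BEGIN { print "start" }            # before the first record
      { total += $1 }              # once for every record
END   { print "total: " total }    # after the last record
')
echo "$out"
```

This should print start followed by total: 12.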

In your case you simply need BEGIN to set FS (the Field Separator) to "," so your fields are split on ",".

After you set FS, you can use the special variable NF (number of fields) to determine the number of fields in each row (and which field holds "ADD" or "SUB"). If you want to know the current line in the file being processed, the FNR internal variable gives you that info.

With that as a background, you can process your file and either "ADD" all values or "SUB" (subtract) all values based on what is in $NF (the $ denotes field-value, e.g. if NF == 3, then $NF is the value of the 3rd field).
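
As a quick check of how NF, $NF and FNR behave on rows of different lengths, a small sketch like the following may help. It feeds two rows from the question through awk on standard input; the output wording ("record", "keyword") is just for illustration:

```shell
# Print the record number, the field count, and the last field of each row.
out=$(printf '%s\n' '1,3,5,SUB' '15,13,11,19,ADD' | awk '
BEGIN { FS = "," }                 # split fields on commas
{ printf "record %d: NF=%d keyword=%s\n", FNR, NF, $NF }
')
echo "$out"
```

This should print "record 1: NF=4 keyword=SUB" and "record 2: NF=5 keyword=ADD", showing that $NF always lands on the keyword no matter how many numbers come before it. On the command line, awk -F, is an equivalent way to set the separator.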

The simplest approach is to start each record's total with the first field ($1), then loop from the 2nd field up to the field just before the keyword, adding or subtracting each value. Output the total and repeat for each record in the file, e.g.:

#!/bin/awk -f

BEGIN { FS = "," }                        ##  initialize FS to ","

{
  n = $1                                  ##  initialize n to 1st field value
  for (i = 2; i < NF; i++) {              ##  loop fields 2 until NF - 1
    if ($NF == "SUB")                     ##  if $NF is "SUB"
      n -= $i                             ##  subtract current from total
    else if ($NF == "ADD")                ##  otherwise if $NF is "ADD"
      n += $i                             ##  add current to total
  }
  printf "line: %d => % 3d\n", FNR, n     ##  output total
}

Example Use/Output

With your sample data in the file dat/keyword.txt and the script named keyword.awk (run chmod +x keyword.awk to make the file executable), you would do:

$ ./keyword.awk dat/keyword.txt
line: 1 =>  91
line: 2 =>  -7
line: 3 =>  58
line: 4 => -28
line: 5 =>  86
line: 6 =>   8

NOTE: if your ADD or SUB calculations need to be done differently, you can simply adjust the summing logic above as needed.
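
As one possible adjustment, here is a hedged sketch of a variant that decides the sign once per record and reports any row whose keyword is neither ADD nor SUB instead of silently printing the first field. The MUL row and the "unknown keyword" message are assumptions added for illustration; they are not in the original data:

```shell
# Variant of the script above: pick the sign once per record, skip unknown keywords.
out=$(printf '%s\n' '1,3,5,SUB' '2,4,MUL' '15,13,11,19,ADD' | awk '
BEGIN { FS = "," }
{
  if ($NF != "ADD" && $NF != "SUB") {       # MUL is a made-up keyword
    printf "line: %d => unknown keyword %s\n", FNR, $NF
    next                                    # go on to the next record
  }
  sign = ($NF == "SUB") ? -1 : 1            # -1 to subtract, 1 to add
  n = $1                                    # start from the first field
  for (i = 2; i < NF; i++)                  # fields 2 through NF - 1
    n += sign * $i
  printf "line: %d => % 3d\n", FNR, n
}
')
echo "$out"
```

With this input the totals should be -7 for line 1 and 58 for line 3, with line 2 reported as having an unknown keyword.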

Let me know if you have questions.

huangapple
  • Posted on June 1, 2023 at 12:50:11
  • Please keep this link when reposting: https://go.coder-hub.com/76378743.html