Add entries from a CSV into different arrays in awk and print them

Question

I am writing an awk script that reads a .csv file of product prices. The input file is:

Product,Price
EG412,25
EG411,15
EG516,55
EG517,60
LG210,10
LG180,5
HG915,95

I already compute the average by summing the second column and dividing by NR - 1, but now I need to add products to arrays depending on whether they are above or below the average price. The problem is that my arrays are not printing, and they also pick up the CSV header row containing "Product,Price". The code I have is:

BEGIN{
    FS=","
    sum=0
    avg=0
    high=0
    low=0
}
{
    sum+=$2
    total=NR-1
}
{
    avg=sum/total
}
{
    if ($2 > avg && NR > 1) {
        expensive[high] = $1
        high++
    } else if ($2 < avg && NR > 1) {
        cheap[low] = $1
        low++
    }
}
{
    for (i in expensive) {
        print i
        i++
    }
}
END{
    printf "Average Price: %.2f\n", avg
}

I feel like I am doing this in an incredibly convoluted way, but I can't figure out how to get this to work. When I run it this way to test the array for only high value products, the result it returns is:

0
1
0
1
0
1
0
Average Price: 37.86

I would appreciate any help resolving this issue.

Answer 1

Score: 1

Use a 2-pass approach, first to calculate the average and then to determine which values are above/below the average:

$ cat tst.awk
BEGIN { FS="," }
# Skip the header line on each pass.
FNR == 1 {
    next
}
# First pass (NR == FNR holds only for the first file argument): build the average.
NR == FNR {
    tot += $2
    ave = tot / (NR-1)
    next
}
# Second pass: classify each product against the final average.
{
    if ( $2 < ave ) {
        cheap[$1]
    }
    else if ( $2 > ave ) {
        expensive[$1]
    }
}
END {
    print "Average:", ave+0

    print "\nCheap:"
    for ( product in cheap ) {
        print product
    }

    print "\nExpensive:"
    for ( product in expensive ) {
        print product
    }
}

$ awk -f tst.awk file file
Average: 37.8571

Cheap:
LG180
LG210
EG411
EG412

Expensive:
EG516
EG517
HG915
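
As a side note on the original script: `print i` inside `for (i in expensive)` prints the array index rather than the stored product, and an action block with no pattern runs once per input record, so the loop fires while the file is still being read. If reading the file twice is not an option (for example, when the data arrives on a pipe), a single-pass variant can buffer the rows and classify them in END. This is only a sketch under those assumptions; `classify_prices` is a hypothetical wrapper name, not part of the answer above.

```shell
#!/bin/sh
# Single-pass sketch: buffer every row, classify once the average is known.
# classify_prices is a hypothetical helper name used for illustration only.
classify_prices() {
    awk -F, '
        NR == 1 { next }                  # skip the "Product,Price" header
        {
            n++
            name[n]  = $1                 # remember rows in input order
            price[n] = $2 + 0             # force a numeric value
            sum += price[n]
        }
        END {
            if (n == 0) exit              # no data rows: avoid dividing by zero
            avg = sum / n
            printf "Average Price: %.2f\n", avg
            for (i = 1; i <= n; i++) {    # counted loop keeps input order
                if (price[i] > avg) {
                    expensive[++high] = name[i]
                } else if (price[i] < avg) {
                    cheap[++low] = name[i]
                }
            }
            print "Expensive:"
            for (i = 1; i <= high; i++) print expensive[i]   # value, not index
            print "Cheap:"
            for (i = 1; i <= low; i++) print cheap[i]
        }
    ' "$@"
}
```

Calling `classify_prices file` on the sample input should list EG516, EG517 and HG915 as expensive and the other four products as cheap, in input order. The trade-off compared with the 2-pass script is that every row is held in memory until END.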

Answer 2

Score: 0

You should calculate the average beforehand and then pass it into your script. Something like this should work:

calc.awk

# Start array indices at 1 (a common awk convention; awk arrays are associative)
BEGIN { low = high = 1 }

NR == 1 { next }

$2  < avg { cheap[low++]      = $2 }
$2 >= avg { expensive[high++] = $2 }

END {
  printf "Average: %.2f\n", avg
  printf "Cheap:"
  for (i in cheap)
    printf " %d", cheap[i]
  printf "\nHigh:"
  for (i in expensive)
    printf " %d", expensive[i]
  printf "\n"
}

Run it like this:

awk -v avg=37.8571 -F, -f calc.awk infile.csv

Output:

Average: 37.86
Cheap: 25 15 10 5
High: 55 60 95
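
The command above takes the average as given. One way to compute it first and pass it in with `-v` is sketched below. `split_by_average` is a hypothetical wrapper name, and the inline awk program stands in for calc.awk (with counted loops so the output order is fixed) purely to keep the snippet self-contained.

```shell
#!/bin/sh
# Sketch: compute the average in one awk call, then pass it to a second call via -v.
# split_by_average is a hypothetical wrapper name, not part of the answer above.
split_by_average() {
    file=$1
    # First call: average of column 2, skipping the header line.
    avg=$(awk -F, 'NR > 1 { sum += $2; n++ } END { if (n) print sum / n }' "$file")
    [ -n "$avg" ] || return 1             # no data rows, nothing to classify
    # Second call: compare each price with the precomputed average.
    awk -F, -v avg="$avg" '
        NR == 1 { next }
        $2 <  avg { cheap[++low]      = $2 }
        $2 >= avg { expensive[++high] = $2 }
        END {
            printf "Average: %.2f\n", avg
            printf "Cheap:"
            for (i = 1; i <= low; i++)  printf " %d", cheap[i]
            printf "\nHigh:"
            for (i = 1; i <= high; i++) printf " %d", expensive[i]
            printf "\n"
        }
    ' "$file"
}
```

Run as `split_by_average infile.csv`. The file is still read twice, but the average no longer has to be typed in by hand.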

huangapple
  • Posted on June 1, 2023 at 15:01:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76379403.html