Add entries from a CSV into different arrays in awk and print them

Question

I am writing an .awk script that reads a .csv file of product prices. The input file is:

```
Product,Price
EG412,25
EG411,15
EG516,55
EG517,60
LG210,10
LG180,5
HG915,95
```

I have already computed the average by summing the second column and dividing by NR - 1, but now I need to put each product into one of two arrays depending on whether its price is above or below the average. The problems I am running into are that my arrays are not printing, and that the header row of the CSV ("Product,Price") is also being added. The code I have is:

```awk
BEGIN{
    FS=","
    sum=0
    avg=0
    high=0
    low=0
}
{
    sum+=$2
    total=NR-1
}
{
    avg=sum/total
}
{
    if ($2 > avg && NR > 1) {
        expensive[high] = $1
        high++
    } else if ($2 < avg && NR > 1) {
        cheap[low] = $1
        low++
    }
}
{
    for (i in expensive) {
        print i
        i++
    }
}
END{
    printf "Average Price: %.2f\n", avg
}
```

I feel like I am doing this in an overly convoluted way, but I can't figure out how to get it to work. When I run it like this to test just the array of high-value products, the output is:

```
0
1
0
1
0
1
0
Average: 37.86
```

I would appreciate any help resolving this issue.
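For readers comparing that output with the code: in awk, `for (i in arr)` binds `i` to each subscript (key) of `arr`, not to the stored value, so `print i` prints indices such as 0 and 1 rather than product names; the product name is `arr[i]`. Also, a loop placed in a main rule's action (rather than in `END`) runs once for every input line. The minimal sketch below shows the key/value distinction; `show_keys_and_values` is a hypothetical helper name and the array contents are made up for illustration.

```shell
#!/bin/sh
# Minimal demo: for (k in arr) iterates over subscripts (keys), not values.
# show_keys_and_values is a hypothetical helper; the array data is made up.
show_keys_and_values() {
    awk 'BEGIN {
        expensive[0] = "EG516"
        expensive[1] = "EG517"
        for (k in expensive)
            print k, expensive[k]   # k is the key, expensive[k] the value
    }'
}
```

Calling `show_keys_and_values` prints the lines `0 EG516` and `1 EG517`, in an order that awk does not guarantee.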

Answer 1

Score: 1

Use a two-pass approach: read the file once to calculate the average, then a second time to determine which values are above or below it:

```
$ cat tst.awk
BEGIN { FS="," }
FNR == 1 {
    next
}
NR == FNR {
    tot += $2
    ave = tot / (NR-1)
    next
}
{
    if ( $2 < ave ) {
        cheap[$1]
    }
    else if ( $2 > ave ) {
        expensive[$1]
    }
}
END {
    print "Average:", ave+0
    print "\nCheap:"
    for ( product in cheap ) {
        print product
    }
    print "\nExpensive:"
    for ( product in expensive ) {
        print product
    }
}
```

```
$ awk -f tst.awk file file
Average: 37.8571

Cheap:
LG180
LG210
EG411
EG412

Expensive:
EG516
EG517
HG915
```
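One practical note on the two-pass approach: naming the same file twice means the input must be a real file, not a pipe. A single-pass alternative, sketched below under the assumption that the whole CSV fits in memory, buffers each row in arrays, computes the average in `END`, and only then fills the cheap/expensive arrays. It uses counted loops because the traversal order of `for (k in arr)` is unspecified in awk, which is also why the order of products in the output above may differ between awk implementations. `split_by_average` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Single-pass sketch: buffer every data row, compute the average in END,
# then split products into cheap/expensive arrays, keeping input order.
# split_by_average is a hypothetical helper; input is CSV "Product,Price".
split_by_average() {
    awk -F, '
    NR == 1 { next }                 # skip the "Product,Price" header
    {
        n++
        name[n] = $1                 # remember each product ...
        price[n] = $2 + 0            # ... and its numeric price
        tot += $2
    }
    END {
        if (n == 0) exit             # no data rows: avoid dividing by zero
        ave = tot / n
        for (i = 1; i <= n; i++) {
            if (price[i] < ave)      cheap[++nc] = name[i]
            else if (price[i] > ave) expensive[++ne] = name[i]
        }
        print "Average:", ave
        print "\nCheap:"
        for (i = 1; i <= nc; i++) print cheap[i]
        print "\nExpensive:"
        for (i = 1; i <= ne; i++) print expensive[i]
    }' "$@"
}
```

Usage: `split_by_average file`, or `some_command | split_by_average` to read standard input. With the sample data it lists the cheap products as EG412, EG411, LG210, LG180, in input order.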

Answer 2

Score: 0

You should calculate the average beforehand and then pass it into your script; something like this should work:

calc.awk

```awk
# Start the array indices at 1
BEGIN { low = high = 1 }
NR == 1 { next }
$2 < avg { cheap[low++] = $2 }
$2 >= avg { expensive[high++] = $2 }
END {
    printf "Average: %.2f\n", avg
    printf "Cheap:"
    # Counted loops keep input order; "for (i in arr)" order is unspecified
    for (i = 1; i < low; i++)
        printf " %d", cheap[i]
    printf "\nHigh:"
    for (i = 1; i < high; i++)
        printf " %d", expensive[i]
    printf "\n"
}
```

Run it like this:

```
awk -v avg=37.8571 -F, -f calc.awk infile.csv
```

Output:

```
Average: 37.86
Cheap: 25 15 10 5
High: 55 60 95
```
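The command above hardcodes `avg=37.8571`. A sketch of computing that value first and passing it in with `-v`, so nothing has to be copied by hand, is below. `classify_prices` is a hypothetical wrapper that takes the CSV path as its only argument, and the second awk program is an inline adaptation of the calc.awk logic (pre-increment so the counters start at 1, counted loops so output follows input order).

```shell
#!/bin/sh
# Sketch of "compute the average first, then pass it in with -v".
# classify_prices is a hypothetical wrapper; $1 is the CSV path.
classify_prices() {
    csv=$1
    # Pass 1: mean of column 2, skipping the header row.
    avg=$(awk -F, 'NR > 1 { s += $2; n++ } END { if (n) print s / n }' "$csv")
    # Pass 2: classify each price against the average passed in via -v.
    awk -v avg="$avg" -F, '
    NR == 1   { next }
    $2 <  avg { cheap[++low] = $2 }
    $2 >= avg { expensive[++high] = $2 }
    END {
        printf "Average: %.2f\n", avg
        printf "Cheap:"
        for (i = 1; i <= low; i++) printf " %d", cheap[i]
        printf "\nHigh:"
        for (i = 1; i <= high; i++) printf " %d", expensive[i]
        printf "\n"
    }' "$csv"
}
```

With the sample data, `classify_prices infile.csv` prints the same three lines as the output shown above. Because awk treats a `-v` value that looks like a number as a numeric string, `$2 < avg` is a numeric comparison.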

huangapple
  • Posted on 2023-06-01 15:01:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76379403.html