根据输入的列顺序重新排列CSV文件中的列/数值。

huangapple go评论132阅读模式
英文:

Rearranging column/values in csv file based on input column order

问题

我需要你在Unix/AWK中的帮助,以特定顺序重新排列CSV文件(逗号分隔文件),顺序是根据另一个文件中的给定顺序排列的。我的知识在这方面有限。

示例:需要重新排列的文件。

  1. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  2. PRD-0989 Static BATCH01 Estract
  3. PRD-0990 Static BATCH02 Xstract
  4. PRD-0991 Static BATCH03 8stract
  5. PRD-0992 Static BATCH04 errs
  6. PRD-0993 Static BATCH05 dgerrr
  7. PRD-0994 Static BATCH06 Narros
  8. PRD-0995 Static BATCH07 sef3r

如果将输入列顺序传递为[可以存储在另一个文件中],

  1. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE

那么我的输出文件应该如下:

  1. PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
  2. PRD-0989 Estract BATCH01 Static
  3. PRD-0990 Xstract BATCH02 Static
  4. PRD-0991 8stract BATCH03 Static
  5. PRD-0992 errs BATCH04 Static
  6. PRD-0993 dgerrr BATCH05 Static
  7. PRD-0994 Narros BATCH06 Static
  8. PRD-0995 sef3r BATCH07 Static
英文:

i need your help in unix/awk to order the data in csv file (comma separated file) to certain order given in another file. my knowledge is limited on this.

example: File to rearrange.

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

  1. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  2. PRD-0989 Static BATCH01 Estract
  3. PRD-0990 Static BATCH02 Xstract
  4. PRD-0991 Static BATCH03 8stract
  5. PRD-0992 Static BATCH04 errs
  6. PRD-0993 Static BATCH05 dgerrr
  7. PRD-0994 Static BATCH06 Narros
  8. PRD-0995 Static BATCH07 sef3r

<!-- end snippet -->

and if pass input column order as [This could be stored in another file]

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

  1. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE

<!-- end snippet -->

then my output file should come as below

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

  1. PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
  2. PRD-0989 Estract BATCH01 Static
  3. PRD-0990 Xstract BATCH02 Static
  4. PRD-0991 8stract BATCH03 Static
  5. PRD-0992 errs BATCH04 Static
  6. PRD-0993 dgerrr BATCH05 Static
  7. PRD-0994 Narros BATCH06 Static
  8. PRD-0995 sef3r BATCH07 Static

<!-- end snippet -->

答案1

得分: 0

以下是在TXR Lisp中的解决方案:

  1. $ txr get.tl columns data
  2. PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
  3. PRD-0989 Estract BATCH01 Static
  4. PRD-0990 Xstract BATCH02 Static
  5. PRD-0991 8stract BATCH03 Static
  6. PRD-0992 errs BATCH04 Static
  7. PRD-0993 dgerrr BATCH05 Static
  8. PRD-0994 Narros BATCH06 Static
  9. PRD-0995 sef3r BATCH07 Static

文件看起来像这样:

  1. $ cat columns
  2. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  3. $ cat data
  4. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  5. PRD-0989 Static BATCH01 Estract
  6. PRD-0990 Static BATCH02 Xstract
  7. PRD-0991 Static BATCH03 8stract
  8. PRD-0992 Static BATCH04 errs
  9. PRD-0993 Static BATCH05 dgerrr
  10. PRD-0994 Static BATCH06 Narros
  11. PRD-0995 Static BATCH07 sef3r

get.tl 文件中的代码如下:

  1. (match (@colfile @datafile) *args*
  2. (awk
  3. (:inputs datafile)
  4. (:let col-indices)
  5. ((= nr 1) (let ((wanted-cols (flow colfile
  6. file-get-lines first (spl &quot;,&quot;)))
  7. (col-index-hash (hash-zip f 0)))
  8. (set col-indices (mapcar col-index-hash wanted-cols))))
  9. (t [apply prn (mapcar f col-indices)])))

看起来数据可能是用制表符分隔的,而制表符被转换为了空格,制表符宽度为四个字符。为了输出制表符分隔的内容,我们可以在 awk 宏中添加一个 (:set ofs &quot;\t&quot;) 子句。

英文:

Solution in TXR Lisp:

  1. $ txr get.tl columns data
  2. PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
  3. PRD-0989 Estract BATCH01 Static
  4. PRD-0990 Xstract BATCH02 Static
  5. PRD-0991 8stract BATCH03 Static
  6. PRD-0992 errs BATCH04 Static
  7. PRD-0993 dgerrr BATCH05 Static
  8. PRD-0994 Narros BATCH06 Static
  9. PRD-0995 sef3r BATCH07 Static

Files look like:

  1. $ cat columns
  2. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  3. $ cat data
  4. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  5. PRD-0989 Static BATCH01 Estract
  6. PRD-0990 Static BATCH02 Xstract
  7. PRD-0991 Static BATCH03 8stract
  8. PRD-0992 Static BATCH04 errs
  9. PRD-0993 Static BATCH05 dgerrr
  10. PRD-0994 Static BATCH06 Narros
  11. PRD-0995 Static BATCH07 sef3r

Code in get.tl:

  1. (match (@colfile @datafile) *args*
  2. (awk
  3. (:inputs datafile)
  4. (:let col-indices)
  5. ((= nr 1) (let ((wanted-cols (flow colfile
  6. file-get-lines first (spl &quot;,&quot;)))
  7. (col-index-hash (hash-zip f 0)))
  8. (set col-indices (mapcar col-index-hash wanted-cols))))
  9. (t [apply prn (mapcar f col-indices)])))

It looks as if the data might have been tab separated, and the tabs got converted to spaces, with a tab width of four. To output tab-separated, we can add a (:set ofs &quot;\t&quot;) clause to the awk macro.

答案2

得分: 0

以下是您要翻译的部分:

如果:

  • 您的CSV文件很简单(没有包含逗号的带引号字段,每个记录占一行……),
  • 每条记录恰好具有相同数量的字段,
  • 并且您使用恰好相同数量的字段指定所需的输出顺序,

您可以尝试(使用任何POSIX awk,已经在GNU awk和macOS自带的awk上测试,输出前缀为-| ):

  1. awk -F, -v OFS=, -v o=&#39;PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE&#39; &#39;
  2. BEGIN {n=split(o,h)}
  3. NR==1 {print o; for(i=1;i&lt;=n;i++) a[$i]=i; next}
  4. {for(i=1;i&lt;=n;i++) printf(&quot;%s%s&quot;, $(a[h[i]]), i==n?&quot;\n&quot;:OFS)}
  5. &#39; foo.csv
  6. -| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  7. -| PRD-0989,Estract,BATCH01,Static
  8. -| PRD-0990,Xstract,BATCH02,Static
  9. -| PRD-0991,8stract,BATCH03,Static
  10. -| PRD-0992,errs,BATCH04,Static
  11. -| PRD-0993,dgerrr,BATCH05,Static
  12. -| PRD-0994,Narros,BATCH06,Static
  13. -| PRD-0995,sef3r,BATCH07,Static
  1. 将逗号设置为输入和输出字段分隔符(-F, -v OFS=,)。
  2. 将所需的输出顺序作为变量 o 传递(-v o=&#39;...&#39;)。
  3. BEGIN 块中,将变量 o 在输入字段分隔符上拆分,将结果存储在数组 h 中,并将列数存储在变量 n 中。
  4. 对于第一行,将字段存储在数组 a 中,字段号存储在值中(a[$i]=i)。进入下一行(next)。
  5. 对于其他行,按所需顺序打印前 n 个字段($(a[h[i]])),后跟输出字段分隔符,除了最后一个字段后跟换行符。

如果所需的输出顺序存储在另一个文件的第一行中,例如 bar.csv

  1. awk -F, -v OFS=, &#39;
  2. NR==1 {o=$0; n=split(o,h); next}
  3. NR!= FNR &amp;&amp; FNR==1 {print o; for(i=1;i&lt;=n;i++) a[$i]=i; next}
  4. {for(i=1;i&lt;=n;i++) printf(&quot;%s%s&quot;, $(a[h[i]]), i==n?&quot;\n&quot;:OFS)}
  5. &#39; bar.csv foo.csv
  1. NR==1 仅对第一个文件的第一行为真。
  2. NR!= FNR &amp;&amp; FNR==1 仅对第二个文件的第一行为真。
英文:

If:

  • your CSV is simple (no quoted fields containing commas, one line per record...),
  • and each record has exactly the same number of fields,
  • and you specify the desired output order with exactly the same number of fields,

you can try (with any POSIX awk, tested with GNU awk and the awk that comes with macOS, output prefixed with -| ):

  1. awk -F, -v OFS=, -v o=&#39;PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE&#39; &#39;
  2. BEGIN {n=split(o,h)}
  3. NR==1 {print o; for(i=1;i&lt;=n;i++) a[$i]=i; next}
  4. {for(i=1;i&lt;=n;i++) printf(&quot;%s%s&quot;, $(a[h[i]]), i==n?&quot;\n&quot;:OFS)}
  5. &#39; foo.csv
  6. -| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  7. -| PRD-0989,Estract,BATCH01,Static
  8. -| PRD-0990,Xstract,BATCH02,Static
  9. -| PRD-0991,8stract,BATCH03,Static
  10. -| PRD-0992,errs,BATCH04,Static
  11. -| PRD-0993,dgerrr,BATCH05,Static
  12. -| PRD-0994,Narros,BATCH06,Static
  13. -| PRD-0995,sef3r,BATCH07,Static
  1. Set comma as input and output field separators (-F, -v OFS=,).
  2. Pass desired output order as variable o (-v o=&#39;...&#39;).
  3. In BEGIN block split variable o on input fields separator, store the result in array h and the number of columns in variable n.
  4. For the first line store in array a the fields as keys and the field numbers as values (a[$i]=i). Go to next line (next).
  5. For the other lines print the n first fields in desired order ($(a[h[i]])), followed by the output field separator except the last that is followed by a newline.

If the desired output order is stored in the first line of another file, say bar.csv:

  1. awk -F, -v OFS=, &#39;
  2. NR==1 {o=$0; n=split(o,h); next}
  3. NR!= FNR &amp;&amp; FNR==1 {print o; for(i=1;i&lt;=n;i++) a[$i]=i; next}
  4. {for(i=1;i&lt;=n;i++) printf(&quot;%s%s&quot;, $(a[h[i]]), i==n?&quot;\n&quot;:OFS)}
  5. &#39; bar.csv foo.csv
  1. NR==1 is true only for the first line of the first file.
  2. NR!= FNR &amp;&amp; FNR==1 is true only for the first line of the second file.

答案3

得分: 0

以下是翻译好的部分:

  1. $ head *file
  2. ==> input_file <==
  3. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  4. PRD-0989 Static BATCH01 Estract
  5. PRD-0990 Static BATCH02 Xstract
  6. PRD-0991 Static BATCH03 8stract
  7. PRD-0992 Static BATCH04 errs
  8. PRD-0993 Static BATCH05 dgerrr
  9. PRD-0994 Static BATCH06 Narros
  10. PRD-0995 Static BATCH07 sef3r
  11. ==> order_file <==
  12. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  13. ---
  14. awk -F' *|,' '
  15. NR==FNR{
  16. for(i=1;i<=NF;i++) a[$i] = i
  17. next
  18. }
  19. FNR==1{
  20. for(i=1;i<=NF;i++) b[i] = a[$i]
  21. }
  22. {
  23. for (i=1;i<=length(b);i++) printf " %s", $(b[i])
  24. print ""
  25. }
  26. ' order_file input_file | column -t

PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static

  1. <details>
  2. <summary>英文:</summary>
  3. $ head *file
  4. ==&gt; input_file &lt;==
  5. PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
  6. PRD-0989 Static BATCH01 Estract
  7. PRD-0990 Static BATCH02 Xstract
  8. PRD-0991 Static BATCH03 8stract
  9. PRD-0992 Static BATCH04 errs
  10. PRD-0993 Static BATCH05 dgerrr
  11. PRD-0994 Static BATCH06 Narros
  12. PRD-0995 Static BATCH07 sef3r
  13. ==&gt; order_file &lt;==
  14. PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
  15. ---
  16. ```sh
  17. awk -F&#39; *|,&#39; &#39;
  18. NR==FNR{
  19. for(i=1;i&lt;=NF;i++) a[$i] = i
  20. next
  21. }
  22. FNR==1{
  23. for(i=1;i&lt;=NF;i++) b[i] = a[$i]
  24. }
  25. {
  26. for (i=1;i&lt;=length(b);i++) printf &quot; %s&quot;, $(b[i])
  27. print &quot;&quot;
  28. }
  29. &#39; order_file input_file | column -t
  1. PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
  2. PRD-0989 Estract BATCH01 Static
  3. PRD-0990 Xstract BATCH02 Static
  4. PRD-0991 8stract BATCH03 Static
  5. PRD-0992 errs BATCH04 Static
  6. PRD-0993 dgerrr BATCH05 Static
  7. PRD-0994 Narros BATCH06 Static
  8. PRD-0995 sef3r BATCH07 Static

huangapple
  • 本文由 发表于 2023年8月9日 11:33:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/76864396-2.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定