英文:
Rearranging column/values in csv file based on input column order
问题
我需要你在Unix/AWK中的帮助,以特定顺序重新排列CSV文件(逗号分隔文件),顺序是根据另一个文件中的给定顺序排列的。我的知识在这方面有限。
示例:需要重新排列的文件。
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
如果将输入列顺序传递为[可以存储在另一个文件中],
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
那么我的输出文件应该如下:
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
英文:
i need your help in unix/awk to order the data in csv file (comma separated file) to certain order given in another file. my knowledge is limited on this.
example: File to rearrange.
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
<!-- end snippet -->
and if pass input column order as [This could be stored in another file]
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
<!-- end snippet -->
then my output file should come as below
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
<!-- end snippet -->
答案1
得分: 0
以下是在TXR Lisp中的解决方案:
$ txr get.tl columns data
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
文件看起来像这样:
$ cat columns
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
$ cat data
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
get.tl
文件中的代码如下:
(match (@colfile @datafile) *args*
(awk
(:inputs datafile)
(:let col-indices)
((= nr 1) (let ((wanted-cols (flow colfile
file-get-lines first (spl ",")))
(col-index-hash (hash-zip f 0)))
(set col-indices (mapcar col-index-hash wanted-cols))))
(t [apply prn (mapcar f col-indices)])))
看起来数据可能是用制表符分隔的,而制表符被转换为了空格,制表符宽度为四个字符。为了输出制表符分隔的内容,我们可以在 awk
宏中添加一个 (:set ofs "\t")
子句。
英文:
Solution in TXR Lisp:
$ txr get.tl columns data
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
Files look like:
$ cat columns
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
$ cat data
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
Code in get.tl
:
(match (@colfile @datafile) *args*
(awk
(:inputs datafile)
(:let col-indices)
((= nr 1) (let ((wanted-cols (flow colfile
file-get-lines first (spl ",")))
(col-index-hash (hash-zip f 0)))
(set col-indices (mapcar col-index-hash wanted-cols))))
(t [apply prn (mapcar f col-indices)])))
It looks as if the data might have been tab separated, and the tabs got converted to spaces, with a tab width of four. To output tab-separated, we can add a (:set ofs "\t")
clause to the awk
macro.
答案2
得分: 0
以下是您要翻译的部分:
如果:
- 您的CSV文件很简单(没有包含逗号的带引号字段,每个记录占一行……),
- 每条记录恰好具有相同数量的字段,
- 并且您使用恰好相同数量的字段指定所需的输出顺序,
您可以尝试(使用任何POSIX awk
,已经在GNU awk
和macOS自带的awk
上测试,输出前缀为-|
):
awk -F, -v OFS=, -v o='PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE' '
BEGIN {n=split(o,h)}
NR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' foo.csv
-| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
-| PRD-0989,Estract,BATCH01,Static
-| PRD-0990,Xstract,BATCH02,Static
-| PRD-0991,8stract,BATCH03,Static
-| PRD-0992,errs,BATCH04,Static
-| PRD-0993,dgerrr,BATCH05,Static
-| PRD-0994,Narros,BATCH06,Static
-| PRD-0995,sef3r,BATCH07,Static
- 将逗号设置为输入和输出字段分隔符(
-F, -v OFS=,
)。 - 将所需的输出顺序作为变量
o
传递(-v o='...'
)。 - 在
BEGIN
块中,将变量o
在输入字段分隔符上拆分,将结果存储在数组h
中,并将列数存储在变量n
中。 - 对于第一行,将字段存储在数组
a
中,字段号存储在值中(a[$i]=i
)。进入下一行(next
)。 - 对于其他行,按所需顺序打印前
n
个字段($(a[h[i]])
),后跟输出字段分隔符,除了最后一个字段后跟换行符。
如果所需的输出顺序存储在另一个文件的第一行中,例如 bar.csv
:
awk -F, -v OFS=, '
NR==1 {o=$0; n=split(o,h); next}
NR!= FNR && FNR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' bar.csv foo.csv
NR==1
仅对第一个文件的第一行为真。NR!= FNR && FNR==1
仅对第二个文件的第一行为真。
英文:
If:
- your CSV is simple (no quoted fields containing commas, one line per record...),
- and each record has exactly the same number of fields,
- and you specify the desired output order with exactly the same number of fields,
you can try (with any POSIX awk
, tested with GNU awk
and the awk
that comes with macOS, output prefixed with -|
):
awk -F, -v OFS=, -v o='PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE' '
BEGIN {n=split(o,h)}
NR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' foo.csv
-| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
-| PRD-0989,Estract,BATCH01,Static
-| PRD-0990,Xstract,BATCH02,Static
-| PRD-0991,8stract,BATCH03,Static
-| PRD-0992,errs,BATCH04,Static
-| PRD-0993,dgerrr,BATCH05,Static
-| PRD-0994,Narros,BATCH06,Static
-| PRD-0995,sef3r,BATCH07,Static
- Set comma as input and output field separators (
-F, -v OFS=,
). - Pass desired output order as variable
o
(-v o='...'
). - In
BEGIN
block split variableo
on input fields separator, store the result in arrayh
and the number of columns in variablen
. - For the first line store in array
a
the fields as keys and the field numbers as values (a[$i]=i
). Go to next line (next
). - For the other lines print the
n
first fields in desired order ($(a[h[i]])
), followed by the output field separator except the last that is followed by a newline.
If the desired output order is stored in the first line of another file, say bar.csv
:
awk -F, -v OFS=, '
NR==1 {o=$0; n=split(o,h); next}
NR!= FNR && FNR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' bar.csv foo.csv
NR==1
is true only for the first line of the first file.NR!= FNR && FNR==1
is true only for the first line of the second file.
答案3
得分: 0
以下是翻译好的部分:
$ head *file
==> input_file <==
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
==> order_file <==
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
---
awk -F' *|,' '
NR==FNR{
for(i=1;i<=NF;i++) a[$i] = i
next
}
FNR==1{
for(i=1;i<=NF;i++) b[i] = a[$i]
}
{
for (i=1;i<=length(b);i++) printf " %s", $(b[i])
print ""
}
' order_file input_file | column -t
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
<details>
<summary>英文:</summary>
$ head *file
==> input_file <==
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
==> order_file <==
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
---
```sh
awk -F' *|,' '
NR==FNR{
for(i=1;i<=NF;i++) a[$i] = i
next
}
FNR==1{
for(i=1;i<=NF;i++) b[i] = a[$i]
}
{
for (i=1;i<=length(b);i++) printf " %s", $(b[i])
print ""
}
' order_file input_file | column -t
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论