英文:
Rearranging column/values in csv file based on input column order
问题
我需要你在Unix/AWK中的帮助,将CSV文件(逗号分隔文件)按照另一个文件中给定的顺序进行排序。我在这方面的知识有限。
示例:需要重新排列的文件。
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
如果将输入的列顺序传递为以下内容(可以存储在另一个文件中):
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
那么我的输出文件应该如下所示:
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
英文:
i need your help in unix/awk to order the data in csv file (comma separated file) to certain order given in another file. my knowledge is limited on this.
example: File to rearrange.
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
<!-- end snippet -->
and if pass input column order as [This could be stored in another file]
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
<!-- end snippet -->
then my output file should come as below
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
<!-- end snippet -->
答案1
得分: 0
在TXR Lisp中的解决方案:
$ txr get.tl columns data
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
文件内容如下:
$ cat columns
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
$ cat data
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
get.tl
中的代码如下:
(match (@colfile @datafile) *args*
(awk
(:inputs datafile)
(:let col-indices)
((= nr 1) (let ((wanted-cols (flow colfile
file-get-lines first (spl ",")))
(col-index-hash (hash-zip f 0)))
(set col-indices (mapcar col-index-hash wanted-cols))))
(t [apply prn (mapcar f col-indices)])))
看起来数据可能是以制表符分隔的,而制表符被转换为空格,制表符宽度为四个空格。要输出以制表符分隔的内容,我们可以在awk
宏中添加(:set ofs "\t")
子句。
英文:
Solution in TXR Lisp:
$ txr get.tl columns data
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
Files look like:
$ cat columns
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
$ cat data
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
Code in get.tl
:
(match (@colfile @datafile) *args*
(awk
(:inputs datafile)
(:let col-indices)
((= nr 1) (let ((wanted-cols (flow colfile
file-get-lines first (spl ",")))
(col-index-hash (hash-zip f 0)))
(set col-indices (mapcar col-index-hash wanted-cols))))
(t [apply prn (mapcar f col-indices)])))
It looks as if the data might have been tab separated, and the tabs got converted to spaces, with a tab width of four. To output tab-separated, we can add a (:set ofs "\t")
clause to the awk
macro.
答案2
得分: 0
如果:
- 你的CSV文件很简单(没有包含逗号的引用字段,每条记录占据一行...),
- 每条记录的字段数完全相同,
- 你用相同数量的字段指定了期望的输出顺序,
你可以尝试以下方法(使用任何POSIX的awk
,已在GNU awk
和macOS自带的awk
上测试,输出以-|
为前缀):
awk -F, -v OFS=, -v o='PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE' '
BEGIN {n=split(o,h)}
NR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' foo.csv
-| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
-| PRD-0989,Estract,BATCH01,Static
-| PRD-0990,Xstract,BATCH02,Static
-| PRD-0991,8stract,BATCH03,Static
-| PRD-0992,errs,BATCH04,Static
-| PRD-0993,dgerrr,BATCH05,Static
-| PRD-0994,Narros,BATCH06,Static
-| PRD-0995,sef3r,BATCH07,Static
- 将逗号设置为输入和输出字段分隔符(
-F, -v OFS=,
)。 - 将期望的输出顺序作为变量
o
传递(-v o='...'
)。 - 在
BEGIN
块中,将变量o
按输入字段分隔符拆分,将结果存储在数组h
中,并将列数存储在变量n
中。 - 对于第一行,在数组
a
中将字段作为键,字段号作为值进行存储(a[$i]=i
)。然后跳到下一行(next
)。 - 对于其他行,按照期望的顺序打印前
n
个字段($(a[h[i]])
),字段之间用输出字段分隔符分隔,最后一个字段后面是换行符。
如果期望的输出顺序存储在另一个文件的第一行中,比如bar.csv
:
awk -F, -v OFS=, '
NR==1 {o=$0; n=split(o,h); next}
NR!= FNR && FNR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' bar.csv foo.csv
NR==1
只对第一个文件的第一行成立。NR!= FNR && FNR==1
只对第二个文件的第一行成立。
英文:
If:
- your CSV is simple (no quoted fields containing commas, one line per record...),
- and each record has exactly the same number of fields,
- and you specify the desired output order with exactly the same number of fields,
you can try (with any POSIX awk
, tested with GNU awk
and the awk
that comes with macOS, output prefixed with -|
):
awk -F, -v OFS=, -v o='PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE' '
BEGIN {n=split(o,h)}
NR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' foo.csv
-| PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
-| PRD-0989,Estract,BATCH01,Static
-| PRD-0990,Xstract,BATCH02,Static
-| PRD-0991,8stract,BATCH03,Static
-| PRD-0992,errs,BATCH04,Static
-| PRD-0993,dgerrr,BATCH05,Static
-| PRD-0994,Narros,BATCH06,Static
-| PRD-0995,sef3r,BATCH07,Static
- Set comma as input and output field separators (
-F, -v OFS=,
). - Pass desired output order as variable
o
(-v o='...'
). - In
BEGIN
block split variableo
on input fields separator, store the result in arrayh
and the number of columns in variablen
. - For the first line store in array
a
the fields as keys and the field numbers as values (a[$i]=i
). Go to next line (next
). - For the other lines print the
n
first fields in desired order ($(a[h[i]])
), followed by the output field separator except the last that is followed by a newline.
If the desired output order is stored in the first line of another file, say bar.csv
:
awk -F, -v OFS=, '
NR==1 {o=$0; n=split(o,h); next}
NR!= FNR && FNR==1 {print o; for(i=1;i<=n;i++) a[$i]=i; next}
{for(i=1;i<=n;i++) printf("%s%s", $(a[h[i]]), i==n?"\n":OFS)}
' bar.csv foo.csv
NR==1
is true only for the first line of the first file.NR!= FNR && FNR==1
is true only for the first line of the second file.
答案3
得分: 0
awk -F' *|,' '
NR==FNR{
for(i=1;i<=NF;i++) a[$i] = i
next
}
FNR==1{
for(i=1;i<=NF;i++) b[i] = a[$i]
}
{
for (i=1;i<=length(b);i++) printf " %s", $(b[i])
print ""
}
' order_file input_file | column -t
输出结果:
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
英文:
$ head *file
==> input_file <==
PRODUCTCODE SITE BATCHID LV1P_DESCRIPTION
PRD-0989 Static BATCH01 Estract
PRD-0990 Static BATCH02 Xstract
PRD-0991 Static BATCH03 8stract
PRD-0992 Static BATCH04 errs
PRD-0993 Static BATCH05 dgerrr
PRD-0994 Static BATCH06 Narros
PRD-0995 Static BATCH07 sef3r
==> order_file <==
PRODUCTCODE,LV1P_DESCRIPTION,BATCHID,SITE
awk -F' *|,' '
NR==FNR{
for(i=1;i<=NF;i++) a[$i] = i
next
}
FNR==1{
for(i=1;i<=NF;i++) b[i] = a[$i]
}
{
for (i=1;i<=length(b);i++) printf " %s", $(b[i])
print ""
}
' order_file input_file | column -t
PRODUCTCODE LV1P_DESCRIPTION BATCHID SITE
PRD-0989 Estract BATCH01 Static
PRD-0990 Xstract BATCH02 Static
PRD-0991 8stract BATCH03 Static
PRD-0992 errs BATCH04 Static
PRD-0993 dgerrr BATCH05 Static
PRD-0994 Narros BATCH06 Static
PRD-0995 sef3r BATCH07 Static
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论