How to split a field and then print the last element using awk
Question
I am trying to edit a file which has this format:
field1 field2 field3 gene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";
I would like as output:
field1 field2 field3 exon_number "1";
I am using awk to do it, but I failed to print the last part of the last field after splitting it. Here is my code:
awk '{split($4,a,";"); print ($1, $2,$3, a[$NF])}' input
I know a[$NF] does not work, but how do I refer to the last subfield? Is it the last element of the array? (In my file, exon_number is not always the 5th element, but it is always the last one.)
Answer 1
Score: 3
exon_number "1" is your second-to-last ;-separated subfield, not your last one, because there is an empty string after the final ; that you are splitting on.
awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); print $1, $2, $3, a[n-1]";"}' input
or:
awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); $4=a[n-1]";"; print}' input
See split() at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
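To see why the wanted piece sits at a[n-1], it can help to print what split() returns. The following is only a minimal sketch: the sample line is a hypothetical, shortened version of the question's input built with printf so that it contains real tabs, and the separator regex / *; */ is a simplified stand-in for the one used above. It also steps back only when the last element is actually empty, on the assumption that some lines might lack the trailing ;.

```shell
# Hypothetical sample line with real tab separators, built with printf.
line=$(printf 'field1\tfield2\tfield3\tgene_id "xxxxx"; db_xref "GeneID:102885392"; exon_number "1";')

# split() returns the element count. Because the 4th field ends with ";",
# the final element is an empty string, so the wanted value is one step back.
result=$(printf '%s\n' "$line" | awk -F'\t' '{
    n = split($4, a, / *; */)
    count = n                        # number of pieces split() produced
    last = a[n]                      # empty when the field ends with ";"
    if (a[n] == "" && n > 1) n--     # fall back to the last non-empty piece
    printf "count=%d last=[%s] picked=[%s]\n", count, last, a[n]
}')
printf '%s\n' "$result"
```

For this sample, split() produces four pieces, the fourth of which is empty, so the picked element is exon_number "1".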
英文:
exon_number "1"
is your 2nd-last ;
-separated subfield, not your last one since there's a null string after the last ;
you're splitting on.
awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); print $1, $2, $3, a[n-1]";"}' input
or:
awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); $4=a[n-1]";"; print}' input
See split()
at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
Answer 2
Score: 1
$ STR='field1<\t>field2<\t>field3<\t>gene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";'
$ awk -F'; ' '{sub(/>[^>]*$/,">",$1); $0=$1 $NF}1' <<<"$STR"
field1<\t>field2<\t>field3<\t>exon_number "1";
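The transcript above writes each tab as the visible marker <\t>. If the real file contains actual tab characters, the same idea might be adapted as in the sketch below. This is an assumption about the input rather than part of the original answer: the sample line is built with printf, and the regex matches a literal tab instead of the > character.

```shell
# Hypothetical sample line with real tab separators, built with printf.
line=$(printf 'field1\tfield2\tfield3\tgene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";')

# Fields are separated by "; ", so $1 holds the three leading columns plus
# the first attribute, and $NF holds the last attribute.
result=$(printf '%s\n' "$line" | awk -F'; ' '{
    sub(/\t[^\t]*$/, "\t", $1)   # drop the first attribute, keep the tab
    $0 = $1 $NF                  # leading columns followed by last attribute
} 1')
printf '%s\n' "$result"
```

This relies on every line having at least two attributes separated by "; "; on a line with a single attribute, $1 and $NF would be the same field.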