How to split a field and then print the last element using awk

Question

I am trying to edit a file which has this format:

field1 field2 field3 gene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";

I would like as output:

field1 field2 field3 exon_number "1";

I am using awk for this, but I cannot print the last part of the last field after splitting it. Here is my code:

awk '{split($4,a,";"); print ($1, $2,$3, a[$NF])}' input

I know a[$NF] does not work, but how do I refer to the last subfield, i.e. the last element of the array? (In my file exon_number is not always the 5th element, but it is always the last one.)

Answer 1

Score: 3

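exon_number "1" is your second-to-last ;-separated subfield, not your last one: the string ends with a ;, so split() produces an empty string after it. split() returns the number of elements it created, so save that count in n and use a[n-1]. By contrast, a[$NF] uses the text of the record's last field as the array index, which is not an index that split() created.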

awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); print $1, $2, $3, a[n-1]";"}' input

or:

awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); $4=a[n-1]";"; print}' input

See split() at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
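
The empty trailing element can be checked directly. The following is a minimal sketch, assuming the first three fields are tab-separated as in the commands above; the variable names line and result are only for illustration, and the sample line is built with printf rather than read from a file.

```shell
# Build one sample record: three tab-separated fields, then the attribute string as field 4.
line=$(printf 'field1\tfield2\tfield3\tgene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";')

# Split field 4 on ";" (with optional surrounding whitespace) and show
# the element count plus the last two elements, bracketed so empties are visible.
result=$(printf '%s\n' "$line" | awk 'BEGIN{FS="\t"} {
    n = split($4, a, /[[:space:]]*;[[:space:]]*/)
    printf "n=%d last=[%s] second_last=[%s]\n", n, a[n], a[n-1]
}')
echo "$result"
```

With this sample line, n is 5 and the last element is empty, so the wanted text sits in a[n-1].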

Answer 2

Score: 1

$ STR='field1<\t>field2<\t>field3<\t>gene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";'

$ awk -F'; ' '{sub(/>[^>]*$/,">",$1); $0=$1 $NF}1' <<<"$STR"
field1<\t>field2<\t>field3<\t>exon_number "1";
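
The example above writes the separators as the literal text <\t>. If the file instead contains real tab characters, the same idea might be sketched as follows. This is an adaptation under that assumption, not the answerer's own command, and the variable names line and out are illustrative.

```shell
# Sample record with real tabs between the first three fields and the attribute string.
line=$(printf 'field1\tfield2\tfield3\tgene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";')

# Split records on "; ". In the first piece, delete everything after the last tab
# (keeping the tab), then rebuild the record as that prefix plus the last piece.
out=$(printf '%s\n' "$line" | awk -F'; ' '{sub(/\t[^\t]*$/, "\t", $1); $0 = $1 $NF} 1')
printf '%s\n' "$out"
```

Because the record separator here is "; " (semicolon plus space), the final piece keeps its trailing ;, so nothing has to be appended by hand.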

huangapple
  • Posted on May 11, 2023 at 18:48:11
  • Please keep this link when reposting: https://go.coder-hub.com/76226794.html