awk - how to add one line "TER" in .pdb file

Question

This is a fragment of my .pdb file:

ATOM     73  HG1 GLU     4      77.769  51.123  52.300  1.00  0.00           H
ATOM     74  HG2 GLU     4      78.465  52.119  52.349  1.00  0.00           H
ATOM     75  CD  GLU     4      79.068  49.945  51.438  1.00  0.00           C
ATOM     76  OE1 GLU     4      80.069  49.715  50.698  1.00  0.00           O
ATOM     77  OE2 GLU     4      78.545  49.062  52.176  1.00  0.00           O
ATOM     78  C   GLU     4      81.179  52.948  53.610  1.00  0.00           C
ATOM     79  O   GLU     4      80.203  53.460  54.165  1.00  0.00           O
ATOM     80  N   GLU     5      82.590  53.305  53.698  1.00  0.00           N
ATOM     81  HN  GLU     5      83.090  53.117  52.847  1.00  0.00           H
ATOM     82  CA  GLU     5      83.454  54.267  54.627  1.00  0.00           C
ATOM     83  HA  GLU     5      83.749  55.087  53.980  1.00  0.00           H
ATOM     84  CB  GLU     5      82.258  54.565  55.220  1.00  0.00           C

I am trying to write an awk script that adds a "TER" line directly after the 4th amino acid residue (the residue number is given in the 5th column, next to the three-letter amino acid code).

My script is shown below, but it doesn't work (it doesn't add the new "TER" line at the required position in the PDB file):

awk 'NR==5 {print; print "TER"} NR!=5' my_pdb.pdb > pdb-with-ter.pdb 

Could you help me?

In the end, I want to obtain a fragment like this:

ATOM     73  HG1 GLU     4      77.769  51.123  52.300  1.00  0.00           H
ATOM     74  HG2 GLU     4      78.465  52.119  52.349  1.00  0.00           H
ATOM     75  CD  GLU     4      79.068  49.945  51.438  1.00  0.00           C
ATOM     76  OE1 GLU     4      80.069  49.715  50.698  1.00  0.00           O
ATOM     77  OE2 GLU     4      78.545  49.062  52.176  1.00  0.00           O
ATOM     78  C   GLU     4      81.179  52.948  53.610  1.00  0.00           C
ATOM     79  O   GLU     4      80.203  53.460  54.165  1.00  0.00           O
TER
ATOM     80  N   GLU     5      82.590  53.305  53.698  1.00  0.00           N
ATOM     81  HN  GLU     5      83.090  53.117  52.847  1.00  0.00           H
ATOM     82  CA  GLU     5      83.454  54.267  54.627  1.00  0.00           C
ATOM     83  HA  GLU     5      83.749  55.087  53.980  1.00  0.00           H
ATOM     84  CB  GLU     5      82.258  54.565  55.220  1.00  0.00           C

Answer 1

Score: 2

Assumptions:

  • when the 5th column value changes from 4 to 5, add a new line (TER)
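
For context, a likely reason the command in the question misses the target: `NR` is awk's count of input lines read so far, not the residue number, so an `NR==n` test fires on the n-th line of the file regardless of what column 5 holds. A minimal sketch on a four-line excerpt of the sample contrasts the two conditions (the shell variables `sample`, `by_line`, and `by_residue`, and the choice of `NR==3`, are illustrative and not part of the question):

```shell
# Four lines taken from the question's sample; residue changes from 4 to 5 at line 3.
sample=$(cat <<'EOF'
ATOM     78  C   GLU     4      81.179  52.948  53.610  1.00  0.00           C
ATOM     79  O   GLU     4      80.203  53.460  54.165  1.00  0.00           O
ATOM     80  N   GLU     5      82.590  53.305  53.698  1.00  0.00           N
ATOM     81  HN  GLU     5      83.090  53.117  52.847  1.00  0.00           H
EOF
)

# NR counts input lines: TER is printed after the 3rd line, whatever its residue is.
by_line=$(printf '%s\n' "$sample" | awk 'NR==3 {print; print "TER"} NR!=3')

# $5 holds the residue number: TER is printed where it changes from 4 to 5.
by_residue=$(printf '%s\n' "$sample" | awk 'prev==4 && $5==5 {print "TER"} {prev=$5} 1')

# Show on which output line TER ended up in each case.
printf '%s\n' "$by_line"    | grep -n '^TER$'   # prints 4:TER
printf '%s\n' "$by_residue" | grep -n '^TER$'   # prints 3:TER
```

With the line-number test, TER ends up inside residue 5; with the column test, it sits exactly on the residue boundary.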

One awk idea:

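awk '
# residue number (column 5) went from 4 to 5: print TER before this line
prev==4 && $5==5 { print "TER" }
# remember the residue number of the current line
                 { prev = $5   }
# print every input line unchanged
1
' my_pdb.pdb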

# or as a one-liner

awk 'prev=="4" && $5=="5" {print "TER"} {prev=$5} 1' my_pdb.pdb

This generates:

ATOM     73  HG1 GLU     4      77.769  51.123  52.300  1.00  0.00           H
ATOM     74  HG2 GLU     4      78.465  52.119  52.349  1.00  0.00           H
ATOM     75  CD  GLU     4      79.068  49.945  51.438  1.00  0.00           C
ATOM     76  OE1 GLU     4      80.069  49.715  50.698  1.00  0.00           O
ATOM     77  OE2 GLU     4      78.545  49.062  52.176  1.00  0.00           O
ATOM     78  C   GLU     4      81.179  52.948  53.610  1.00  0.00           C
ATOM     79  O   GLU     4      80.203  53.460  54.165  1.00  0.00           O
TER
ATOM     80  N   GLU     5      82.590  53.305  53.698  1.00  0.00           N
ATOM     81  HN  GLU     5      83.090  53.117  52.847  1.00  0.00           H
ATOM     82  CA  GLU     5      83.454  54.267  54.627  1.00  0.00           C
ATOM     83  HA  GLU     5      83.749  55.087  53.980  1.00  0.00           H
ATOM     84  CB  GLU     5      82.258  54.565  55.220  1.00  0.00           C
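
One caveat worth noting: `$5` is the residue number here only because the sample has no chain identifier. PDB ATOM records are fixed-width, with the chain ID in column 22 and the residue sequence number in columns 23-26, so a file that does carry chain IDs would shift the whitespace-separated fields. The following is a sketch of a variant that reads the residue number by character position and takes the boundary residue as a parameter. The awk variable names `last`, `res`, and `seen`, and the sample lines with chain `A`, are illustrative assumptions and not from the question.

```shell
# Illustrative sample with a chain ID ("A") in column 22, so $5 is "A" here.
sample=$(cat <<'EOF'
ATOM     78  C   GLU A   4      81.179  52.948  53.610  1.00  0.00           C
ATOM     79  O   GLU A   4      80.203  53.460  54.165  1.00  0.00           O
ATOM     80  N   GLU A   5      82.590  53.305  53.698  1.00  0.00           N
ATOM     81  HN  GLU A   5      83.090  53.117  52.847  1.00  0.00           H
EOF
)

# last = residue number after which TER should be inserted (assumed parameter).
with_ter=$(printf '%s\n' "$sample" | awk -v last=4 '
  /^(ATOM|HETATM)/ {
      res = substr($0, 23, 4) + 0        # columns 23-26, converted to a number
      # previous atom belonged to residue "last" and this one does not
      if (seen && prev == last && res != last) print "TER"
      prev = res; seen = 1
  }
  1                                       # print every input line unchanged
')

printf '%s\n' "$with_ter"
```

Reading by position keeps the script independent of whether the chain column is filled in, at the cost of assuming the file follows the standard column layout.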

huangapple
  • Posted on July 10, 2023, 18:59:23
  • Link to this article: https://go.coder-hub.com/76653047.html