Renumbering values in a .pdb file

Question


I have a .pdb file that corresponds to a peptide-protein complex. A fragment of the file looks like this:

Edit: replaced the sample input with a subset of what OP provided afterwards.

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H

I want to change the numbering in the 6th column for all lines after the first "TER" line in my .pdb file. Currently the numbering in the 6th column starts from 1, but I want it to increase in the same way as it does now, starting from 6 instead of 1.

Additionally, I want to change the numbering in the 2nd column for the same lines after the first "TER" line. Currently it also starts from 1, but I want it to increase in the same way, starting from 53 instead of 1.

Edit: added an expected output that illustrates OP's original goal.

The expected output would be:

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM     53  N   MET A   6      -9.976  22.279  65.378  1.00 37.35           N
ATOM     54  H   MET A   6      -9.180  21.915  65.882  1.00 37.35           H
ATOM     55  N   LYS A   7     -11.970  21.837  62.804  1.00 40.65           N
ATOM     56  H   LYS A   7     -11.194  21.438  62.295  1.00 40.65           H

Can you help me?

I tried to do this with a bash script, but when I open the edited .pdb file in PyMOL I get something like this:

[Screenshot of the edited file opened in PyMOL; image not preserved]

This is my code:

#!/bin/bash

# read input file and output file names from command line arguments
input_file=complex.pdb
output_file=renum.pdb

# initialize residue counter and flag for tracking first "TER" occurrence
residue_num=1
ter_found=false

# loop through the lines of the input file
while read line
do
    # check if the line contains "TER"
    if [[ "$line" == "TER" ]]
    then
        # if it does, reset the residue counter to 5 and set the flag to true
        residue_num=5
        ter_found=true
    else
        # if it doesn't, extract the residue name and chain ID from the line
        residue_name=$(echo $line | awk '{print $4}')
        chain_id=$(echo $line | awk '{print $5}')

        # if the residue name or chain ID has changed, increment the residue counter
        if [[ "$residue_name" != "$prev_residue_name" || "$chain_id" != "$prev_chain_id" ]]
        then
            residue_num=$((residue_num+1))
        fi

        # if the first "TER" has been found, replace the 6th column with the new residue number
        if [[ "$ter_found" == true ]]
        then
            line=$(echo $line | awk -v num="$residue_num" '{$6=num; print}')
        fi

        # save the current residue name and chain ID for comparison in the next iteration
        prev_residue_name=$residue_name
        prev_chain_id=$chain_id
    fi

    # write the modified line to the output file
    echo $line >> $output_file
done < $input_file
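In the script above, part of the lost column alignment comes from how the shell handles `$line`. The sketch below (POSIX sh, one ATOM line copied from the sample input) illustrates it: an unquoted `$line` is split on whitespace and `echo` re-joins the words with single spaces, while a quoted expansion passes the line through unchanged. When reading a file, `IFS= read -r` additionally keeps leading blanks and backslashes.

```sh
#!/bin/sh
# One fixed-width ATOM line copied from the sample input.
line='ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N'

# Unquoted: the shell splits $line on whitespace and echo re-joins the words
# with single spaces, so the column padding disappears.
collapsed=$(echo $line)

# Quoted: the line is passed to printf as one argument and comes out unchanged.
preserved=$(printf '%s\n' "$line")

printf 'collapsed: %s\n' "$collapsed"
printf 'preserved: %s\n' "$preserved"

# When reading line by line, IFS= keeps leading/trailing blanks and
# -r keeps backslashes, so each line is copied verbatim.
printf '%s\n' "$line" | while IFS= read -r l; do
    printf 'read back: %s\n' "$l"
done
```

Quoting alone does not fix the script, though: `awk '{$6=num; print}'` also re-joins all fields with single spaces, so the residue number has to be replaced by column position instead.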

In the end, my renum.pdb file looks like this (this is only a fragment of the file):

Edit: The following output was generated by processing the "updated" input with OP's code.

ATOM 51 O ARG 4 18.189 21.505 -30.356 0.00 0.00
ATOM 52 OXT ARG 5 19.822 21.322 -27.773 0.00 0.00
TER
ATOM 1 N MET A 6 -9.976 22.279 65.378 1.00 37.35 N
ATOM 2 H MET A 6 -9.180 21.915 65.882 1.00 37.35 H
ATOM 3 N LYS A 7 -11.970 21.837 62.804 1.00 40.65 N
ATOM 4 H LYS A 7 -11.194 21.438 62.295 1.00 40.65 H

Answer 1

Score: 2

A PDB file has fixed-width columns, so you shouldn't turn it into a single-space-delimited file: other programs won't be able to process it afterwards.
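A small demonstration of the difference, assuming a POSIX shell and awk and using one ATOM line from the sample input: assigning to a field (as the question's `$6=num` does) makes awk rebuild the whole record with single spaces, whereas splicing with `substr()` changes only the targeted columns.

```sh
#!/bin/sh
# One fixed-width ATOM line copied from the sample input.
line='ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N'

# Assigning to a field makes awk rebuild $0 from the fields joined by OFS
# (a single space by default): the padding between columns is lost.
rebuilt=$(printf '%s\n' "$line" | awk '{ $6 = 6; print }')

# Splicing with substr() replaces only columns 23-26 (resSeq) with a
# right-justified 4-character number and keeps everything else as is.
spliced=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, 22) sprintf("%4d", 6) substr($0, 27) }')

printf '%s\n' "$rebuilt" "$spliced"
```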

For an ATOM record, the specification is:

COLUMNS & (width)  DATA TYPE     FIELD        DEFINITION
 1 -  6  (6)       String(6)     recordName   "ATOM  "
 7 - 11  (5)       Integer       serial       Atom serial number
12       (1)       -             -            -
13 - 16  (4)       Atom          name         Atom name
17       (1)       Character     altLoc       Alternate location indicator
18 - 20  (3)       Residue name  resName      Residue name
21       (1)       -             -            -
22       (1)       Character     chainID      Chain identifier
23 - 26  (4)       Integer       resSeq       Residue sequence number
27       (1)       AChar         iCode        Code for insertion of residues
28 - 30  (3)       -             -            -
31 - 38  (8)       Real(8.3)     x            Orthogonal coordinates for X in Å
39 - 46  (8)       Real(8.3)     y            Orthogonal coordinates for Y in Å
47 - 54  (8)       Real(8.3)     z            Orthogonal coordinates for Z in Å
55 - 60  (6)       Real(6.2)     occupancy    Occupancy
61 - 66  (6)       Real(6.2)     tempFactor   Temperature factor
67 - 76 (10)       -             -            -
77 - 78  (2)       LString(2)    element      Element symbol, right-justified
79 - 80  (2)       LString(2)    charge       Charge on the atom
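Because every field sits at a fixed column range, the positions in the table can be used directly. As a quick check, assuming a POSIX shell with `cut`, this sketch pulls a few fields out of one ATOM line from the sample input (the variable names are illustrative):

```sh
#!/bin/sh
# One fixed-width ATOM line copied from the sample input.
line='ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H'

# cut -c takes 1-based, inclusive character ranges, matching the COLUMNS list.
serial=$(printf '%s\n' "$line" | cut -c7-11)    # "    2"
resname=$(printf '%s\n' "$line" | cut -c18-20)  # "MET"
chain=$(printf '%s\n' "$line" | cut -c22)       # "A"
resseq=$(printf '%s\n' "$line" | cut -c23-26)   # "   1"
x=$(printf '%s\n' "$line" | cut -c31-38)        # "  -9.180"

printf '[%s] [%s] [%s] [%s] [%s]\n' "$serial" "$resname" "$chain" "$resseq" "$x"
```

Note that numeric fields keep their padding when extracted this way, which is exactly what has to be reproduced (for example with `%5d` and `%4d`) when writing them back.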

To process it with any awk, you can use substr() to extract the relevant parts by column position:

awk '
    /^TER/  { after_TER = 1 }
    /^ATOM/ {
        if (after_TER) {
            resName = substr($0,18,3)
            if ( resName != previous_resName ) {
                previous_resName = resName
                resSeq = sprintf("%4d", resSeq + 1)
            }
            $0 = substr($0,1,6) \
                 sprintf("%5d", ++serial) \
                 substr($0,12,11) \
                 resSeq \
                 substr($0,27)
        } else {
            serial = substr($0,7,5)
            resSeq = substr($0,23,4)
        }
    }
    { print }
' mod.pdb > renum.pdb
Example

Input:

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H

Output:

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM     53  N   MET A   6      -9.976  22.279  65.378  1.00 37.35           N
ATOM     54  H   MET A   6      -9.180  21.915  65.882  1.00 37.35           H
ATOM     55  N   LYS A   7     -11.970  21.837  62.804  1.00 40.65           N
ATOM     56  H   LYS A   7     -11.194  21.438  62.295  1.00 40.65           H
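One caveat about the awk script above: it starts a new residue only when resName changes, so two consecutive residues of the same type (for example two GLY in a row) would receive the same number. A minimal variant, assuming the input's own chainID, resSeq and iCode (columns 22-27) correctly mark residue boundaries, keys on those columns instead. The function name `renumber_after_ter` is hypothetical; the rest follows the same substr()/sprintf() splicing:

```sh
#!/bin/sh
# Renumber ATOM records after the first TER so that atom serials and residue
# numbers continue from the last values seen before it.
# A new residue is detected when columns 22-27 (chainID, resSeq, iCode) of
# the input line change, rather than when resName changes.
renumber_after_ter() {
    awk '
        /^TER/ { after_TER = 1 }
        /^ATOM/ {
            if (after_TER) {
                key = substr($0, 22, 6)
                if (key != previous_key) {
                    previous_key = key
                    resSeq++
                }
                $0 = substr($0, 1, 6) sprintf("%5d", ++serial) \
                     substr($0, 12, 11) sprintf("%4d", resSeq) \
                     substr($0, 27)
            } else {
                serial = substr($0, 7, 5) + 0   # last atom serial before TER
                resSeq = substr($0, 23, 4) + 0  # last residue number before TER
            }
        }
        { print }
    ' "$@"
}

# Demo on the sample input from the question.
renumber_after_ter <<'EOF'
ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H
EOF
```

For a real file this would be called as `renumber_after_ter complex.pdb > renum.pdb` (file names taken from the question). Like the original, it only rewrites ATOM records; HETATM lines after the TER are printed unchanged.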

huangapple
  • Published on May 10, 2023 16:46:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76216504.html