2023-05-10 16:46:41

Renumbering values in a .pdb file

Question


I have a .pdb file that corresponds to a peptide-protein complex. A fragment of this file looks like this:

edit: replaced the sample input with a subset of what OP provided afterwards.

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H

I want to change the numbering in the 6th column for all lines after the first "TER" line in my .pdb file. At the moment the numbering in the 6th column restarts from 1 there, but I want it to continue from the block above, so that it starts at 6 instead of 1.

Additionally, I want to change the numbering in the 2nd column for the same lines, i.e. everything after the first "TER" line. At the moment the numbering in the 2nd column restarts from 1 there, but I want it to continue from the block above, so that it starts at 53 instead of 1.

edit: added an expected output that illustrates OP's original goal.

The expected output would be:

ATOM 51 O ARG 4 18.189 21.505 -30.356 0.00 0.00
ATOM 52 OXT ARG 5 19.822 21.322 -27.773 0.00 0.00
TER
ATOM 53 N MET A 6 -9.976 22.279 65.378 1.00 37.35 N
ATOM 54 H MET A 6 -9.180 21.915 65.882 1.00 37.35 H
ATOM 55 N LYS A 7 -11.970 21.837 62.804 1.00 40.65 N
ATOM 56 H LYS A 7 -11.194 21.438 62.295 1.00 40.65 H

Can you help me?

I tried doing this with a bash script, but when I open my edited .pdb file in PyMOL I get something like this

This is my code:

#!/bin/bash

# read input file and output file names from command line arguments
input_file=complex.pdb
output_file=renum.pdb

# initialize residue counter and flag for tracking first "TER" occurrence
residue_num=1
ter_found=false

# loop through the lines of the input file
while read line
do
    # check if the line contains "TER"
    if [[ "$line" == "TER" ]]
    then
        # if it does, reset the residue counter to 5 and set the flag to true
        residue_num=5
        ter_found=true
    else
        # if it doesn't, extract the residue name and chain ID from the line
        residue_name=$(echo $line | awk '{print $4}')
        chain_id=$(echo $line | awk '{print $5}')

        # if the residue name or chain ID has changed, increment the residue counter
        if [[ "$residue_name" != "$prev_residue_name" || "$chain_id" != "$prev_chain_id" ]]
        then
            residue_num=$((residue_num+1))
        fi

        # if the first "TER" has been found, replace the 6th column with the new residue number
        if [[ "$ter_found" == true ]]
        then
            line=$(echo $line | awk -v num="$residue_num" '{$6=num; print}')
        fi

        # save the current residue name and chain ID for comparison in the next iteration
        prev_residue_name=$residue_name
        prev_chain_id=$chain_id
    fi

    # write the modified line to the output file
    echo $line >> $output_file
done < $input_file

In the end, my renum.pdb file looks like this (this is only a fragment of the file):

edit: The following output was generated by processing the "updated" input with OP's code.

ATOM 51 O ARG 4 18.189 21.505 -30.356 0.00 0.00
ATOM 52 OXT ARG 5 19.822 21.322 -27.773 0.00 0.00
TER
ATOM 1 N MET A 6 -9.976 22.279 65.378 1.00 37.35 N
ATOM 2 H MET A 6 -9.180 21.915 65.882 1.00 37.35 H
ATOM 3 N LYS A 7 -11.970 21.837 62.804 1.00 40.65 N
ATOM 4 H LYS A 7 -11.194 21.438 62.295 1.00 40.65 H

Answer 1

Score: 2


A PDB file has fixed-width columns, so you shouldn't convert it into a single-space-delimited file: other programs won't be able to read it afterwards.
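To make the difference concrete, here is a small sketch that uses one ATOM record from the sample input. The variable names (line, collapsed, preserved) are only for illustration. It contrasts assigning to an awk field, which is what the script in the question does, with overwriting a fixed column range.

```shell
#!/bin/bash
# One ATOM record copied from the sample input (fixed-width layout).
line='ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N'

# Assigning to a field makes awk rebuild $0 with single spaces between
# fields, so the fixed-width layout is lost.
collapsed=$(printf '%s\n' "$line" | awk '{ $6 = 6; print }')

# Overwriting only columns 23-26 leaves every other column where it was.
preserved=$(printf '%s\n' "$line" |
    awk '{ print substr($0, 1, 22) sprintf("%4d", 6) substr($0, 27) }')

printf '%s\n' "$collapsed" "$preserved"
```

The first printed line is single-space delimited, while the second keeps the original length and column positions.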

For an ATOM record, the specification is:

COLUMNS & (width)	DATA TYPE	FIELD	DEFINITION
1 - 6 (6)	String(6)	recordName	"ATOM  "
7 - 11 (5)	Integer	serial	Atom serial number
12 (1)	-	-	-
13 - 16 (4)	Atom	name	Atom name
17 (1)	Character	altLoc	Alternate location indicator
18 - 20 (3)	Residue name	resName	Residue name
21 (1)	-	-	-
22 (1)	Character	chainID	Chain identifier
23 - 26 (4)	Integer	resSeq	Residue sequence number
27 (1)	AChar	iCode	Code for insertion of residues
28 - 30 (3)	-	-	-
31 - 38 (8)	Real(8.3)	x	Orthogonal coordinates for X in Å
39 - 46 (8)	Real(8.3)	y	Orthogonal coordinates for Y in Å
47 - 54 (8)	Real(8.3)	z	Orthogonal coordinates for Z in Å
55 - 60 (6)	Real(6.2)	occupancy	Occupancy
61 - 66 (6)	Real(6.2)	tempFactor	Temperature factor
67 - 76 (10)	-	-	-
77 - 78 (2)	LString(2)	element	Element symbol, right-justified
79 - 80 (2)	LString(2)	charge	Charge on the atom
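As a rough illustration of how the column ranges in this table map onto substr calls, the sketch below prints a few fields of each ATOM record. The function name pdb_fields and the "|" separator are made up for this example. It assumes the input follows the layout above.

```shell
#!/bin/bash
# Hypothetical helper: print serial, atom name, resName, chainID and resSeq
# of every ATOM record, using the column ranges from the table.
pdb_fields() {
    awk '/^ATOM/ {
        serial  = substr($0,  7, 5) + 0   # columns 7-11, converted to a number
        name    = substr($0, 13, 4)       # columns 13-16
        resName = substr($0, 18, 3)       # columns 18-20
        chainID = substr($0, 22, 1)       # column 22 (may be blank)
        resSeq  = substr($0, 23, 4) + 0   # columns 23-26, converted to a number
        gsub(/ /, "", name)               # drop the padding around the atom name
        printf "%d|%s|%s|%s|%d\n", serial, name, resName, chainID, resSeq
    }' "$@"
}

pdb_fields <<'EOF'
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
EOF
```

Note that the chain ID of the first record is blank, which is exactly the case where whitespace-based field splitting shifts every later field by one.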

To process it with any awk, you can use substr to extract the relevant fixed-width fields and rebuild the line from them:

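awk '
    # every ATOM record after the first TER line gets renumbered
    /^TER/  { after_TER = 1 }
    /^ATOM/ {
        if (after_TER) {
            # a change of residue name (columns 18-20) starts a new residue
            resName = substr($0,18,3)
            if ( resName != previous_resName ) {
                previous_resName = resName
                resSeq = sprintf("%4d", resSeq + 1)
            }
            # rebuild the record, replacing only serial (7-11) and resSeq (23-26)
            $0 = substr($0,1,6) \
                 sprintf("%5d", ++serial) \
                 substr($0,12,11) \
                 resSeq \
                 substr($0,27)
        } else {
            # before TER: remember the last serial and residue number seen
            serial = substr($0,7,5)
            resSeq = substr($0,23,4)
        }
    }
    { print }
' mod.pdb > renum.pdb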
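The script above starts a new residue whenever the residue name changes, so two neighbouring residues with the same name would end up sharing one number. If that can happen in your file, a possible variant is sketched below. It is an assumption-laden alternative, not the script from this answer: it keys on the original chainID, resSeq and iCode (columns 22-27) instead of the residue name. The function name renumber_after_ter is invented for the example.

```shell
#!/bin/bash
# Sketch of a variant: bump the residue number whenever the ORIGINAL
# chainID + resSeq + iCode (columns 22-27) changes after the first TER.
renumber_after_ter() {
    awk '
        /^TER/  { after_TER = 1 }
        /^ATOM/ {
            if (after_TER) {
                key = substr($0, 22, 6)
                if (key != previous_key) {
                    previous_key = key
                    resSeq++
                }
                # replace only serial (7-11) and resSeq (23-26)
                $0 = substr($0, 1, 6) sprintf("%5d", ++serial) \
                     substr($0, 12, 11) sprintf("%4d", resSeq) substr($0, 27)
            } else {
                # before TER: remember the last serial and residue number
                serial = substr($0, 7, 5) + 0
                resSeq = substr($0, 23, 4) + 0
            }
        }
        { print }
    ' "$@"
}
```

It could be called as renumber_after_ter mod.pdb > renum.pdb. Like the original, it assumes at least one ATOM record precedes the first TER line and that the original numbering after TER is itself consistent.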

Example

Input:

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H

Output:

ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM     53  N   MET A   6      -9.976  22.279  65.378  1.00 37.35           N
ATOM     54  H   MET A   6      -9.180  21.915  65.882  1.00 37.35           H
ATOM     55  N   LYS A   7     -11.970  21.837  62.804  1.00 40.65           N
ATOM     56  H   LYS A   7     -11.194  21.438  62.295  1.00 40.65           H
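One way to double-check a result like this is to confirm that the input and output differ only in the serial and resSeq columns. The following is a minimal sketch of such a check. The helper names mask_numbers and same_layout are hypothetical, and it assumes both files are small enough to compare as strings.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the answer above.

# Blank out serial (columns 7-11) and resSeq (columns 23-26) of ATOM records.
mask_numbers() {
    awk '/^ATOM/ { $0 = substr($0, 1, 6) "#####" substr($0, 12, 11) "####" substr($0, 27) }
         { print }' "$1"
}

# Succeed only if both files are identical once those two fields are masked.
same_layout() {
    [ "$(mask_numbers "$1")" = "$(mask_numbers "$2")" ]
}
```

For example, same_layout mod.pdb renum.pdb && echo "layout preserved" would print the message only when every other column is unchanged. A single-space-delimited file, such as the one produced by the script in the question, would fail this check.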
