Renumbering values in a .pdb file
Question
After the first TER line, you want the numbering in column 6, which currently starts at 1, to continue as above but start at 6 instead of 1. You also want the numbering in column 2 to start at 53 instead of 1.
The code you provided looks almost correct, but a few things need fixing. In particular, your script contains some HTML entities that should be replaced with the corresponding characters. You should also replace `echo $line` with `echo "$line"` so that runs of spaces in the line are not collapsed. For the same reason, read the lines with `IFS= read -r line`, and note that assigning `$6` in awk rebuilds the whole line with single spaces. Finally, make sure your script has execute permission before running it (`chmod +x your_script.sh`).
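Here is a minimal bash sketch of the quoting advice above. `copy_unquoted` and `copy_verbatim` are hypothetical helper names. The first mirrors the question's `while read line` / `echo $line` loop. The second uses `IFS= read -r` and a quoted `printf`.

```bash
# Hypothetical helpers contrasting two ways of copying a file line by line.

# Loses formatting: read trims leading/trailing blanks, and the unquoted
# $line is word-split by echo, so every run of spaces becomes one space.
copy_unquoted() {
    local line
    while read line; do
        echo $line
    done < "$1"
}

# Keeps formatting: IFS= preserves leading/trailing blanks, -r keeps
# backslashes, and printf with a quoted "$line" prints it unchanged.
# The extra test also handles a last line without a trailing newline.
copy_verbatim() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```

Given a fixed-width PDB line as input, `copy_verbatim` should return it unchanged, while `copy_unquoted` squeezes each run of spaces into a single space.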
I have a .pdb file that corresponds to a peptide-protein complex. A fragment of the file looks like this:
<sup>edit: replaced the sample input with a subset of what OP provided afterwards.</sup>
```
ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H
```
I want to change the numbering in the 6th column for all lines after the first "TER" line in my .pdb file. Currently, the numbering in the 6th column starts at 1, but I want it to continue in the same way as above, i.e. to start at 6 instead of 1.
I also want to change the numbering in the 2nd column for all lines after the first "TER" line. Currently it also starts at 1, but I want it to continue in the same way as above, i.e. to start at 53 instead of 1.
<sub>edit: added an expected output that illustrates OP's original goal.</sub>
The expected output would be:
```
ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM     53  N   MET A   6      -9.976  22.279  65.378  1.00 37.35           N
ATOM     54  H   MET A   6      -9.180  21.915  65.882  1.00 37.35           H
ATOM     55  N   LYS A   7     -11.970  21.837  62.804  1.00 40.65           N
ATOM     56  H   LYS A   7     -11.194  21.438  62.295  1.00 40.65           H
```
Can you help me?
I tried doing this with a bash script, but when I open the edited .pdb file in PyMOL, it is not displayed correctly.
This is my code:
```bash
#!/bin/bash
# read input file and output file names from command line arguments
input_file=complex.pdb
output_file=renum.pdb
# initialize residue counter and flag for tracking first "TER" occurrence
residue_num=1
ter_found=false
# loop through the lines of the input file
while read line
do
    # check if the line contains "TER"
    if [[ "$line" == "TER" ]]
    then
        # if it does, reset the residue counter to 5 and set the flag to true
        residue_num=5
        ter_found=true
    else
        # if it doesn't, extract the residue name and chain ID from the line
        residue_name=$(echo $line | awk '{print $4}')
        chain_id=$(echo $line | awk '{print $5}')
        # if the residue name or chain ID has changed, increment the residue counter
        if [[ "$residue_name" != "$prev_residue_name" || "$chain_id" != "$prev_chain_id" ]]
        then
            residue_num=$((residue_num+1))
        fi
        # if the first "TER" has been found, replace the 6th column with the new residue number
        if [[ "$ter_found" == true ]]
        then
            line=$(echo $line | awk -v num="$residue_num" '{$6=num; print}')
        fi
        # save the current residue name and chain ID for comparison in the next iteration
        prev_residue_name=$residue_name
        prev_chain_id=$chain_id
    fi
    # write the modified line to the output file
    echo $line >> $output_file
done < $input_file
```
My resulting renum.pdb file looks like this (only a fragment is shown):
<sup>edit: The following output was generated by processing the "updated" input with OP's code.</sup>
```
ATOM 51 O ARG 4 18.189 21.505 -30.356 0.00 0.00
ATOM 52 OXT ARG 5 19.822 21.322 -27.773 0.00 0.00
TER
ATOM 1 N MET A 6 -9.976 22.279 65.378 1.00 37.35 N
ATOM 2 H MET A 6 -9.180 21.915 65.882 1.00 37.35 H
ATOM 3 N LYS A 7 -11.970 21.837 62.804 1.00 40.65 N
ATOM 4 H LYS A 7 -11.194 21.438 62.295 1.00 40.65 H
```
Answer 1
Score: 2
A PDB file has fixed-width columns, so you shouldn't turn it into a single-space-delimited file: other programs won't be able to process it correctly afterwards.
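The single-space output in the question comes from awk itself. When you assign to a field, as `{$6=num; print}` does, awk rebuilds `$0` by joining the fields with `OFS`, which is a single space by default. Here is a small sketch that shows this (the name `rebuild_demo` is hypothetical):

```bash
# Hypothetical demo: assigning to a field makes awk rebuild $0 from its
# fields joined by OFS (a single space by default), discarding the
# original column alignment.
rebuild_demo() {
    printf '%s\n' "$1" | awk -v num="$2" '{ $6 = num; print }'
}
```

For example, `rebuild_demo 'ATOM      1  N   MET A   1      -9.976' 6` turns the fixed-width line into `ATOM 1 N MET A 6 -9.976`. The script in this answer keeps the original spacing instead. It assigns the whole of `$0` from `substr` pieces and fixed-width `sprintf` formats.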
For an ATOM record, the specification is:
COLUMNS (width) | DATA TYPE | FIELD | DEFINITION |
---|---|---|---|
1 - 6 (6) | String(6) | recordName | "ATOM  " |
7 - 11 (5) | Integer | serial | Atom serial number |
12 (1) | - | - | - |
13 - 16 (4) | Atom | name | Atom name |
17 (1) | Character | altLoc | Alternate location indicator |
18 - 20 (3) | Residue name | resName | Residue name |
21 (1) | - | - | - |
22 (1) | Character | chainID | Chain identifier |
23 - 26 (4) | Integer | resSeq | Residue sequence number |
27 (1) | AChar | iCode | Code for insertion of residues |
28 - 30 (3) | - | - | - |
31 - 38 (8) | Real(8.3) | x | Orthogonal coordinates for X in Å |
39 - 46 (8) | Real(8.3) | y | Orthogonal coordinates for Y in Å |
47 - 54 (8) | Real(8.3) | z | Orthogonal coordinates for Z in Å |
55 - 60 (6) | Real(6.2) | occupancy | Occupancy |
61 - 66 (6) | Real(6.2) | tempFactor | Temperature factor |
67 - 76 (10) | - | - | - |
77 - 78 (2) | LString(2) | element | Element symbol, right-justified |
79 - 80 (2) | LString(2) | charge | Charge on the atom |
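The column ranges in the table map directly onto awk's 1-based `substr(string, start, length)`. A field spanning columns a to b is `substr($0, a, b - a + 1)`. As a sketch, the hypothetical helper `atom_fields` prints a few ATOM fields this way:

```bash
# Hypothetical helper: print serial|name|resName|chainID|resSeq for each
# ATOM record, using the fixed column ranges of the ATOM specification.
atom_fields() {
    awk '/^ATOM/ {
        serial  = substr($0,  7, 5) + 0   # columns 7-11, "+ 0" makes it numeric
        name    = substr($0, 13, 4)       # columns 13-16
        resName = substr($0, 18, 3)       # columns 18-20
        chainID = substr($0, 22, 1)       # column 22 (may be blank)
        resSeq  = substr($0, 23, 4) + 0   # columns 23-26
        gsub(/ /, "", name)               # drop the padding around the atom name
        printf "%d|%s|%s|%s|%d\n", serial, name, resName, chainID, resSeq
    }' "$@"
}
```

It reads files given as arguments, or standard input when there are none. For example, `atom_fields complex.pdb` would list those fields for every ATOM record.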
To process it with any awk, you can use `substr` to extract the relevant parts:
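```bash
awk '
/^TER/ { after_TER = 1 }
/^ATOM/ {
    if (after_TER) {
        # a new residue starts when chainID + resSeq + iCode (columns 22-27)
        # changes; comparing only resName would merge two consecutive
        # residues that have the same name (e.g. ALA ALA)
        resID = substr($0,22,6)
        if ( resID != previous_resID ) {
            previous_resID = resID
            resSeq = sprintf("%4d", resSeq + 1)
        }
        $0 = substr($0,1,6) \
             sprintf("%5d", ++serial) \
             substr($0,12,11) \
             resSeq \
             substr($0,27)
    } else {
        # remember the last serial and residue numbers seen before TER
        serial = substr($0,7,5)
        resSeq = substr($0,23,4)
    }
}
{ print }
' mod.pdb > renum.pdb
```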
Example
Input:
```
ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM      1  N   MET A   1      -9.976  22.279  65.378  1.00 37.35           N
ATOM      2  H   MET A   1      -9.180  21.915  65.882  1.00 37.35           H
ATOM      3  N   LYS A   2     -11.970  21.837  62.804  1.00 40.65           N
ATOM      4  H   LYS A   2     -11.194  21.438  62.295  1.00 40.65           H
```
Output:
```
ATOM     51  O   ARG     4      18.189  21.505 -30.356  0.00  0.00
ATOM     52  OXT ARG     5      19.822  21.322 -27.773  0.00  0.00
TER
ATOM     53  N   MET A   6      -9.976  22.279  65.378  1.00 37.35           N
ATOM     54  H   MET A   6      -9.180  21.915  65.882  1.00 37.35           H
ATOM     55  N   LYS A   7     -11.970  21.837  62.804  1.00 40.65           N
ATOM     56  H   LYS A   7     -11.194  21.438  62.295  1.00 40.65           H
```
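You may prefer to pass the starting numbers explicitly (53 for the serial number and 6 for the residue number, as in the question) rather than continue from the records above the first TER. The same `substr`/`sprintf` technique can take them as `awk -v` variables. This is a sketch, not a drop-in replacement, and it rests on these assumptions:
- `renumber_after_ter` is a hypothetical name.
- Only `ATOM` records are renumbered.
- A new residue starts whenever columns 22-27 (chain ID, residue number, insertion code) change.

```bash
# Hypothetical helper: renumber ATOM records after the first TER, starting
# from explicit values instead of continuing from the records above.
# Usage: renumber_after_ter FIRST_SERIAL FIRST_RESSEQ < in.pdb > out.pdb
renumber_after_ter() {
    awk -v serial="$(( $1 - 1 ))" -v resSeq="$(( $2 - 1 ))" '
        /^TER/ { after_TER = 1 }
        /^ATOM/ && after_TER {
            resID = substr($0, 22, 6)          # chainID + resSeq + iCode
            if (resID != previous_resID) {
                previous_resID = resID
                resSeq++
            }
            # rebuild the record with fixed-width serial (5) and resSeq (4)
            $0 = substr($0, 1, 6) sprintf("%5d", ++serial) \
                 substr($0, 12, 11) sprintf("%4d", resSeq) substr($0, 27)
        }
        { print }
    '
}
```

For example, `renumber_after_ter 53 6 < complex.pdb > renum.pdb` should reproduce the expected output above for the sample input.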