Converting an isomeric SMILES string into its atoms and non-hydrogen neighbours
Question
Hope whoever is reading this is well.
I have an issue with my code. I am trying to convert an isomeric SMILES string, a descriptor of a molecule, into its atomic groups and neighbours.
My code is below.
```python
import rdkit
from rdkit import Chem

def collect_bonded_atoms(smile):
    mol = Chem.MolFromSmiles(smile)
    atom_counts = {}
    for atom in mol.GetAtoms():
        neighbors = [(neighbor.GetSymbol(), bond.GetBondType())
                     for neighbor, bond in zip(atom.GetNeighbors(), atom.GetBonds())]
        neighbors.sort()
        key = "{}-{}".format(atom.GetSymbol(), "".join(f"{symbol}{'-' if bond_order == 1 else '=' if bond_order == 2 else '#'}" for symbol, bond_order in neighbors))
        atom_counts[key] = atom_counts.get(key, 0) + 1
    return atom_counts

smile = "CC(C)(C)C(=O)O"
print(collect_bonded_atoms(smile))
```
And the output is:
```
{'C-C-': 3, 'C-C-C-C-C-': 1, 'C-C-O-O=': 1, 'O-C=': 1, 'O-C-': 1}
```
This works for this molecule's SMILES, though ideally I would like the output to be structured as:
```
{'C-C-': 3, 'C-C(-C)(-C)-C-': 1, 'C-C-O(=O)': 1, 'O=C': 1, 'O-C-': 1}
```
I can't figure out how to fix this. This is a side issue.
The main issue is with this molecule:
```python
smile = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"
```
The output for it is very wrong:
```
{'C-C-': 1, 'C-C-C-': 6, 'C-C-N-': 1, 'N-C-C#C#': 2, 'C-C#N#': 2, 'C-N#N#': 1, 'C-N-': 1, 'F-P-': 6, 'P-F-F-F-F-F-F-': 1}
```
First, double bonds (bond_order == 2) are shown as #. Second, the number 1 in the SMILES marks a ring: the atom labelled 1 is connected to the next atom labelled 1. In the output, the ring atoms seem to be all over the place.
Can I please have some guidance on this?
Thanks
Advice on how to fix it, or better still a modified version of the code, would be appreciated. The side issue is less important, but help with it would also be welcome.
Answer 1
Score: 1
You're getting the # because those bonds are part of an aromatic ring, so RDKit gives them the bond type AROMATIC instead of SINGLE or DOUBLE. AROMATIC is neither of the two values your if-else chain tests for (its bond order, as reported by `GetBondTypeAsDouble()`, is 1.5), so all of those bonds end up in the else branch.
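A minimal sketch of that fall-through, runnable without RDKit. `FakeBondType` is a hypothetical `IntEnum` standing in for RDKit's `Chem.BondType`, which is also integer-valued (that is why `bond_order == 1` works for SINGLE bonds in the question). The value 12 for AROMATIC is assumed to match RDKit; any value other than 1 or 2 gives the same result.

```python
from enum import IntEnum


class FakeBondType(IntEnum):
    """Hypothetical stand-in for RDKit's integer-valued Chem.BondType."""
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    AROMATIC = 12  # any value other than 1 or 2 reaches the else branch


def question_bond_symbol(bond_order):
    # Same chain as the question: only 1 and 2 are tested explicitly,
    # so TRIPLE and AROMATIC both fall through to '#'.
    return "-" if bond_order == 1 else "=" if bond_order == 2 else "#"


for bt in FakeBondType:
    print(bt.name, question_bond_symbol(bt))
```

Running it prints `#` for both TRIPLE and AROMATIC, which is the ambiguity visible in the imidazolium output.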
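To fix this, you can do either of two things:
- Change your if-else condition to handle aromatic bonds, for example by testing for `Chem.BondType.AROMATIC`, or by switching to `bond.GetBondTypeAsDouble()` and testing for 1.5.
- Kekulize the `mol` object. This replaces each aromatic bond with an explicit SINGLE or DOUBLE bond (one Kekulé structure of the ring), so your existing `== 1` / `== 2` checks apply. You can do this by updating your code as follows: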
```python
mol = Chem.MolFromSmiles(smile)
Chem.Kekulize(mol)  # aromatic bonds become explicit SINGLE/DOUBLE bonds
```
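Putting both suggestions together, here is a sketch of a modified `collect_bonded_atoms`, offered as an illustration rather than a tested drop-in. The helper name `neighbour_key`, the `kekulize` parameter, and the use of `:` for aromatic bonds (borrowed from SMILES notation) are illustrative choices. It reads bond orders with `Bond.GetBondTypeAsDouble()`, so an aromatic bond that is kept (with `kekulize=False`) shows up as `:` instead of `#`, and it pairs each bond with its far atom via `Bond.GetOtherAtom()` instead of zipping two separate iterators. Ring-closure digits such as the `1` in `N1C=C...C1` do not survive parsing: RDKit turns them into an ordinary bond between the two labelled atoms, so ring atoms are counted like any other.

```python
from collections import Counter

# Symbols keyed by the values Bond.GetBondTypeAsDouble() returns.
# Using ':' for aromatic (1.5) is an assumption borrowed from SMILES notation.
BOND_SYMBOLS = {1.0: "-", 1.5: ":", 2.0: "=", 3.0: "#"}


def neighbour_key(symbol, neighbours):
    """Build a key like 'C-C-O-O=' from an atom symbol and a list of
    (neighbour_symbol, bond_order) pairs; unknown orders become '?'."""
    parts = "".join(
        f"{nbr}{BOND_SYMBOLS.get(order, '?')}" for nbr, order in sorted(neighbours)
    )
    return f"{symbol}-{parts}"


def collect_bonded_atoms(smiles, kekulize=True):
    """Count neighbour keys for every atom RDKit creates from the SMILES."""
    from rdkit import Chem  # imported here so neighbour_key works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    if kekulize:
        # Replace AROMATIC bonds with an explicit SINGLE/DOUBLE pattern.
        Chem.Kekulize(mol)

    counts = Counter()
    for atom in mol.GetAtoms():
        # GetOtherAtom() ties each bond to the atom at its far end, so the
        # neighbour symbol and bond order cannot get out of step.
        neighbours = [
            (bond.GetOtherAtom(atom).GetSymbol(), bond.GetBondTypeAsDouble())
            for bond in atom.GetBonds()
        ]
        counts[neighbour_key(atom.GetSymbol(), neighbours)] += 1
    return dict(counts)
```

For example, `collect_bonded_atoms("CC(C)(C)C(=O)O")` should reproduce the counts from the question, since that molecule has no aromatic bonds. Calling it on the imidazolium salt with `kekulize=False` shows the ring bonds as `:`; with `kekulize=True`, the exact ring keys depend on which Kekulé structure RDKit assigns.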