Converting an Isomeric SMILES string into its atoms and non-hydrogen neighbours


Question


Hope whoever is reading this is well.

I have an issue with my code. I am trying to convert an isomeric SMILES string, a descriptor of a molecule, into its atomic groups and neighbours.

My code is below.

from rdkit import Chem

def collect_bonded_atoms(smile):
    mol = Chem.MolFromSmiles(smile)
    atom_counts = {}
    for atom in mol.GetAtoms():
        # (neighbour symbol, bond type) for every bond on this atom
        neighbors = [(neighbor.GetSymbol(), bond.GetBondType())
                     for neighbor, bond in zip(atom.GetNeighbors(), atom.GetBonds())]
        neighbors.sort()
        # SINGLE (== 1) -> '-', DOUBLE (== 2) -> '=', anything else -> '#'
        suffix = "".join(
            f"{symbol}{'-' if bond_order == 1 else '=' if bond_order == 2 else '#'}"
            for symbol, bond_order in neighbors)
        key = "{}-{}".format(atom.GetSymbol(), suffix)
        # Count how many atoms share this environment
        atom_counts[key] = atom_counts.get(key, 0) + 1
    return atom_counts

smile = "CC(C)(C)C(=O)O"
print(collect_bonded_atoms(smile))

The output is:

{'C-C-': 3, 'C-C-C-C-C-': 1, 'C-C-O-O=': 1, 'O-C=': 1, 'O-C-': 1}

While this works well for this molecule's SMILES, I would prefer the output to be structured as:

{'C-C-': 3, 'C-C(-C)(-C)-C-': 1, 'C-C-O(=O)': 1, 'O=C': 1, 'O-C-': 1}

I can't figure out how to fix this. This is a side issue.
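One consistent way to get a layout like the one above is to write every neighbour except the last as a parenthesised branch. This is an assumption modeled on SMILES branch notation, so it will not reproduce those target strings character for character. A minimal pure-Python sketch; format_key is a hypothetical helper that expects the neighbours already converted to (symbol, bond character) pairs:

```python
def format_key(center, neighbors):
    """Build a SMILES-style key such as 'C(-C)(-O)=O'.

    center    -- element symbol of the central atom, e.g. 'C'
    neighbors -- list of (symbol, bond_char) tuples, e.g. [('O', '=')]
    """
    # Sort so that equivalent environments always produce the same key.
    ordered = sorted(neighbors)
    if not ordered:
        return center  # isolated atom, e.g. a lone ion
    # Every neighbour except the last becomes a parenthesised branch ...
    branches = "".join(f"({bond}{symbol})" for symbol, bond in ordered[:-1])
    # ... and the last one is written inline, as in SMILES.
    last_symbol, last_bond = ordered[-1]
    return f"{center}{branches}{last_bond}{last_symbol}"


print(format_key("C", [("C", "-")] * 4))                      # C(-C)(-C)(-C)-C
print(format_key("C", [("C", "-"), ("O", "-"), ("O", "=")]))  # C(-C)(-O)=O
print(format_key("O", [("C", "=")]))                          # O=C
```

With RDKit, the pairs for each atom can be built from neighbor.GetSymbol() and a bond character, then passed together with atom.GetSymbol() as center.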

The main issue is with this molecule:

smile = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"

Here my output is very wrong:

{'C-C-': 1, 'C-C-C-': 6, 'C-C-N-': 1, 'N-C-C#C#': 2, 'C-C#N#': 2, 'C-N#N#': 1, 'C-N-': 1, 'F-P-': 6, 'P-F-F-F-F-F-F-': 1}

First, double bonds (bond_order == 2) are shown as #. Second, the digit 1 in the SMILES marks a ring closure: the atom written just before the first 1 is bonded to the atom written just before the next 1. In the output, the ring seems to be all over the place.
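To illustrate the ring-closure point: the digit is not an atom but a bond between two atoms. The sketch below is a deliberately simplified tokenizer, an assumption for illustration only (not a full SMILES parser and not part of RDKit); ring_closures is a hypothetical helper that reports which atom indices each ring-closure label joins, counting atoms in the order they appear in the string, which is also how RDKit numbers atoms when parsing SMILES.

```python
import re

# Simplified SMILES tokens: bracket atoms, two-letter halogens, one-letter
# organic-subset atoms (aromatic lowercase included), ring-closure labels
# (single digit or %nn), and any other single character (bonds, branches, dots).
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|.")


def ring_closures(smiles):
    """Return (label, first_atom_index, second_atom_index) for each ring bond.

    A sketch for simple SMILES only; it is not a full SMILES parser.
    """
    open_rings = {}   # ring-closure label -> index of the atom that opened it
    closures = []
    atom_index = -1   # index of the most recently seen atom
    for token in TOKEN.findall(smiles):
        if token.startswith("[") or token[0].isalpha():
            atom_index += 1
        elif token.isdigit() or token.startswith("%"):
            label = token.lstrip("%")
            if label in open_rings:
                # Second occurrence: bond the current atom to the opener.
                closures.append((label, open_rings.pop(label), atom_index))
            else:
                # First occurrence: remember which atom opened this ring.
                open_rings[label] = atom_index
    return closures


print(ring_closures("CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"))  # [('1', 8, 12)]
```

For the molecule above, the N at index 8 (written N1) is bonded to the C at index 12 (written =C1), closing the five-membered imidazolium ring. RDKit does the same pairing when it parses the SMILES, so atom.GetNeighbors() already includes that ring bond and the digit never shows up as an atom of its own.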

Can I please have some guidance on this?
Thanks

Advice on how to fix it, or even better a modified version of the code, would be appreciated. The side issue isn't as important, but help with it would be welcome too.

Answer 1

Score: 1

You're getting the # because those bonds are part of an aromatic ring, so their bond type is AROMATIC rather than SINGLE or DOUBLE. In RDKit, aromatic bonds have a bond order of 1.5 (the value returned by bond.GetBondTypeAsDouble()), so they match neither bond_order == 1 nor bond_order == 2, and all of them end up in the else branch.

To fix this, you can do either of two things:

  1. Change your if-else condition to handle the 1.5 bond order, for example by testing bond.GetBondTypeAsDouble() instead of the BondType value.
  2. Kekulize the mol object, which replaces the aromatic bond types with an explicit pattern of alternating single and double bonds. You can do this by updating your code as follows:
mol = Chem.MolFromSmiles(smile)
Chem.Kekulize(mol)
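Option 1 can be sketched as a lookup on the numeric bond order. This is a minimal sketch, assuming the order comes from RDKit's Bond.GetBondTypeAsDouble(); using ':' for aromatic bonds follows SMILES convention and is an assumption, not something the answer prescribes. bond_symbol is a hypothetical helper name:

```python
# Map a bond order, as returned by RDKit's Bond.GetBondTypeAsDouble(),
# to a one-character bond symbol. ':' for aromatic (1.5) follows SMILES
# notation; that choice is an assumption made for this sketch.
BOND_SYMBOLS = {1.0: "-", 1.5: ":", 2.0: "=", 3.0: "#"}


def bond_symbol(order):
    """Return the symbol for a bond order, or '?' for anything unexpected."""
    return BOND_SYMBOLS.get(order, "?")


print(bond_symbol(1.5))  # :
print(bond_symbol(2.0))  # =
```

Inside collect_bonded_atoms, iterate over atom.GetBonds(), take the neighbour with bond.GetOtherAtom(atom), and store bond_symbol(bond.GetBondTypeAsDouble()) instead of bond.GetBondType(). This also avoids relying on GetNeighbors() and GetBonds() returning their items in matching order.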

huangapple
  • Posted on 7 February 2023 at 03:20:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75365677.html