2023-02-07 03:20:22

Converting an isomeric SMILES string into its atoms and non-hydrogen neighbours

Question

Hope whoever is reading this is well.

I have an issue with my code. I am trying to convert an isomeric SMILES string, a text descriptor of a molecule, into its atomic groups and their neighbours.

My code is below.

import rdkit
from rdkit import Chem
def collect_bonded_atoms(smile):
    mol = Chem.MolFromSmiles(smile)
    atom_counts = {}
    for atom in mol.GetAtoms():
        neighbors = [(neighbor.GetSymbol(), bond.GetBondType())
                     for neighbor, bond in zip(atom.GetNeighbors(), atom.GetBonds())]
        neighbors.sort()
        key = "{}-{}".format(atom.GetSymbol(), "".join(f"{symbol}{'-' if bond_order == 1 else '=' if bond_order == 2 else '#'}" for symbol, bond_order in neighbors))
        atom_counts[key] = atom_counts.get(key, 0) + 1
    return atom_counts
smile = "CC(C)(C)C(=O)O"
print(collect_bonded_atoms(smile))

The output is:

{'C-C-': 3, 'C-C-C-C-C-': 1, 'C-C-O-O=': 1, 'O-C=': 1, 'O-C-': 1}

This works well for this molecule's SMILES string, although I would prefer the output to be structured as:

{'C-C-': 3, 'C-C(-C)(-C)-C-': 1, 'C-C-O(=O)': 1, 'O=C': 1, 'O-C-': 1}

I can't figure out how to fix this. This is a side issue.

The main issue appears when I use this molecule:

smile = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"

My output is very wrong. This is what I get:

{'C-C-': 1, 'C-C-C-': 6, 'C-C-N-': 1, 'N-C-C#C#': 2, 'C-C#N#': 2, 'C-N#N#': 1, 'C-N-': 1, 'F-P-': 6, 'P-F-F-F-F-F-F-': 1}

First, double bonds (bond_order == 2) are shown as #. Second, the digit 1 in the SMILES string marks a ring closure: the atom carrying the first 1 is bonded to the atom carrying the next 1. In my output, these ring bonds are all over the place.

Can I please have some guidance on this?
Thanks

Advice on how to fix it, or even better a modified version of the code, would be appreciated. The side issue isn't as important, but if possible I'd like help with that too.

Answer 1

Score: 1

You're getting the # because those bonds are part of an aromatic ring, which makes their bond type AROMATIC rather than SINGLE or DOUBLE. In RDKit, an AROMATIC bond has a bond order of 1.5 (the value returned by GetBondTypeAsDouble()), so it matches neither == 1 nor == 2 and falls through to the else branch.
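
To see this directly, the ring bonds of the cation can be listed with their types. This is only a small diagnostic sketch; the helper name `ring_bond_types` is hypothetical and not part of RDKit or of the original code.

```python
from rdkit import Chem

# Cation from the question (the PF6- counter-ion is left out for brevity).
mol = Chem.MolFromSmiles("CCCCCCCCN1C=C[N+](=C1)C")


def ring_bond_types(mol):
    """Return (begin symbol, end symbol, bond type, order as float) per ring bond."""
    rows = []
    for bond in mol.GetBonds():
        if bond.IsInRing():
            rows.append((
                bond.GetBeginAtom().GetSymbol(),
                bond.GetEndAtom().GetSymbol(),
                bond.GetBondType(),          # enum value, e.g. BondType.AROMATIC
                bond.GetBondTypeAsDouble(),  # numeric order, 1.5 for aromatic
            ))
    return rows


for row in ring_bond_types(mol):
    print(row)
```

Every ring bond should be reported as AROMATIC with an order of 1.5, which is why none of them satisfies the `== 1` or `== 2` checks in the question's code.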

To fix this, you can do one of two things:

Change your if-else condition to handle AROMATIC bonds (bond order 1.5) explicitly.
Kekulize the mol object so that each aromatic bond is assigned an explicit SINGLE or DOUBLE type. You can do this by updating your code as follows:

mol = Chem.MolFromSmiles(smile)
Chem.Kekulize(mol)
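
Both options can be combined into one function. The following is a minimal sketch and not the only way to do it: the `BOND_SYMBOLS` table, the `kekulize` flag, the use of ":" for aromatic bonds and "?" for any other bond type are my own assumptions, chosen for illustration. It also pairs each bond with its neighbour through `bond.GetOtherAtom(atom)` instead of zipping two separate iterators.

```python
from rdkit import Chem

# Assumed symbol table: ":" for aromatic bonds is an illustrative choice.
BOND_SYMBOLS = {
    Chem.BondType.SINGLE: "-",
    Chem.BondType.DOUBLE: "=",
    Chem.BondType.TRIPLE: "#",
    Chem.BondType.AROMATIC: ":",
}


def collect_bonded_atoms(smiles, kekulize=True):
    """Count atoms by element and by their (neighbour, bond symbol) pairs."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    if kekulize:
        # Option 2: turn aromatic bonds into explicit single/double bonds.
        Chem.Kekulize(mol)

    atom_counts = {}
    for atom in mol.GetAtoms():
        # Option 1: look the bond type up in a table instead of comparing to 1 or 2.
        neighbors = sorted(
            (bond.GetOtherAtom(atom).GetSymbol(),
             BOND_SYMBOLS.get(bond.GetBondType(), "?"))
            for bond in atom.GetBonds()
        )
        key = "{}-{}".format(
            atom.GetSymbol(),
            "".join(symbol + bond_symbol for symbol, bond_symbol in neighbors),
        )
        atom_counts[key] = atom_counts.get(key, 0) + 1
    return atom_counts


ionic_liquid = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"
print(collect_bonded_atoms(ionic_liquid))                  # ring bonds as - and =
print(collect_bonded_atoms(ionic_liquid, kekulize=False))  # ring bonds as :
```

With `kekulize=True` the ring bonds appear as "-" and "=". Which of the two equivalent Kekulé structures is chosen is up to RDKit, so the exact keys for the ring atoms may differ from the way the input SMILES was written. With `kekulize=False` the ring bonds are kept aromatic and shown with ":".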
