BioPython - How do I align multiple sequences at once?

Question

(This may be a really stupid question, but I cannot find what I'm looking for in the documentation)

I'm trying to align multiple sequences at once. From the biopython package, I can see how I can make an alignment of two sequences, e.g.:

from Bio.Seq import Seq 
from Bio import pairwise2

seq1 = Seq("ACCGGT") 
seq2 = Seq("ACGT")
alignments = pairwise2.align.globalxx(seq1, seq2)
print(alignments[0])

>>> Alignment(seqA='ACCGGT', seqB='A-C-GT', score=4.0, start=0, end=6) 

Which works fine.
Now I would like to align multiple sequences at once, altered from the docs:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment
a = SeqRecord(Seq("AACGTAT"), id="Alpha")
b = SeqRecord(Seq("ACGTAT-"), id="Beta")
c = SeqRecord(Seq("AGGTAT-"), id="Gamma")
align = MultipleSeqAlignment([a, b, c],
                             annotations={"tool": "demo"},
                             column_annotations={"stats": "CCCXCCC"})
print(align)

>>> Alignment with 3 rows and 7 columns
AACGTAT Alpha
ACGTAT- Beta
AGGTAT- Gamma

But that does not really do anything. How do I now actually align these sequences?

Answer 1

Score: 1


In my environment:

import Bio
from Bio import AlignIO, SeqIO
from Bio.Align.Applications import ClustalwCommandline
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

print('BIOPYTHON VERSION : ', Bio.__version__)

a = SeqRecord(Seq("AACGTAT"), id="Alpha")
b = SeqRecord(Seq("ACGTAT-"), id="Beta")
c = SeqRecord(Seq("AGGTAT-"), id="Gamma")

# Write the records to a FASTA file for ClustalW to read.
# Note: mode 'a' appends, so the file grows on every run.
with open('input_clustal.fa', 'a') as handle:
    for record in [a, b, c]:
        SeqIO.write(record, handle, 'fasta')

# Build the command line; "./clustalw2" is the path to the executable.
clustalw_cline = ClustalwCommandline("./clustalw2", infile="input_clustal.fa")

print('clustal command : ', clustalw_cline)
print('--------------')

try:
    # Run ClustalW; it writes the alignment to input_clustal.aln.
    stdout, stderr = clustalw_cline()
    print('ok')
    print('--------------')

    # Read the computed alignment back into a MultipleSeqAlignment.
    align = AlignIO.read("input_clustal.aln", "clustal")
    print(align)
    print('--------------')

    for i in range(len(align)):
        print(str(i+1)+'  ---> ', align[i].seq,'\n')

except Exception as E:
    print('Error : ', E)
    print('--------------')

Output:

BIOPYTHON VERSION :  1.80
clustal command :  ./clustalw2 -infile=input_clustal.fa
--------------
ok
--------------
Alignment with 3 rows and 8 columns
AACGTAT- Alpha
-ACGTAT- Beta
-AGGTAT- Gamma
--------------
1  --->  AACGTAT- 

2  --->  -ACGTAT- 

3  --->  -AGGTAT- 

You need a ClustalW executable on your machine, as in this example from the Biopython tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec94

>>> import os
>>> from Bio.Align.Applications import ClustalwCommandline
>>> clustalw_exe = r"C:\Program Files\new clustal\clustalw2.exe"
>>> clustalw_cline = ClustalwCommandline(clustalw_exe, infile="opuntia.fasta")
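If the ClustalwCommandline wrapper is unavailable in your Biopython version, or you just want to check that the executable can be found before running anything, a rough standard-library sketch could look like the following. The helper names build_clustalw_command and run_clustalw are made up for illustration, and the sketch assumes clustalw2 is on your PATH rather than in the current directory as in my script. The only ClustalW option used is the -infile= one shown in the command output above.

```python
import shutil
import subprocess


def build_clustalw_command(infile, exe_name="clustalw2"):
    """Return the argument list for ClustalW, or None if it cannot be found.

    Hypothetical helper: exe_name is looked up on PATH with shutil.which.
    """
    exe_path = shutil.which(exe_name)
    if exe_path is None:
        return None
    # Same single option that the ClustalwCommandline call above produces.
    return [exe_path, f"-infile={infile}"]


def run_clustalw(command):
    """Run the command; raises CalledProcessError on a non-zero return code."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


command = build_clustalw_command("input_clustal.fa")
if command is None:
    print("clustalw2 not found on PATH")
else:
    print("would run:", " ".join(command))
```

Calling run_clustalw(command) then plays the role of clustalw_cline() in my script, after which the .aln file can be read with AlignIO.read as before.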

As for from Bio.Align import MultipleSeqAlignment: MultipleSeqAlignment is just a Biopython class that stores a multiple sequence alignment; it does not calculate one.
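To see the difference between storing rows and computing an alignment, here is a small plain-Python sketch (no Biopython needed). It counts the columns in which every row carries the same non-gap character, once for the rows exactly as typed in the question and once for the rows ClustalW returned above. The function name conserved_columns is just an illustrative helper, not part of Biopython.

```python
def conserved_columns(rows):
    """Count columns where every row has the same non-gap character."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(
        1
        for column in zip(*rows)
        if len(set(column)) == 1 and column[0] != "-"
    )


# Rows as stored in the question: the shorter ones are only padded with '-'.
padded = ["AACGTAT", "ACGTAT-", "AGGTAT-"]
# Rows as returned by ClustalW in the output above.
aligned = ["AACGTAT-", "-ACGTAT-", "-AGGTAT-"]

print(conserved_columns(padded))   # 1
print(conserved_columns(aligned))  # 5
```

The padded rows agree in only one column, while the ClustalW rows agree in five, which is why printing the MultipleSeqAlignment built by hand "does not really do anything".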

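Remember that my code opens the input file in append mode ('a'), so running the script more than once adds the same three records to input_clustal.fa again. Delete the file before rerunning (or open it with 'w'); otherwise you are feeding clustalw2 sequences with duplicate names, and that throws an error: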

BIOPYTHON VERSION :  1.80
clustal command :  ./clustalw2 -infile=input_clustal.fa
--------------
Error :  Non-zero return code 255 from './clustalw2 -infile=input_clustal.fa', message 'ERROR: Multiple sequences found with same name (found Alpha at least twice)!'
--------------
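The duplicate-name error above comes from the input file accumulating records across runs. One way around it, sketched here with plain Python so it runs without Biopython, is to open the file in write mode so each run starts from an empty file. The helpers write_fasta and count_records are hypothetical names for this illustration, and the demo writes into a temporary directory rather than next to the script.

```python
import os
import tempfile

# The three records from the example above, as (id, sequence) pairs.
RECORDS = [("Alpha", "AACGTAT"), ("Beta", "ACGTAT-"), ("Gamma", "AGGTAT-")]


def write_fasta(records, path):
    """Write (id, sequence) pairs as FASTA, replacing any existing file."""
    # Mode "w" truncates the file first; mode "a" would keep old records.
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")


def count_records(path):
    """Count FASTA header lines (lines starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demo in a throwaway directory: writing twice still leaves three records.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "input_clustal.fa")
    write_fasta(RECORDS, fasta_path)
    write_fasta(RECORDS, fasta_path)
    n_records = count_records(fasta_path)

print(n_records)  # 3
```

In the Biopython version of the script, the same effect should come from opening the handle with 'w' instead of 'a' before calling SeqIO.write.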

huangapple
  • Posted on May 30, 2023, 04:24:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360149.html