BioPython - How do I align multiple sequences at once?
Question
(This may be a really stupid question, but I cannot find what I'm looking for in the documentation.)
I'm trying to align multiple sequences at once. In the Biopython package, I can see how to make an alignment of two sequences, e.g.:
```python
from Bio.Seq import Seq
from Bio import pairwise2

seq1 = Seq("ACCGGT")
seq2 = Seq("ACGT")
alignments = pairwise2.align.globalxx(seq1, seq2)
print(alignments[0])
```

```
Alignment(seqA='ACCGGT', seqB='A-C-GT', score=4.0, start=0, end=6)
```
Which works fine.
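In `globalxx`, the first `x` means identical characters score 1 and mismatches score 0, and the second `x` means gaps cost nothing, so the best global score equals the length of a longest common subsequence. The pure-Python sketch below computes only that score (no traceback) to show where `score=4.0` comes from; `globalxx_score` is a hypothetical name, not a Biopython function.

```python
def globalxx_score(a: str, b: str) -> int:
    """Best global alignment score with match=1, mismatch=0 and free gaps.

    Under that scoring the optimum is the length of a longest common
    subsequence, computed here with a rolling dynamic-programming row.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous prefix of a
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend a match diagonally
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char (free gap)
        prev = curr
    return prev[-1]


print(globalxx_score("ACCGGT", "ACGT"))  # 4
```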
Now I would like to align multiple sequences at once. Adapted from the docs:
```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment

a = SeqRecord(Seq("AACGTAT"), id="Alpha")
b = SeqRecord(Seq("ACGTAT-"), id="Beta")
c = SeqRecord(Seq("AGGTAT-"), id="Gamma")
align = MultipleSeqAlignment([a, b, c],
                             annotations={"tool": "demo"},
                             column_annotations={"stats": "CCCXCCC"})
print(align)
```

```
Alignment with 3 rows and 7 columns
AACGTAT Alpha
ACGTAT- Beta
AGGTAT- Gamma
```
But that does not really do anything. How do I now actually align these sequences?
Answer 1
Score: 1
In my environment:
```python
import Bio
from Bio import AlignIO, SeqIO
from Bio.Align.Applications import ClustalwCommandline
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

print('BIOPYTHON VERSION : ', Bio.__version__)

a = SeqRecord(Seq("AACGTAT"), id="Alpha")
b = SeqRecord(Seq("ACGTAT-"), id="Beta")
c = SeqRecord(Seq("AGGTAT-"), id="Gamma")

# Write the records to a FASTA file for clustalw2.
# 'a' appends: delete input_clustal.fa before re-running the script.
with open('input_clustal.fa', 'a') as handle:
    for i in [a, b, c]:
        SeqIO.write(i, handle, 'fasta')

# Wrap the clustalw2 executable (expected in the current directory)
clustalw_cline = ClustalwCommandline("./clustalw2", infile="input_clustal.fa")
print('clustal command : ', clustalw_cline)
print('--------------')

try:
    # Calling the wrapper runs clustalw2 and returns (stdout, stderr);
    # a separate name avoids overwriting the SeqRecord `a`
    stdout, stderr = clustalw_cline()
    print('ok')
    print('--------------')
    # clustalw2 wrote the alignment next to the input file
    align = AlignIO.read("input_clustal.aln", "clustal")
    print(align)
    print('--------------')
    for i in range(len(align)):
        print(str(i + 1) + ' ---> ', align[i].seq, '\n')
except Exception as E:
    print('Error : ', E)
    print('--------------')
```
Output:

```
BIOPYTHON VERSION : 1.80
clustal command : ./clustalw2 -infile=input_clustal.fa
--------------
ok
--------------
Alignment with 3 rows and 8 columns
AACGTAT- Alpha
-ACGTAT- Beta
-AGGTAT- Gamma
--------------
1 ---> AACGTAT-
2 ---> -ACGTAT-
3 ---> -AGGTAT-
```
You need a ClustalW executable on your machine, as in this example from the Biopython tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec94
```python
>>> import os
>>> from Bio.Align.Applications import ClustalwCommandline
>>> clustalw_exe = r"C:\Program Files\new clustal\clustalw2.exe"
>>> clustalw_cline = ClustalwCommandline(clustalw_exe, infile="opuntia.fasta")
```
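Newer Biopython releases deprecate the `Bio.Application` command-line wrappers such as `ClustalwCommandline` and suggest running the tool through Python's `subprocess` module instead. A minimal sketch, assuming clustalw2 is installed and that, as in the run shown in this answer, it writes `<input name>.aln` next to the input file; `clustalw_command`, `default_aln_path` and `run_clustalw` are hypothetical helper names.

```python
from __future__ import annotations

import os
import subprocess


def clustalw_command(exe: str, infile: str) -> list[str]:
    """Same command line ClustalwCommandline printed: <exe> -infile=<file>."""
    return [exe, f"-infile={infile}"]


def default_aln_path(infile: str) -> str:
    """Input file name with its extension replaced by .aln."""
    return os.path.splitext(infile)[0] + ".aln"


def run_clustalw(exe: str, infile: str) -> str:
    """Run clustalw2 (must be installed) and return the path of its .aln output.

    check=True raises CalledProcessError on a non-zero exit code, such as
    the duplicate-name error (return code 255) shown in this answer.
    """
    subprocess.run(clustalw_command(exe, infile), check=True,
                   capture_output=True, text=True)
    return default_aln_path(infile)
```

The returned path can then be read as before with `AlignIO.read(run_clustalw("./clustalw2", "input_clustal.fa"), "clustal")`.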
As for `from Bio.Align import MultipleSeqAlignment`: `MultipleSeqAlignment` is just a Biopython class (object) that stores a multiple sequence alignment you already have; it does not compute one.
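To illustrate the difference between storing an alignment and computing one, here is a toy center-star multiple alignment in pure Python: every sequence is aligned pairwise (Needleman-Wunsch) to the sequence most similar to all others, and the pairwise results are merged column by column. This is not ClustalW's algorithm (ClustalW aligns progressively along a guide tree); the scoring (match +1, mismatch -1, gap -1) and all function names are illustrative assumptions, and input sequences are expected without gaps.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, gapped_a, gapped_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def merge(msa, center_aln, other_aln):
    """Add one center-vs-other pairwise alignment to an MSA whose row 0 is the center."""
    center_row = msa[0]
    rows = [[] for _ in range(len(msa) + 1)]
    i = j = 0
    while i < len(center_row) or j < len(center_aln):
        if i < len(center_row) and center_row[i] == "-":
            # Existing gap column in the MSA: the new sequence gets a gap
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            rows[-1].append("-")
            i += 1
        elif j < len(center_aln) and center_aln[j] == "-":
            # New gap in the center: open a gap column in every existing row
            for r in range(len(msa)):
                rows[r].append("-")
            rows[-1].append(other_aln[j])
            j += 1
        else:
            # Same center residue on both sides: copy the column, add the residue
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            rows[-1].append(other_aln[j])
            i += 1
            j += 1
    return ["".join(r) for r in rows]


def center_star(seqs):
    """Toy MSA: returns gapped rows of equal length, in input order."""
    def total_score(c):
        return sum(global_align(seqs[c], s)[0] for k, s in enumerate(seqs) if k != c)

    center = max(range(len(seqs)), key=total_score)
    others = [k for k in range(len(seqs)) if k != center]
    msa = [seqs[center]]
    for k in others:
        _, center_aln, other_aln = global_align(seqs[center], seqs[k])
        msa = merge(msa, center_aln, other_aln)
    by_index = dict(zip([center] + others, msa))  # rows follow [center] + others
    return [by_index[k] for k in range(len(seqs))]


for row in center_star(["AACGTAT", "ACGTAT", "AGGTAT"]):
    print(row)
```

Unlike `MultipleSeqAlignment`, which keeps rows exactly as given, this inserts gap columns so all rows end up the same length while each row, with gaps removed, is still the original sequence.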
Remember to delete the input file created by my code if you run the script more than once. Because the file is opened in append mode, a second run feeds clustalw2 sequences with duplicate names, which throws an error:
```
BIOPYTHON VERSION : 1.80
clustal command : ./clustalw2 -infile=input_clustal.fa
--------------
Error : Non-zero return code 255 from './clustalw2 -infile=input_clustal.fa', message 'ERROR: Multiple sequences found with same name (found Alpha at least twice)!'
--------------
```
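Instead of deleting the file by hand, one option is to overwrite it on every run and reject duplicate IDs before clustalw2 sees them. In Biopython, `SeqIO.write([a, b, c], "input_clustal.fa", "fasta")` given a file name opens it for writing, which replaces any previous content. The pure-Python sketch below shows the same idea without Biopython; `write_fasta` is a hypothetical helper and records are plain `(id, sequence)` pairs.

```python
def write_fasta(records, path):
    """Write (id, sequence) pairs as FASTA, replacing any existing file.

    Raises ValueError on a repeated id, the condition clustalw2 rejects
    with "Multiple sequences found with same name".
    """
    seen = set()
    for rec_id, _ in records:
        if rec_id in seen:
            raise ValueError(f"duplicate sequence id: {rec_id}")
        seen.add(rec_id)
    with open(path, "w") as handle:  # "w" truncates instead of appending
        for rec_id, seq in records:
            handle.write(f">{rec_id}\n{seq}\n")


records = [("Alpha", "AACGTAT"), ("Beta", "ACGTAT-"), ("Gamma", "AGGTAT-")]
write_fasta(records, "input_clustal.fa")
write_fasta(records, "input_clustal.fa")  # safe to re-run: still one copy of each
```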