My open reading frame (ORF) finding code is not finding the longest ORF in the sequence
Question
I am trying to code a function that finds the longest Open reading frame. However, in this one instance it is not locating the longest ORF and I cannot figure out why.
This is the sequence:
> GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
> GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
> TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
> TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
> CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
> AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
> ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
> CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
> GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
> CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
> CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
> CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
> CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
> GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
> CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
> ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
> ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
> GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
> CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
> CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
My code says the longest ORF is 228 nucleotides in length and located between nucleotides 655 and 882. However the longest ORF is actually 327 nucleotides in length and located between nucleotides 575 and 901.
This is my code. I have made sure that it does not stop at stop codons outside the reading frame, but without success. Can anyone figure out why it doesn't work? I save the sequence as a FASTA file, then open it and store the sequence before calling the function.
import re
from Bio.Seq import Seq

def find_orfs(sequence):
    # Scan through the sequence to find open reading frames
    longest_orf = ""
    strand = ""
    longest_orf_start = -1
    longest_orf_end = -1
    for i in range(3):
        # Search forward frames
        orfs = re.findall(r'(?s)ATG(?:...)*?(?:TAA|TAG|TGA)', sequence[i:])
        for orf in orfs:
            # Find the longest ORF within the sequence
            if len(orf) > len(longest_orf):
                strand = "+"
                longest_orf = orf
                longest_orf_start = i + sequence.index(orf) + 1
                longest_orf_end = i + sequence.index(orf) + len(orf)
        # Search reverse frames
        seq_rev = str(Seq(sequence).reverse_complement())
        orfs = re.findall(r'(?s)ATG(?:...)*?(?:TAA|TAG|TGA)', seq_rev[i:])
        for orf in orfs:
            # Find the longest ORF within the sequence
            if len(orf) > len(longest_orf):
                longest_orf = orf
                strand = "-"
                longest_orf_start = len(sequence) - i - seq_rev.index(orf) - len(orf) + 1
                longest_orf_end = len(sequence) - i - seq_rev.index(orf)
    print("Longest ORF:", longest_orf)
    print("Strand:", strand)
    print("Start position:", longest_orf_start)
    print("End position:", longest_orf_end)
    # Reverse complement the original DNA sequence
    seq_rev_comp = str(Seq(sequence).reverse_complement())
    # Translate the longest ORF to a protein sequence
    protein_seq = Seq(longest_orf).translate()
    print("Protein sequence:", protein_seq)
    protein_seq = str(protein_seq)
    return longest_orf, protein_seq
Answer 1
Score: 1
The issue with this code is the regular expression:
(?s)ATG(?:...)*?(?:TAA|TAG|TGA)
`re.findall` returns non-overlapping matches, and this pattern can start at an ATG in any reading frame. It finds a hit, consumes those characters, and resumes scanning after the hit. A match that starts earlier can therefore swallow the start codon of the correct ORF, which is then never reported in full. You can paste the regex and your sequence into an online regex tester to see this: the second match eats into the correct ORF, and the next ORF the pattern finds is the wrong answer.
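A toy comparison may make the consumption problem concrete. This is only an illustrative sketch: the sequence `toy` is made up (it is not taken from the question), with a 12-nt ORF starting at index 0 and a longer 21-nt ORF starting at index 4 in a different reading frame.

```python
import re

# Made-up toy sequence (an assumption for illustration, not the question's data):
# a 12-nt ORF starts at index 0, a 21-nt ORF starts at index 4 in another frame.
toy = "ATGCATGAATAACCCGGGTTTCTGA"

# Same pattern as in the question, without the (?s) flag.
pattern = re.compile(r"ATG(?:...)*?(?:TAA|TAG|TGA)")

# Non-overlapping scan: each match consumes its characters.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer(toy)]
print(matches)  # [(0, 12, 'ATGCATGAATAA')]

# The second start codon lies inside the span of the first match,
# so the scan never tries it.
second_atg = toy.find("ATG", 1)
print(second_atg)  # 4

# Anchoring the same pattern at index 4 reveals the ORF that was skipped.
skipped = pattern.match(toy, second_atg)
print(skipped.group(), len(skipped.group()))  # ATGAATAACCCGGGTTTCTGA 21
```

The scan reports only the 12-nt ORF, even though a 21-nt ORF exists, which mirrors what happens with the longer sequence in the question.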
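To fix this, wrap the pattern in a positive lookahead, `(?=...)`:
(?=(ATG(?:...)*?(?:TAG|TGA|TAA)))
This regex is close to the original, except that `(?s)` is removed and the whole ORF pattern sits in a capture group inside `(?=...)`. A lookahead consumes no characters, so the search restarts at the next position and overlapping ORFs are all reported. The stop codon stays inside the capture group so that the ORF returned by `re.findall` includes it. Both assignments to the `orfs` variable should now look like this, with `sequence` or `seq_rev` in place of `input_seq`:
orfs = re.findall(r'(?=(ATG(?:...)*?(?:TAG|TGA|TAA)))', input_seq[i:])
With that change, I get the following output: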
Longest ORF: ATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGA
Strand: +
Start position: 575
End position: 901
Protein sequence: MTTIRDSFNGSYQPNFDHWTADRHGDLCRTWTRCWRFSIGSAPLTSITNKSEVAKPDRTIKIPGVSPWKRSPVPTLPLTGYLSAFLPSGFLNAHAVGISVRCRSFAPS*
('ATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGA', 'MTTIRDSFNGSYQPNFDHWTADRHGDLCRTWTRCWRFSIGSAPLTSITNKSEVAKPDRTIKIPGVSPWKRSPVPTLPLTGYLSAFLPSGFLNAHAVGISVRCRSFAPS*')
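The same idea can be sketched as a small self-contained function. This is a hedged illustration rather than the asker's exact code: the names `ORF_PATTERN`, `reverse_complement`, and `longest_orf` are hypothetical, Biopython is replaced by `str.maketrans` so the snippet runs on its own, and the input is assumed to contain only uppercase A, C, G, and T. It reads positions from the match object instead of calling `sequence.index(orf)`, which returns the first occurrence of the substring. Because the lookahead is tried at every position, no separate loop over the three frames is used here.

```python
import re

# Overlap-tolerant pattern; the stop codon sits inside the capture group.
ORF_PATTERN = re.compile(r"(?=(ATG(?:...)*?(?:TAA|TAG|TGA)))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(sequence):
    """Return (orf, strand, start, end) with 1-based inclusive forward coordinates."""
    n = len(sequence)
    best = ("", "", -1, -1)
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for m in ORF_PATTERN.finditer(seq):
            start, end = m.span(1)  # 0-based, end-exclusive, on the scanned strand
            if end - start > len(best[0]):  # strict '>' keeps the first of any ties
                if strand == "+":
                    coords = (start + 1, end)
                else:
                    # Map reverse-strand positions back onto the forward strand.
                    coords = (n - end + 1, n - start)
                best = (m.group(1), strand, *coords)
    return best


print(longest_orf("ATGCATGAATAACCCGGGTTTCTGA"))
# ('ATGAATAACCCGGGTTTCTGA', '+', 5, 25)
```

Translation is left out of the sketch; the returned ORF string can be passed to `Seq(...).translate()` as in the question.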
Answer 2
Score: 0
<details>
<summary>English:</summary>
My code is similar to yours, but uses Python's built-in `max()` function.
It is copied from [bioinformatics.stackexchange.com: Find open reading frames in a DNA sequence][1].
import re
from Bio.Seq import Seq
sequence = ('GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCGGTGGCGAAA'
'CCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG'
'TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGAAGCGTGGC'
'TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG'
'CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA'
'AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG'
'ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT'
'CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT'
'GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG'
'CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA'
'CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG'
'CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA'
'CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA'
'GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG'
'CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG'
'ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA'
'ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC'
'GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG'
'CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG'
'CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT')
print(sequence,'\n\n')
def find_orfs(sequence):
    # Lookahead pattern: overlapping ORFs are reported; the stop codon is not captured
    pattern = re.compile(r'(?=(ATG(?:...)*?)(?=TAG|TGA|TAA))')
    revcompseq = str.maketrans("ATGC", "TACG")  # complement table; [::-1] does the reversal
    # print(pattern.findall(sequence))  # forward search
    # print(pattern.findall(sequence[::-1].translate(revcompseq)))  # backward search
    # Each entry: (span, start, end, length, match object, ORF string, strand)
    b = [(m.span(1), m.start(1), m.end(1), (m.end(1) - m.start(1)), m, m.groups()[0], 'forward') for m in re.finditer(pattern, sequence)]
    b_rev = [(m.span(1), m.start(1), m.end(1), (m.end(1) - m.start(1)), m, m.groups()[0], 'reverse') for m in re.finditer(pattern, sequence[::-1].translate(revcompseq))]
    b = max(b, key=lambda x: x[3])
    try:
        b_rev = max(b_rev, key=lambda x: x[3])
    except ValueError:  # max() of an empty list: no ORF on the reverse strand
        b_rev = (0, 0, 0, 0, 0, '', 'reverse')
    b_max = max((b, b_rev), key=lambda x: x[3])
    # print(b, '\n', b_rev, '\n')
    print(b_max, type(b), b[4], len(b[5]))
    protein_seq = Seq(b_max[5]).translate()
    print("Protein sequence:", protein_seq)
    protein_seq = str(protein_seq)
    return b_max[5], protein_seq

print('\n\n', find_orfs(sequence))
output:
((573, 897), 573, 897, 324, <re.Match object; span=(573, 573), match=''>, 'ATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGC', 'forward') <class 'tuple'> <re.Match object; span=(573, 573), match=''> 324
Protein sequence: MTTIRDSFNGSYQPNFDHWTADRHGDLCRTWTRCWRFSIGSAPLTSITNKSEVAKPDRTIKIPGVSPWKRSPVPTLPLTGYLSAFLPSGFLNAHAVGISVRCRSFAPS
('ATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGC', 'MTTIRDSFNGSYQPNFDHWTADRHGDLCRTWTRCWRFSIGSAPLTSITNKSEVAKPDRTIKIPGVSPWKRSPVPTLPLTGYLSAFLPSGFLNAHAVGISVRCRSFAPS')
I am not sure whether using `re.Match` object attributes would be faster than your algorithm for large sequences. My code has a glitch: it assumes that there is only one ORF of the greatest length, as I believe your algorithm does too.
If used with:
sequence = ('ATGAAAAAAAAAAAAAAAAATGTAG'
'ATGAAAAAAAAAAAAAAAAAATAG')
it returns the first of the two ORFs of the same length in the sequence.
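One possible way around the tie limitation is to collect every ORF whose length equals the maximum, instead of calling `max()` once. The sketch below is an assumption-laden illustration: `all_longest_orfs` is a hypothetical helper, it scans the forward strand only, it reuses the pattern from the code above (so the stop codon is not part of the captured ORF), and the test sequence is made up.

```python
import re

# Same lookahead pattern as above: the stop codon is checked but not captured.
PATTERN = re.compile(r"(?=(ATG(?:...)*?)(?=TAG|TGA|TAA))")


def all_longest_orfs(sequence):
    """Return every (start, end, orf) hit that shares the maximum length.

    Positions are 0-based and end-exclusive, like m.span(1).
    Only the forward strand is scanned in this sketch.
    """
    hits = [(m.start(1), m.end(1), m.group(1)) for m in PATTERN.finditer(sequence)]
    if not hits:
        return []
    best_len = max(end - start for start, end, _ in hits)
    return [hit for hit in hits if hit[1] - hit[0] == best_len]


# Made-up sequence with two ORFs of equal length.
print(all_longest_orfs("ATGAAATAGCCATGCCCTGA"))
# [(0, 6, 'ATGAAA'), (11, 17, 'ATGCCC')]
```

Returning a list leaves it to the caller to decide how ties are handled, for example by reporting all of them or by preferring the earliest start.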
[1]: https://bioinformatics.stackexchange.com/questions/20442/find-open-reading-frames-in-a-dna-sequence/20452#20452
</details>