I wrote the following code to download multiple sequences from NCBI.
import numpy as np
from Bio import Entrez
Entrez.email ="[email protected]"
data = np.loadtxt('/home/Documents/XXX.txt', dtype="string")
data
array(['YP_615060', 'YP_615061', 'YP_615062', ..., 'YP_611146',
'YP_611148', 'YP_611150'],
dtype='|S12')
ids=data[:10]
ids_1=data[10:20]
ids_1=",".join(ids_1)
ids_2=data[20:30]
ids_2=",".join(ids_2)
total=(ids, ids_1, ids_2)
for c in total:
    handle = Entrez.efetch(db="protein", id=c, rettype="fasta", retmode="txt")
handle.read()
When I run this in the interactive interpreter, I get an error:
File "<stdin>", line 3
handle.read()
^
SyntaxError: invalid syntax
I guess I am writing the 'for' loop wrong, but I cannot see what the problem is. It is supposed to be a trivial issue, but I cannot find a way around it.
If I test the for loop on its own, without calling handle.read(), by running
>>> for c in total:
...     handle = Entrez.efetch(db="protein", id=c, rettype="fasta", retmode="txt")
...
the for loop is still waiting for something. What am I missing here?
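For what it's worth, the `...` prompt means the interpreter still considers the `for` statement unfinished: at the interactive prompt, an indented block only ends once an empty line is entered. Here is a minimal sketch using the standard library's `codeop` module, which classifies source text the way the interactive prompt does. The helper name `repl_status` is made up for this illustration, and the loop body is a stand-in for the `efetch` call.

```python
import codeop


def repl_status(source):
    """Classify source text the way the interactive prompt would."""
    try:
        code = codeop.compile_command(source, "<stdin>", "single")
    except SyntaxError:
        return "syntax error"
    # compile_command returns None when more input is expected.
    return "complete" if code is not None else "incomplete"


loop = "for c in total:\n    handle = c"

# Block typed, but no empty line yet: the prompt keeps showing '...'.
print(repl_status(loop))

# An empty line (a trailing newline) closes the block.
print(repl_status(loop + "\n"))

# An unindented statement directly after the block, with no empty line
# in between, is rejected at line 3.
print(repl_status(loop + "\nhandle.read()"))
```

So at the prompt, either indent `handle.read()` so it belongs to the loop, or press Enter on an empty line before typing it.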
record = Entrez.read(handle) (see documentation)
handle.read() is not part of the loop block. Is that correct?
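If the intent is to read each batch inside the loop, a sketch along these lines might work. It rests on several assumptions: `Entrez.email` has already been set as in the question, the IDs are plain Python strings, and batches of 10 are wanted. The helper names `batch_ids`, `fetch_fasta` and `download_all` are hypothetical, not part of Biopython. It also joins every batch into a comma-separated string (the question's code leaves the first batch unjoined) and passes `retmode="text"` rather than `"txt"`, which is the value I believe the E-utilities expect for plain text.

```python
def batch_ids(ids, size=10):
    """Split IDs into comma-separated strings of at most `size` IDs each."""
    ids = list(ids)
    return [",".join(ids[i:i + size]) for i in range(0, len(ids), size)]


def fetch_fasta(id_string):
    """Fetch FASTA text for a comma-separated ID string from NCBI."""
    # Imported here so the batching helpers work without Biopython installed.
    from Bio import Entrez

    # Assumes Entrez.email was set beforehand, as in the question.
    handle = Entrez.efetch(db="protein", id=id_string,
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def download_all(ids, fetch=fetch_fasta, size=10):
    """Fetch every batch and return the results as a list of strings."""
    results = []
    for batch in batch_ids(ids, size):
        # This line is indented, so it runs once per batch.
        results.append(fetch(batch))
    return results
```

Calling `download_all(data[:30])` would then make three requests. The `fetch` parameter is only there so the loop can be tried with a stand-in function before hitting NCBI.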