
I want to print the IDs that appear between ">sp|" and the next "|" on each header line of a file, so the output should be:

Q12955
Q16659
Q7Z7A1

Example file f:

>sp|Q12955|ANK3_HUMAN Ankyrin-3 OS=Homo sapiens GN=ANK3 PE=1 SV=3
MAHAASQLKKNRDLEINAEEEPEKKRKHRKRSRDRKKKSDANASYLRAARAGHLEKALDY
IKNGVDINICNQNGLNALHLASKEGHVEVVSELLQREANVDAATKKGNTALHIASLAGQA

>sp|Q16659|MK06_HUMAN Mitogen-activated protein kinase 6 OS=Homo sapiens GN=MAPK6 PE=1 SV=1

MAEKFESLMNIHGFDLGSRYMDLKPLGCGGNGLVFSAVDNDCDKRVAIKKIVLTDPQSVK
HALREIKIIRRLDHDNIVKVFEILGPSGSQLTDDVGSLTELNSVYIVQEYMETDLANVLE
QGPLLEEHARLFMYQLLRGLKYIHSANVLHRDLKPANLFINTEDLVLKIGDFGLARIMDP

>sp|Q7Z7A1|CNTRL_HUMAN Centriolin OS=Homo sapiens GN=CNTRL PE=1 SV=2

MKKGSQQKIFSKAKIPSSSHSPIPSSMSNMRSRSLSPLIGSETLPFHSGGQWCEQVEIAD
ENNMLLDYQDHKGADSHAGVRYITEALIKKLTKQDNLALIKSLNLSLSKDGGKKFKYIEN
LEKCVKLEVLNLSYNLIGKIEKLDKLLKLRELNLSYNKISKIEGIENMCNLQKLNLAGNE

My code:

f=open('seq.fasta','r')

for idline in f:
    ID = re.findall('|......|',idline)
    print ID
    break

Any help would be appreciated, thank you in advance!

1 Answer


If the ID is always between the first two vertical bars, you can split each header line on '|' and skip regular expressions entirely. (Judging by your example, it is safe to assume that it always is.)

with open('seq.fasta') as f:
    for idline in f:
        # FASTA header lines start with '>'
        if idline.startswith('>'):
            # '>sp|Q12955|ANK3_HUMAN ...' -> ['>sp', 'Q12955', 'ANK3_HUMAN ...']
            ID = idline.split('|')[1]
            print(ID)

Output

Q12955
Q16659
Q7Z7A1
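If you want to check the split approach without a file on disk, one option is to wrap it in a function that accepts any iterable of lines. This is a minimal sketch: `extract_ids` is a hypothetical helper name, and `EXAMPLE` is just a few lines copied from the question's file. `io.StringIO` iterates line by line like an open text file, so the same function also works on `open('seq.fasta')`.

```python
import io

# A few header/sequence lines copied from the example file in the question.
EXAMPLE = """>sp|Q12955|ANK3_HUMAN Ankyrin-3 OS=Homo sapiens GN=ANK3 PE=1 SV=3
MAHAASQLKKNRDLEINAEEEPEKKRKHRKRSRDRKKKSDANASYLRAARAGHLEKALDY
>sp|Q16659|MK06_HUMAN Mitogen-activated protein kinase 6 OS=Homo sapiens GN=MAPK6 PE=1 SV=1
MAEKFESLMNIHGFDLGSRYMDLKPLGCGGNGLVFSAVDNDCDKRVAIKKIVLTDPQSVK
>sp|Q7Z7A1|CNTRL_HUMAN Centriolin OS=Homo sapiens GN=CNTRL PE=1 SV=2
MKKGSQQKIFSKAKIPSSSHSPIPSSMSNMRSRSLSPLIGSETLPFHSGGQWCEQVEIAD
"""


def extract_ids(lines):
    """Return the second '|'-separated field of every header line (hypothetical helper)."""
    ids = []
    for line in lines:
        # Only header lines start with '>'; sequence lines are skipped.
        if line.startswith('>'):
            ids.append(line.split('|')[1])
    return ids


# io.StringIO stands in for an open file here.
print(extract_ids(io.StringIO(EXAMPLE)))  # ['Q12955', 'Q16659', 'Q7Z7A1']
```

With a real file: `with open('seq.fasta') as f: print('\n'.join(extract_ids(f)))`.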

If the position of the ID can vary, you can instead loop over every '|'-separated section and print the one that is neither the '>sp' prefix nor the description (the part containing 'OS='). For the example file, both versions print the same three IDs.

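with open('seq.fasta') as f:
    for idline in f:
        if idline.startswith('>'):
            # Skip the '>sp' prefix and the description (which contains 'OS=');
            # whatever section is left over is taken to be the ID
            for section in idline.split('|'):
                if 'OS=' not in section and '>sp' not in section:
                    print(section)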

1 Comment

This only works if the ID is at the beginning of the line. Depending on the FASTA file, it could be anywhere in the line beginning with '>'.
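To address the comment, a regular expression can look for the ID wherever it sits in the header line. This also shows why the question's attempt did not work: in `'|......|'` the unescaped `|` is the regex "or" operator, so the pattern never looks for literal bars; the code also lacks `import re`, and `break` exits the loop after the first line. The sketch below assumes every wanted ID is introduced by `sp|` and followed by another `|`; `ID_PATTERN` and `extract_ids_regex` are hypothetical names.

```python
import re

# 'sp\|' matches the literal text 'sp|'; the bar is escaped because an
# unescaped '|' means "or" in a regular expression.
# '([^|]+)' captures one or more characters that are not '|', i.e. the ID.
ID_PATTERN = re.compile(r'sp\|([^|]+)\|')


def extract_ids_regex(lines):
    """Return the first 'sp|...|' ID found on each header line (hypothetical helper)."""
    ids = []
    for line in lines:
        if line.startswith('>'):
            # search() scans the whole line, so the ID need not be the first field
            match = ID_PATTERN.search(line)
            if match:
                ids.append(match.group(1))
    return ids


headers = [
    '>sp|Q12955|ANK3_HUMAN Ankyrin-3 OS=Homo sapiens GN=ANK3 PE=1 SV=3\n',
    '>sp|Q16659|MK06_HUMAN Mitogen-activated protein kinase 6 OS=Homo sapiens GN=MAPK6 PE=1 SV=1\n',
    '>sp|Q7Z7A1|CNTRL_HUMAN Centriolin OS=Homo sapiens GN=CNTRL PE=1 SV=2\n',
]
print(extract_ids_regex(headers))  # ['Q12955', 'Q16659', 'Q7Z7A1']
```

Because the pattern is not anchored, text such as `xsp|` elsewhere in a header would also match; anchoring it (for example `r'^>sp\|([^|]+)\|'`) trades that flexibility for strictness.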
