Python: find string in string

Question

I want to print out the IDs that are in between ">sp|" and "|" from a file, so the output should be:

Q12955
Q16659
Q7Z7A1

Example file f:

>sp|Q12955|ANK3_HUMAN Ankyrin-3 OS=Homo sapiens GN=ANK3 PE=1 SV=3
MAHAASQLKKNRDLEINAEEEPEKKRKHRKRSRDRKKKSDANASYLRAARAGHLEKALDY
IKNGVDINICNQNGLNALHLASKEGHVEVVSELLQREANVDAATKKGNTALHIASLAGQA

>sp|Q16659|MK06_HUMAN Mitogen-activated protein kinase 6 OS=Homo sapiens GN=MAPK6 PE=1 SV=1

MAEKFESLMNIHGFDLGSRYMDLKPLGCGGNGLVFSAVDNDCDKRVAIKKIVLTDPQSVK
HALREIKIIRRLDHDNIVKVFEILGPSGSQLTDDVGSLTELNSVYIVQEYMETDLANVLE
QGPLLEEHARLFMYQLLRGLKYIHSANVLHRDLKPANLFINTEDLVLKIGDFGLARIMDP

>sp|Q7Z7A1|CNTRL_HUMAN Centriolin OS=Homo sapiens GN=CNTRL PE=1 SV=2

MKKGSQQKIFSKAKIPSSSHSPIPSSMSNMRSRSLSPLIGSETLPFHSGGQWCEQVEIAD
ENNMLLDYQDHKGADSHAGVRYITEALIKKLTKQDNLALIKSLNLSLSKDGGKKFKYIEN
LEKCVKLEVLNLSYNLIGKIEKLDKLLKLRELNLSYNKISKIEGIENMCNLQKLNLAGNE

My code:

f=open('seq.fasta','r')

for idline in f:
    ID = re.findall('|......|',idline)
    print ID
    break

Any help would be appreciated, thank you in advance!

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

If the ID is always between the first two vertical bars, you can split each header line on "|" and skip regular expressions entirely. (Judging by your example, that is a safe assumption.)

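with open('seq.fasta', 'r') as f:
    for idline in f:
        # Header lines start with '>'; sequence lines are skipped
        if idline.startswith('>'):
            # '>sp|Q12955|ANK3_HUMAN ...' -> ['>sp', 'Q12955', 'ANK3_HUMAN ...']
            lineSplit = idline.split('|')
            ID = lineSplit[1]
            print(ID)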

Output

Q12955
Q16659
Q7Z7A1

If the position does vary, you could instead loop over the sections of the split line and print each one that is neither the ">sp" prefix nor the description containing "OS=". For the example file, both versions give the same output.

with open('seq.fasta', 'r') as f:
    for idline in f:
        if idline.startswith('>'):
            lineSplit = idline.split('|')
            for section in lineSplit:
                # Skip the '>sp' prefix and the description that contains 'OS='
                if 'OS=' not in section and '>sp' not in section:
                    ID = section
                    print(ID)
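
For comparison, here is a sketch of the regex approach the question was attempting. In the original pattern the unescaped "|" acts as the alternation operator, so '|......|' can match the empty string at every position instead of the text between the bars; the question's snippet also never imports re. Escaping the bars and capturing the text between them addresses both. The names HEADER_ID and extract_ids are made up for this illustration, and the pattern assumes every header starts with ">sp|" as in the example file.

```python
import re

# Escaped pipes match literal '|' characters; the parentheses capture the ID.
# Inside the character class [^|] the pipe is already literal.
HEADER_ID = re.compile(r'>sp\|([^|]+)\|')

def extract_ids(lines):
    """Return the ID from every line that starts with '>sp|' (hypothetical helper)."""
    ids = []
    for line in lines:
        match = HEADER_ID.match(line)  # match() only matches at the start of the line
        if match:
            ids.append(match.group(1))
    return ids

# Reading from the file named in the question:
# with open('seq.fasta') as f:
#     for seq_id in extract_ids(f):
#         print(seq_id)
```

Because the pattern captures "anything except a bar", it does not depend on the ID being exactly six characters long, which the six dots in the original pattern assumed.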

edited Jun 20, 2020 at 9:12

CommunityBot

answered Apr 13, 2015 at 19:55

heinst

1 Comment

MattDMo Over a year ago

This only works if ID is at the beginning of the line. Depending on the FASTA file, it could be anywhere in the line beginning with >.
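
One way to handle the case raised in this comment is to search the header for an "sp|ID|" token instead of relying on its position after splitting. This is only a sketch: SP_TOKEN and find_sp_id are hypothetical names, and the header with extra leading fields in the usage note is an invented example, not taken from the question's file.

```python
import re

# Looks for an "sp|<ID>|" token anywhere in the line, not only at the start.
# [^|\s]+ stops the ID at the next bar or at whitespace.
SP_TOKEN = re.compile(r'\bsp\|([^|\s]+)\|')

def find_sp_id(header):
    """Return the ID following 'sp|' in a header line, or None (hypothetical helper)."""
    if not header.startswith('>'):
        return None  # not a FASTA header line
    match = SP_TOKEN.search(header)  # search() scans the whole line
    return match.group(1) if match else None
```

With an invented header such as '>gi|12345|sp|Q12955|ANK3_HUMAN', idline.split('|')[1] would give '12345', while find_sp_id looks for the field that follows "sp|" and so is not affected by the extra leading fields.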
