0

I'm trying to search a nucleotide sequence (composed only of A, C, G and T) for a user-defined pattern, using a regex.

The relevant code is as follows:

    match = re.match(r'{0}'.format(pattern), sequence)

match is always None, but I need it to be the part of the sequence that matches the user's query.

What am I doing wrong?

EDIT: This is how I constructed the search pattern:

    askMotif = raw_input('Enter a motif to search for it in the sequence (The wildcard character ‘?’ represents any nucleotide in that position, and * represents none or many nucleotides in that position.): ')
    listMotif = []
    letterlist = ['A','C','G','T', 'a', 'c','g','t']
    for letter in askMotif:
        if letter in letterlist:
            a = letter.capitalize()
            listMotif.append(a)
        if letter == '?':
            listMotif.append('.')
        if letter == '*':
            listMotif.append('*?')
    pattern = ''
    for searcher in listMotif:
        pattern += searcher

Not very pythonic, I know...

6
  • Can you post your test case? Commented Apr 1, 2015 at 22:52
  • do you mean the sequence that i'm searching? it's really long... like more than 1000 chars Commented Apr 1, 2015 at 22:53
  • What happens when you hard code the patterns? Commented Apr 1, 2015 at 22:53
  • Just a small portion of it should be good enough. Commented Apr 1, 2015 at 22:53
  • I think you mean '*' -> '.*?' for 0 or more Commented Apr 1, 2015 at 23:04

2 Answers

2

That should work fine:

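    >>> import re
    >>> tgt = 'AGAGAGAGACGTACACAC'
    >>> re.match(r'{}'.format('ACGT'), tgt)
    >>> re.search(r'{}'.format('ACGT'), tgt)
    <_sre.SRE_Match object at 0x10a5d6920>

I think the problem is that you are using match where you mean search: match only tries the pattern at the start of the string, while search scans the whole string.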


A hint on your posted code:

    prompt = '''\
    Enter a motif to search for in the sequence.
    (The wildcard character '?' represents any nucleotide in that position,
    and '*' represents none or many nucleotides in that position.)
    '''
    letterlist = ['A', 'C', 'G', 'T', '?', '*']
    pattern = None
    while pattern is None:
        print prompt
        user_input = raw_input('>>> ').upper()
        # Accept input of at least two characters made up only of nucleotides and wildcards.
        if len(user_input) > 1 and all(c in letterlist for c in user_input):
            # '?' becomes '.' (any one character); '*' becomes '.*?' (zero or more, non-greedy).
            pattern = user_input.replace('?', '.').replace('*', '.*?')
        else:
            print 'Bad pattern, please try again'
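
For reference, the same translation can be wrapped in a small function and combined with re.search. This is only a sketch, written so that it also runs on Python 3; the names motif_to_regex and find_motif are made up for illustration and are not part of the original code.

```python
import re

VALID_CHARS = set("ACGT?*")  # nucleotides plus the two wildcards


def motif_to_regex(motif):
    """Translate a motif using '?' and '*' wildcards into a regex string."""
    motif = motif.upper()
    if not motif or not set(motif) <= VALID_CHARS:
        raise ValueError("motif may contain only A, C, G, T, ? and *")
    # Replace '?' first: the replacement for '*' itself contains a '?'.
    return motif.replace("?", ".").replace("*", ".*?")


def find_motif(motif, sequence):
    """Return the first part of the sequence matching the motif, or None."""
    found = re.search(motif_to_regex(motif), sequence.upper())
    return found.group(0) if found else None
```

With the test sequence above, find_motif('a?g*t', 'AGAGAGAGACGTACACAC') returns 'ACGT', and a motif that does not occur returns None instead of raising.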

1 Comment

thanks, that works. will accept your answer when stackoverflow allows me to do so :) (in six minutes for some reason)
1

re.match() only matches at the beginning of the sequence. Perhaps you need re.search()?

    >>> import re
    >>> re.match(r'{0}'.format('bar'), 'foobar').group(0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.search(r'{0}'.format('bar'), 'foobar').group(0)
    'bar'
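
Since both functions return None when nothing matches, it may be safer to check the result before calling .group(). A rough sketch follows; the helper all_matches is a made-up name, and re.finditer is used in case every occurrence is wanted rather than only the first.

```python
import re


def all_matches(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, sequence)]


sequence = "AGAGAGAGACGTACACAC"

# re.match only tries the pattern at position 0, so this is None.
print(re.match("ACGT", sequence))

# re.search scans the whole string; test the result before using it.
hit = re.search("ACGT", sequence)
if hit is not None:
    print(hit.start(), hit.group(0))

print(all_matches("AC", sequence))
```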
