I'm trying to search a nucleotide sequence (composed only of A, C, G and T) for a user-defined pattern using a regex.
The relevant code is as follows:
match = re.match(r'{0}'.format(pattern), sequence)
match is always None, whereas I need it to hold the part of the sequence that matches the user's query...
What am I doing wrong?
EDIT: This is how I constructed the search pattern:
askMotif = raw_input("Enter a motif to search for in the sequence (the wildcard character '?' represents any nucleotide in that position, and '*' represents zero or more nucleotides in that position): ")
listMotif = []
letterlist = ['A', 'C', 'G', 'T', 'a', 'c', 'g', 't']
for letter in askMotif:
    if letter in letterlist:
        a = letter.capitalize()
        listMotif.append(a)
    if letter == '?':
        listMotif.append('.')
    if letter == '*':
        listMotif.append('*?')
pattern = ''
for searcher in listMotif:
    pattern += searcher
Not very pythonic, I know...
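The '*' wildcard should be translated to '.*?' (zero or more of any nucleotide), not '*?'. On its own, '*?' has no preceding character to repeat.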
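
For what it's worth, here is a minimal sketch of how the translation and the search could fit together. The helper names `motif_to_regex` and `find_motif` are made up for illustration and are not from the question. Besides mapping '*' to '.*?', it uses `re.search` instead of `re.match`: `re.match` only matches at the very start of the string, which is probably why the result is None whenever the motif sits further along the sequence.

```python
import re

# Hypothetical helper names; the translation rules follow the question.
WILDCARDS = {'?': '.', '*': '.*?'}   # '?' = any one nucleotide, '*' = zero or more
NUCLEOTIDES = set('ACGT')


def motif_to_regex(motif):
    """Translate a user motif such as 'a?g*t' into a regex string."""
    parts = []
    for letter in motif.upper():
        if letter in NUCLEOTIDES:
            parts.append(letter)
        elif letter in WILDCARDS:
            parts.append(WILDCARDS[letter])
        # Any other character is ignored, as in the original loop.
    return ''.join(parts)


def find_motif(motif, sequence):
    """Return the first matching stretch of the sequence, or None."""
    # re.search scans the whole sequence; re.match would only try position 0.
    match = re.search(motif_to_regex(motif), sequence)
    return match.group(0) if match else None
```

Called as `find_motif('A?G*T', 'CCATGCCTAA')`, this should return the matched substring rather than a match object. If you need every occurrence instead of just the first, `re.finditer` with the same pattern is one option.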