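```python
import regex  # third-party: pip install regex

sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc'  # being searched
query = 'aaabbbbbbbbbbbbccc'      # 100% coverage
query_1 = 'aaaabbbbbbbbcbbbcccc'  # 95% coverage
query_2 = 'aaabbbbcbbbbbcbccc'    # 90% coverage

threshold = .95
error = int(len(query_1) - (len(query_1) * threshold))  # for query_1, errors must be <= 1

# Doubled braces make str.format emit a literal {e<=N}; the parentheses
# group the query so the fuzzy constraint covers all of it.
print(regex.search('(' + query_1 + '){{e<={}}}'.format(error), sequence).group(0))
```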

I'm trying to add a condition to a regex search so that it only matches if a certain percentage of the query is present in the sequence being searched.

For example, if I require at least 95% coverage, the search should match query_1 but not query_2.
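
The rule in the code above (errors must be at most len(query) - len(query) * threshold) can be computed without floating point if the threshold is written as a whole-number percentage. This is a minimal sketch assuming that convention; max_errors is a hypothetical helper name, and the edit counts in the comments were worked out by hand (query_1 needs its one stray c changed to b, query_2 needs two such changes).

```python
def max_errors(query, min_coverage_pct):
    """Largest number of edits that still leaves at least
    min_coverage_pct percent of the query matched.

    Hypothetical helper: uses integer arithmetic only, so there is no
    float product for int() to truncate.
    """
    # errors <= len * (100 - pct) / 100, rounded down
    return len(query) * (100 - min_coverage_pct) // 100


query = 'aaabbbbbbbbbbbbccc'      # 18 chars, needs 0 edits -> allowed 0
query_1 = 'aaaabbbbbbbbcbbbcccc'  # 20 chars, needs 1 edit  -> allowed 1
query_2 = 'aaabbbbcbbbbbcbccc'    # 18 chars, needs 2 edits -> allowed 0

for q in (query, query_1, query_2):
    print(q, max_errors(q, 95))
```

Floor division gives the same result as int() truncation for these non-negative values, so this matches the question's rule while avoiding any float rounding.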

The fuzzy matching capabilities of the regex module might be what you are looking for. (Commented Jul 2, 2013 at 22:21)
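
As I understand the regex module, a fuzzy constraint such as {e<=1} follows the item it applies to, the way a quantifier does, which is why the answer wraps the query in a group: without one, the constraint would cover only the last character. The standard library's re module has no fuzzy matching, but it shows the same binding rule with an ordinary {2} quantifier; the variable names are illustrative.

```python
import re

# A quantifier binds only to the item immediately before it:
# 'ab{2}' means "a, then b twice", not "ab twice".
ungrouped_short = re.fullmatch(r'ab{2}', 'abb')   # matches
ungrouped_long = re.fullmatch(r'ab{2}', 'abab')   # no match

# A non-capturing group (?:...) makes the quantifier cover "ab" as a unit.
grouped = re.fullmatch(r'(?:ab){2}', 'abab')      # matches

print(bool(ungrouped_short), bool(ungrouped_long), bool(grouped))  # True False True
```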

1 Answer


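Using the regex module, wrap the query in a group so that the {e<=N} fuzzy constraint applies to the whole query rather than only to its last character, and compute the number of allowed errors as an integer: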

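```python
import regex  # third-party: pip install regex

sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc'  # being searched
query = 'aaabbbbbbbbbbbbccc'      # 100% coverage
query_1 = 'aaaabbbbbbbbcbbbcccc'  # 95% coverage
query_2 = 'aaabbbbcbbbbbcbccc'    # 90% coverage
threshold = 0.95  # minimum coverage required, as in the question
queries = (query, query_1, query_2)
for q in queries:
    # maximum number of substitutions/insertions/deletions allowed for q
    error = int(len(q) - (len(q) * threshold))
    # (...) makes {e<=N} apply to the whole query; escape() guards
    # against regex metacharacters in the query
    m = regex.search(r'(%s){e<=%d}' % (regex.escape(q), error), sequence)
    print('match' if m else 'nomatch')
```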
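
If installing the third-party regex module is not an option, the same coverage test can be done in plain Python. The sketch below swaps in a different technique from the answer: the semi-global edit-distance dynamic program usually credited to Sellers, which finds the fewest substitutions, insertions and deletions needed to match the query against any substring of the sequence. The helper names min_edits_anywhere and meets_coverage are hypothetical, and the allowed-error rule assumes the threshold is given as a whole-number percentage.

```python
def min_edits_anywhere(query, text):
    """Smallest edit distance between `query` and any substring of `text`
    (substitutions, insertions and deletions each cost 1).

    Semi-global edit-distance DP: the first row is all zeros so a match
    may start at any position in `text`, and the answer is the minimum
    of the last row so it may end anywhere.
    """
    prev = [0] * (len(text) + 1)       # row for the empty query prefix
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(text)    # i query chars against no text costs i
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j - 1] + (qc != tc),  # match (0) or substitution (1)
                prev[j] + 1,               # query char with no text char
                cur[j - 1] + 1,            # text char with no query char
            )
        prev = cur
    return min(prev)


def meets_coverage(query, text, min_coverage_pct):
    """True if `query` occurs in `text` with at most
    len(query) * (100 - min_coverage_pct) // 100 edits."""
    allowed = len(query) * (100 - min_coverage_pct) // 100
    return min_edits_anywhere(query, text) <= allowed


sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc'
for q in ('aaabbbbbbbbbbbbccc', 'aaaabbbbbbbbcbbbcccc', 'aaabbbbcbbbbbcbccc'):
    # prints "0 True", then "1 True", then "2 False"
    print(min_edits_anywhere(q, sequence), meets_coverage(q, sequence, 95))
```

This runs in O(len(query) * len(sequence)) time, which is fine for short sequences. Because both approaches count each substitution, insertion or deletion as one error, it should agree with the fuzzy regex search on whether a match exists.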

2 Comments

What is it called when you use (%s)(%d) % (variable1, variable2)? I want to look up the documentation because I've seen that before. @perreal
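
That construct is printf-style string formatting: the % operator applied to a string, described in the Python documentation under "printf-style String Formatting". A short sketch comparing it with str.format and f-strings for building the same fuzzy pattern; the values of q and error are just examples. In the latter two, the braces of {e<=N} must be doubled, otherwise a template like '{e<={}}' is parsed as a replacement field and .format() raises an exception instead of producing the pattern.

```python
q = 'aaaabbbbbbbbcbbbcccc'
error = 1

# printf-style ("old-style") formatting with the % operator:
# %s inserts str(q), %d inserts an integer.
p1 = r'(%s){e<=%d}' % (q, error)

# str.format: literal braces are written as {{ and }}.
p2 = '({}){{e<={}}}'.format(q, error)

# f-string (Python 3.6+): same brace-doubling rule.
p3 = f'({q}){{e<={error}}}'

print(p1)  # (aaaabbbbbbbbcbbbcccc){e<=1}
print(p1 == p2 == p3)  # True
```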
