I'm trying to do some fuzzy matching on a string of DNA reads. I'd like to allow for up to 1 substitution error while at the same time allowing a particular basepair to be one of two options (A or G in this case).

I've started with the following:

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "ATTAGATACCCTGGTAGTCA")
['ATTAGATACCCTGGTAGTCA']

This matches as expected because I'm matching against the exact string.

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCA")
['GTTAGATACCCTGGTAGTCA']

This matches as expected because I'm matching against the exact string, except that the first base pair has been switched from an A to a G (which is allowed).

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCx")
['GTTAGATACCCTGGTAGTCx']

This matches as expected because a single substitution occurs (C->x).

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCx")
[]

This does not match (as expected) because there are two substitutions.

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCA")
[]

I expected this to match, since the error at the first basepair (x instead of A or G) should have been counted as the one allowed substitution.

1 Answer

You have two substitutions in your last example: the first basepair has been substituted with an x while the last has been changed to an A. You only allow one substitution, so there's no match.
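
To make the count concrete, here is a minimal sketch in plain Python (it does not use the `regex` module) that counts mismatches position by position against the pattern from the question. The helper name `count_substitutions` and the list `PATTERN` are made up for illustration, and the sketch assumes the read and the pattern have the same length, so only substitutions are considered.

```python
# Each pattern position is the set of bases accepted there.
# The first position accepts A or G; the rest are taken from the
# pattern in the question, which ends in ...TCC (not ...TCA).
PATTERN = [set("AG")] + [set(base) for base in "TTAGATACCCTGGTAGTCC"]


def count_substitutions(read):
    """Count positions where `read` falls outside the allowed bases.

    Hypothetical helper for illustration only; it handles
    substitutions, not insertions or deletions.
    """
    if len(read) != len(PATTERN):
        raise ValueError("read and pattern must have the same length")
    return sum(base not in allowed for base, allowed in zip(read, PATTERN))


reads = [
    "ATTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCA",
]
for read in reads:
    print(read, count_substitutions(read))
```

Under these assumptions, the first three reads each differ from the pattern in one position (the final base), which is why they still match with `{0<=s<=1}`, while the last two differ in two positions. As a side note, `|` inside a character class is a literal character, so `[A|G]` also accepts `|`; `[AG]` expresses "A or G" without that side effect.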
