0

I have a string that I need to match against a sequence, and I need to determine how many times the string is found in that sequence. There are some conditions, though. The sequence can contain only the valid characters A, C, G and T, so a sequence could be ACGTGTCTG.

The string could be ACGnkG, where n can be replaced by A or G, and k can be replaced by C or T.

How can I find out whether the string matches the sequence by substituting valid values for n and k?

Is there a regular expression for this?

3
  • I have searched for various regexes but have not been able to get a match, because the string requires several substitutions. Commented Sep 5, 2012 at 0:58
  • Thanks. So do I have to replace the character at each such position with [AG] or [CT]? Commented Sep 5, 2012 at 1:05
  • 1
    Do you understand how to build a regex? If not, then the regular expression HOWTO should be more helpful than trying to find a prebaked one specific to your needs. Commented Sep 5, 2012 at 1:28

2 Answers

2

re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string, and len(...) returns the number of items in that list. For the string in the question, the pattern would be ACG[AG][CT]G.
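As a minimal sketch of that approach, assuming the question's substitutions (n is A or G, k is C or T) are written as character classes; the sequence below is made up for illustration and is not from the question:

```python
import re

# n -> [AG] and k -> [CT], per the question's definition of ACGnkG.
pattern = r"ACG[AG][CT]G"

# Hypothetical example sequence containing only A, C, G and T.
seq = "ACGATGTTACGGCGA"

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(pattern, seq)
count = len(matches)

print(matches)  # ['ACGATG', 'ACGGCG']
print(count)    # 2
```

Because the pattern has no capturing groups, each list item is the full matched substring.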


2 Comments

Building a list just to call len(lst) on it is wasteful for large strings such as DNA sequences.
len() is only applied to the list of matches, not to the whole sequence, so that may or may not be an issue depending on how many matches findall() returns.
1

If you want to count occurrences of the pattern (with re imported and the sequence in s), without building a list of matches:

count_regex = sum(1 for _ in re.finditer(r'ACG[AG][CT]G', s))
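Note that re.finditer, like re.findall, only reports non-overlapping matches, and this particular pattern can overlap with itself. If overlapping occurrences should be counted too, one possible variation (an assumption about what is wanted, not something the question states) is to wrap the pattern in a zero-width lookahead. The sequence here is a made-up example:

```python
import re

# Hypothetical sequence in which two matches overlap:
# "ACGACG" starts at index 0 and "ACGATG" starts at index 3.
s = "ACGACGATG"

# Plain pattern: scanning resumes after the end of each match,
# so the second, overlapping occurrence is skipped.
non_overlapping = sum(1 for _ in re.finditer(r"ACG[AG][CT]G", s))

# Lookahead pattern: each match is empty, so scanning resumes at the
# next character and every starting position is tested.
overlapping = sum(1 for _ in re.finditer(r"(?=ACG[AG][CT]G)", s))

print(non_overlapping)  # 1
print(overlapping)      # 2
```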

If you want to count occurrences of the fixed string that the pattern matches first:

m = re.search(r'ACG[AG][CT]G', s)
count_fixed = s.count(m.group(0), m.start(0)) if m else 0
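If the string with placeholders changes often, the pattern does not have to be written by hand. The following is a rough sketch that translates a string such as ACGnkG into the regex used above. The names WILDCARDS, build_pattern and count_matches are hypothetical helpers introduced for illustration, and the mapping covers only the two placeholders defined in the question:

```python
import re

# Placeholder letters and the character classes they stand for,
# as defined in the question (n is A or G, k is C or T).
WILDCARDS = {"n": "[AG]", "k": "[CT]"}


def build_pattern(template, wildcards=WILDCARDS):
    """Translate a template such as 'ACGnkG' into a regex string."""
    parts = []
    for ch in template:
        if ch in wildcards:
            parts.append(wildcards[ch])
        elif ch in "ACGT":
            parts.append(ch)
        else:
            raise ValueError(f"unexpected character in template: {ch!r}")
    return "".join(parts)


def count_matches(template, seq):
    """Count non-overlapping matches of the template in seq."""
    regex = re.compile(build_pattern(template))
    return sum(1 for _ in regex.finditer(seq))


print(build_pattern("ACGnkG"))               # ACG[AG][CT]G
print(count_matches("ACGnkG", "ACGTGTCTG"))  # 0
```

The sequence from the question, ACGTGTCTG, gives 0 because its fourth character is T, which is not one of the values allowed for n.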
