
I'm trying to write a regex pattern in Python that matches every group in a string starting at an A and running up to (but not including) the next A.

For example, DFDAXDJSDSJDAFGCJASDJASAGXCJAD should be broken into:

'AXDJSDSJD'
'AFGCJ'
'ASDJ'
'AS'
'AGXCJ'
'AD'

The closest thing I came up with was:

import re
string="DFDAXDJSDSJDAFGCJASDJASAGXCJAD"
r=re.compile('(A.[!=A]*)+')
matchObj = r.findall(string, re.M|re.I)

which returns AF, AS, ASA, AD

Why does it skip the first one? Why doesn't it return all chars until the next A?

2 Answers

2

You could just split the string on A:

>>> s = "DFDAXDJSDSJDAFGCJASDJASAGXCJAD"
>>> s.split('A')
['DFD', 'XDJSDSJD', 'FGCJ', 'SDJ', 'S', 'GXCJ', 'D']

# prepend 'A' to each piece on the fly
>>> ['A%s' % part for part in s.split('A')]
['ADFD', 'AXDJSDSJD', 'AFGCJ', 'ASDJ', 'AS', 'AGXCJ', 'AD']
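
Note that the list above starts with 'ADFD', which is not a real group: the text before the first A was never preceded by an A. A minimal sketch that drops that leading piece, assuming the text before the first A should simply be discarded (the names `pieces` and `groups` are only illustrative):

```python
s = "DFDAXDJSDSJDAFGCJASDJASAGXCJAD"

# split() returns the text before the first 'A' as element 0; skip it
pieces = s.split('A')
groups = ['A' + piece for piece in pieces[1:]]

print(groups)
# ['AXDJSDSJD', 'AFGCJ', 'ASDJ', 'AS', 'AGXCJ', 'AD']
```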

Or use an optional positive lookahead:

>>> re.findall('(A[^A]+(?=A)?)', s, re.IGNORECASE | re.MULTILINE)
['AXDJSDSJD', 'AFGCJ', 'ASDJ', 'AS', 'AGXCJ', 'AD']

Or, more simply, drop the lookahead altogether. Because it is optional it never constrains the match, so the shorter pattern gives the same result:

>>> re.findall('(A[^A]+)', s, re.IGNORECASE | re.MULTILINE)
['AXDJSDSJD', 'AFGCJ', 'ASDJ', 'AS', 'AGXCJ', 'AD']
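
As for why the original attempt skips the first group, here is a sketch of what appears to be going on. On a compiled pattern, the second argument of `findall` is the start position `pos`, not the flags, and `re.M | re.I` has the integer value 10, so the search starts at index 10, past the first A. Separately, `[!=A]` is a character class that matches `!`, `=` or `A`, not "anything except A" (that would be `[^A]`), and a repeated group only reports its last repetition.

```python
import re

s = "DFDAXDJSDSJDAFGCJASDJASAGXCJAD"
r = re.compile('(A.[!=A]*)+')

# re.M | re.I is the integer 10; passed to a compiled pattern's
# findall() it is treated as `pos`, the index where scanning starts.
start = int(re.M | re.I)
skipped = r.findall(s, start)   # first 'A' (index 3) is never seen
from_zero = r.findall(s)        # scanning from the beginning

print(start)      # 10
print(skipped)    # ['AF', 'AS', 'ASA', 'AD']
print(from_zero)  # ['AX', 'AF', 'AS', 'ASA', 'AD']
```

Even when scanning from the start, each match stops after two characters unless an `A`, `!` or `=` follows, which is why the negated class `[^A]` is needed.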

5 Comments

The first term isn't a match: ADFD isn't in s.
Thank you. This is probably a different question and a lot more complicated, but what if I wanted to do that for both A and J? Example: 'AXD' 'JSDS' 'JD' 'AFGC' 'J' 'ASD' 'J' 'AS' 'AHXC' 'J' 'AD'
Also, are there some issues with different letters? I tried with different letters and they don't seem to display the same result (only last match/missing first matches)
@user2443271: Here's an example (not sure it is exactly what you want) that would work with two letters: (A[^AJ]+|J[^AJ]+); you can tweak it here: rubular.com/r/64h5ir4YRz
Yes, it works. Thank you very much! This one is even trickier, but what if I wanted to filter out the ones where the letter after A or J is an S? The results would be: 'AXD' 'JD' 'AFGC' 'J' 'J' 'AHXC' 'J' 'AD' ('JSDS', 'ASD' and 'AS' would not appear because the second letter is an S)
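
One possible way to handle both follow-ups, offered as a sketch rather than a definitive answer: use a character class `[AJ]` for the starting letter, `*` instead of `+` so that a lone J directly followed by A is still reported, and a negative lookahead `(?!S)` to reject groups whose second letter is S. The variable names are illustrative, and the 'AHXC' in the comments is assumed to be a typo for 'AGXC', which is what the sample string contains.

```python
import re

s = "DFDAXDJSDSJDAFGCJASDJASAGXCJAD"

# groups that start at A or J and run up to the next A or J
two_letters = re.findall(r'[AJ][^AJ]*', s)

# same, but skip groups whose second letter is S
no_s = re.findall(r'[AJ](?!S)[^AJ]*', s)

print(two_letters)
# ['AXD', 'JSDS', 'JD', 'AFGC', 'J', 'ASD', 'J', 'AS', 'AGXC', 'J', 'AD']
print(no_s)
# ['AXD', 'JD', 'AFGC', 'J', 'J', 'AGXC', 'J', 'AD']
```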
2

I can propose the following method:

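import re

string = "DFDAXDJSDSJDAddaFGCJASDJASAGXCJAD"
# flags go to compile(); on a compiled pattern, findall() takes (string, pos, endpos)
r = re.compile('A[^A]*', re.I | re.M)
matchObj = r.findall(string)  # a list of matched strings
print(matchObj)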
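
For reference, a short sketch of what this approach returns. With `re.I` the lowercase `a` also starts a new group, and because the pattern uses `*` rather than `+`, an A immediately followed by another A is reported as a group of its own. The string `'AAB'` is just an illustrative input.

```python
import re

string = "DFDAXDJSDSJDAddaFGCJASDJASAGXCJAD"
case_insensitive = re.findall('A[^A]*', string, re.I)
print(case_insensitive)
# ['AXDJSDSJD', 'Add', 'aFGCJ', 'ASDJ', 'AS', 'AGXCJ', 'AD']

# '*' allows an empty tail, '+' requires at least one non-A character
star = re.findall('A[^A]*', 'AAB')
plus = re.findall('A[^A]+', 'AAB')
print(star)  # ['A', 'AB']
print(plus)  # ['AB']
```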
