Goal: capture the leading run of digits and dividers as group 1, but stop before a size sequence (such as 300x250) so that the size stays in group 2.

## List of strings and desired result
strs = [
   '151002 - Some name',       ## ('151002 - ', 'Some name')
   'Another name here',        ## ('', 'Another name here')
   '13-10-07_300x250_NoName',  ## ('13-10-07_', '300x250_NoName')
   '728x90 - nice name'        ## ('', '728x90 - nice name')
]

Attempted Pattern

## This pattern is close
## (raw string, so that \b stays a regex word boundary instead of a Python backspace)
pat = r'''
^                       ## From start of string
(                       ## Group 1
   [0-9\- ._/]*         ## Any number or divider
   (?!                  ## Negative Lookahead
      (?:\b|[\- ._/\|]) ## Beginning of word or divider
      \d{1,3}           ## Size start
      (?:x|X)           ## big or small 'x'
      \d{1,3}           ## Size end
   )           
)
(                       ## Group 2
   .*                   ## Everything else
)
'''

## Matching
import re
[re.compile(pat, re.VERBOSE).match(s).groups() for s in strs]

Attempted Pattern Result

[
   ('151002 - ', 'Some name'),      ## Good
   ('', 'Another name here'),       ## Good
   ('13-10-07_300', 'x250_NoName'), ## Error
   ('728', 'x90 - nice name')       ## Error
]

2 Answers

3

I think this might give you what you want:

[re.match(r"^([^x]+[\-_]\s?)?(.*$)", s).groups() for s in strs]

Explanation of regex: starting at the beginning of the string, match one or more characters that aren't an x, ending in a hyphen or underscore that may be followed by one whitespace character. That is group one, and it is optional (zero or one occurrence). Group two is everything else.

EDIT:

Assuming that your strings can have something other than the letter x amongst the numbers, you can modify the code to this:

[re.match(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)", s).groups() for s in strs]
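For reference, here is a minimal runnable sketch of that second pattern against the four strings from the question. One detail worth knowing: because group one is optional, `groups()` reports it as `None` when it does not participate, so the sketch passes `default=''` to get the empty string the question asks for. The names `PREFIX_RE` and `split_prefix` are made up for illustration.

```python
import re

# Optional non-letter prefix ending in '-' or '_' (plus one optional
# whitespace character), followed by the rest of the string.
PREFIX_RE = re.compile(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)")

strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

def split_prefix(s):
    # groups(default='') turns a non-participating group 1 into ''
    # instead of None.
    return PREFIX_RE.match(s).groups(default='')

results = [split_prefix(s) for s in strs]
for r in results:
    print(r)
```

Without `default=''`, the second and fourth tuples would start with `None` rather than `''`.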

2 Comments

Thanks for the input! While this regex does not appear to do what I was looking for (see HERE), your approach does remind me that there are multiple ways to address the issue. Your approach is more holistic, while mine tries to be very precise. I can see the benefits of this style. Appreciate your post!
@propjk007 Based on the link you provided in your comment, I am assuming that the extra space after the hyphen resulted in a bad output, yes? If so, we can fix that by moving the space outside of the first captured group, like so: [re.match(r"^([^a-zA-Z]+[\-_])?\s?(.*$)", s).groups() for s in strs], which gives us the desired output. You're right that there are multiple ways to address the same issue, especially when regex is involved. I'm happy you were able to find a solution to your problem.
1

I think you misunderstand the use of lookaheads. This pattern should work:

((?:(?!\d{1,3}x\d{1,3})[0-9\- ._/])*)(.*)

Debuggex Demo

If you want an explanation, because I know it is a disgusting regex, just ask :)
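As a rough explanation, here is the same pattern spelled out with `re.VERBOSE` and run against the question's strings. The comments are my reading of the pattern rather than the answerer's own wording, and the name `TEMPERED` is only for illustration.

```python
import re

# Raw string so that \d and \- reach the regex engine unchanged.
TEMPERED = re.compile(r'''
    (                         # Group 1: leading run of digits and dividers
      (?:
        (?!\d{1,3}x\d{1,3})   # probe: a size such as 300x250 must NOT start here
        [0-9\- ._/]           # then consume exactly one digit or divider
      )*                      # repeat, re-running the probe before every character
    )
    (.*)                      # Group 2: everything else
''', re.VERBOSE)

strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

results = [TEMPERED.match(s).groups() for s in strs]
for r in results:
    print(r)
```

As written, the probe only recognises a lowercase `x`; the pattern in the question also allowed `X`, which could be covered by writing `[xX]` in its place.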

3 Comments

Wow! @r-nar what an amazing tool! Thank you very much for the share! I still do not fully get how to use the lookahead, but your example and tool get me closer. : ) It seems like every example I came across on the web used the lookahead as a "do not include" (so in my example, if any string had the size, i.e. 300x250, the pattern would fail). So of course, following and modifying their logic, I put the lookahead in front of the desired pattern. Do you have any good lookahead references?
I don't really have a good reference, but if it helps, think of the lookahead/lookbehind statements as a probe. Whenever the regex reaches one, it will keep its current position while using 'another' marker to go ahead or behind in the string and match whatever is inside the lookahead statement.
Also, I use rexegg.com for any of my regex questions; it's a good overview site of regex, with tricks and tips on how to use it.
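The "probe" description in the comments can be checked directly. The sketch below first shows that a lookahead consumes nothing, then compares a lookahead placed after the greedy run (a simplified form of the question's pattern, without its divider alternative) with one placed inside the repetition. The variable names are illustrative only.

```python
import re

# A lookahead succeeds without consuming anything: the match ends where it began.
m = re.match(r'(?=\d{1,3}x\d{1,3})', '300x250_NoName')
probe_end = m.end()
print(probe_end)

s = '13-10-07_300x250_NoName'

# Lookahead AFTER the greedy run: probed once, at the spot where the run
# already stopped (right before 'x'), so it never sees the start of the size.
once = re.match(r'([0-9\- ._/]*(?!\d{1,3}x\d{1,3}))(.*)', s)

# Lookahead INSIDE the repetition: probed before every single character,
# so the run stops as soon as a size begins.
each = re.match(r'((?:(?!\d{1,3}x\d{1,3})[0-9\- ._/])*)(.*)', s)

print(once.groups())
print(each.groups())
```

The first form reproduces the `('13-10-07_300', 'x250_NoName')` error from the question, while the second yields `('13-10-07_', '300x250_NoName')`.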
