
I built my own regex, shown here with some sample input strings:

r'(\d+[x]\d+[-._](\w+))|(\d+[x]\d+\w+)'

alphanumeric 1x01-e02-03-04
hello-char 2x01-02-03_04
hello 3x02 char 2x01-02-03_04

I need to grab the substrings '1x01', 'e02', '03', '04', or '2x01', '02', etc.

The string length is variable, for example:

alphanumeric 1x01-e02-03-04

or

alphanumeric 1x01-e02

The first substring always has the form "nnnxnnn", where each n is a digit (at most three digits on either side) and the character 'x' is always present. After the 'x', the only letter that can appear is 'e', and it is not always present, as in 'e02' and '03'. In both cases I need the integer.

Is it possible to improve my regex?

  • If you need a solution, please provide the pattern specifications. What are the rules for matching? Do you want to extract the 3x02 too? What if there are just numbers somewhere, like hello 33 char 2x01-02-03_04? Commented Nov 9, 2021 at 21:10
  • Hello @WiktorStribiżew, I added the rules, thanks. Commented Nov 9, 2021 at 21:38
  • But what is the rule to match 03 and 04 in e02-03-04? Should they always be part of a string that contains an x or e char? You don't want to match a standalone number? Commented Nov 9, 2021 at 21:42
  • @Thefourthbird 03 and 04 are separated by a '-' or '_' character. Commented Nov 9, 2021 at 21:46
  • @Homer Then see Wiktor's answer. Commented Nov 9, 2021 at 21:47

1 Answer

You can use

import re

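# Match "NxN" followed by any number of "-"/"_"-separated numbers (each optionally prefixed with e)
rx = re.compile(r'\b\d+x\d+(?:[-_]e?\d+)*')

texts = ['alphanumeric 1x01-e02-03-04',
         'hello-char 2x01-02-03_04',
         'hello 3x02 char 2x01-02-03_04']

for text in texts:
    # Split each match on the separators, dropping an optional leading e
    print([re.split(r'[-_]e?', x) for x in rx.findall(text)])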

See the Python demo and the regex demo. Output:

[['1x01', '02', '03', '04']]
[['2x01', '02', '03', '04']]
[['3x02'], ['2x01', '02', '03', '04']]

Regex details:

  • \b - word boundary
  • \d+x\d+ - one or more digits, x, one or more digits
  • (?:[-_]e?\d+)* - zero or more repetitions of - or _ and then an optional e and then one or more digits.
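
As an illustration of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name rx_verbose is my own label, and the pattern itself is unchanged from the one in this answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace and "#" comments inside the pattern are ignored.
rx_verbose = re.compile(r"""
    \b            # word boundary
    \d+x\d+       # one or more digits, a literal x, one or more digits
    (?:           # start of a non-capturing group
        [-_]      #   a separator: - or _
        e?        #   an optional literal e
        \d+       #   one or more digits
    )*            # the group may repeat zero or more times
""", re.VERBOSE)

print(rx_verbose.findall('alphanumeric 1x01-e02-03-04'))
# ['1x01-e02-03-04']
print(rx_verbose.findall('hello 3x02 char 2x01-02-03_04'))
# ['3x02', '2x01-02-03_04']
```

Note that findall returns each match as one whole string here, because the pattern contains no capturing groups.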

After you get each match, you need to split it on _ or - (the separators), hence the use of re.split(r'[-_]e?', x): it matches - or _ and then an optional e, so the e is consumed by the split and does not appear in the results.
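
To see why the optional e belongs in the split pattern, here is a small comparison on a single match. The sample string and the variable names are made up for illustration.

```python
import re

# A hypothetical match, as the main regex would return it
match = '1x01-e02-03_04'

# Splitting on the separators alone leaves the e attached to the number
without_e = re.split(r'[-_]', match)
print(without_e)  # ['1x01', 'e02', '03', '04']

# Letting the split pattern also consume an optional e drops it from the result
with_e = re.split(r'[-_]e?', match)
print(with_e)     # ['1x01', '02', '03', '04']
```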


5 Comments

Thanks @Wiktor Stribiżew, but I need only the integers: for example, from 'e02' and '03' I need '02' and '03'. Another example: with 'f02' and '03' I have to discard 'f02' because of the 'f' char, or any char other than 'e'. I grab integers only in these cases: when there is an 'x' letter (nnnxnnn, where n is a digit, max three digits), when there are only numbers (max three digits), or when there is an 'e' letter, and in that case I grab only the integer :S
@Homer Ok, so only the e letter is wrong in the output, right? Which solution is closer to what you need? I do not want to modify all of them.
The first. Yes Wiktor, only the 'e' char. Thanks
@Homer I fixed the Python demo. The regex stays the same; all you need is to update the re.split inside the list comprehension to make sure the e is removed from the final results.
Thank you Wiktor
