Using Python and Regex get last occurrence and remaining part

Question

I'm trying to use Python and regex to get the last set of integers in a filename (string). The approach below does what I need, but I also want to return the inverse, i.e. the remaining parts of the string that the regex does not match. How can I do that?

Here is the regex ([0-9]+|#+)(?!.*([0-9]+|#+))

import re

values = [
    'image.0001',
    'image###',
    '###image###',
    'image001',
    'image_001',
    '001',
    '0001.image',
    '001image',
    '001_image',
    'image',
    '01_image01',
    '03_image01',
]

pattern = '([0-9]+|#+|@+)'
regex = '{0}(?!.*{0})'.format(pattern)

for v in values:
    result = re.search(regex, v)
    if result:
        print(result.groups())

Currently it returns ('01', None). I'd like it to return something like ('image', '0001').

Updated

Optionally, is there a way to split the strings by groups of numbers? For example:

'image.0001' > ['image.', '0001']
'image###' > ['image', '###']
'###image###' > ['###', 'image', '###']
'image001' > ['image', '001']
'image_001' > ['image_', '001']
'001' > ['001']
'0001.image' > ['0001', '.image']
'001image' > ['001', 'image']
'001_image' > ['001', '_image']
'image' > ['image']
'01_image01' > ['01', '_image', '01']
'03_image01' > ['03', '_image', '01']

Have you tried re.findall(...)? See docs.python.org/3/library/re.html#re.findall
– lsabi, Commented Jan 7, 2021 at 21:14
What are the expected outputs for 0001.image, 001image, 001_image and image?
– Wiktor Stribiżew, Commented Jan 7, 2021 at 21:15
That's a good question. Is there a way for me to return a dict with the known parts, like prefix = all bits and num = last digit occurrence?
– JokerMartini, Commented Jan 7, 2021 at 21:17
Check below. You just need to sub all non-letters and all non-numbers to have either one or the other. In my answer I followed your "last digit" requirement.
– Synthaze, Commented Jan 7, 2021 at 21:18

Ryszard Czech · Accepted Answer · 2021-01-07 22:02:12Z

EDIT:

Use

re.findall(r'\d+|#+|@+|[^#@\d]+', v)

See proof.

Explanation

--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  #+                       '#' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  @+                       '@' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  [^#@\d]+                 any character except: '#', '@', digits
                           (0-9) (1 or more times (matching the most
                           amount possible))
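
As a minimal sketch of how this pattern behaves on some of the sample names from the question (the helper name tokenize is made up for illustration and is not part of the answer):

```python
import re

# Same alternation as above: a run of digits, a run of '#', a run of '@',
# or a run of anything else.
TOKENS = re.compile(r'\d+|#+|@+|[^#@\d]+')


def tokenize(name):
    """Hypothetical helper: split name into consecutive runs."""
    return TOKENS.findall(name)


print(tokenize('image.0001'))   # ['image.', '0001']
print(tokenize('###image###'))  # ['###', 'image', '###']
print(tokenize('0001.image'))   # ['0001', '.image']
print(tokenize('image'))        # ['image']
print(tokenize('01_image01'))   # ['01', '_image', '01']
```

Because every character falls into exactly one of the four alternatives, joining the tokens back together should reproduce the original string, and no empty strings appear in the list.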

ORIGINAL: Use re.split and add a capturing group so that the captured part is kept in the result:

import re

values = [
    'image.0001',
    'image###',
    '###image###',
    'image001',
    'image_001',
    '001',
    '0001.image',
    '001image',
    '001_image',
    'image',
    '01_image01',
    '03_image01',
]

pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))
for v in values:
    print(regex.split(v))

See Python proof

Results:

['image.', '0001', '']
['image', '###', '']
['###image', '###', '']
['image', '001', '']
['image_', '001', '']
['', '001', '']
['', '0001', '.image']
['', '001', 'image']
['', '001', '_image']
['image']
['01_image', '01', '']
['03_image', '01', '']
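
The empty strings appear because re.split also returns the text before and after the match, even when that text is empty. One way to drop them, shown here only as a sketch (split_without_empties is a hypothetical name), is to filter out falsy items:

```python
import re

pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))


def split_without_empties(name):
    """Hypothetical helper: split around the last run and drop '' entries."""
    # filter(None, ...) keeps only truthy items, so empty strings are removed.
    return list(filter(None, regex.split(name)))


print(split_without_empties('image.0001'))  # ['image.', '0001']
print(split_without_empties('0001.image'))  # ['0001', '.image']
print(split_without_empties('01_image01'))  # ['01_image', '01']
```

Note that this still splits only around the last run, so '01_image01' keeps '01_image' as a single piece.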

See regex proof.

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    #+                       '#' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    @+                       '@' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      #+                       '#' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      @+                       '@' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
  )                        end of look-ahead
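
If named parts are more convenient than a list, the same lookahead pattern can be used with re.search and the match span. The following is a sketch under assumptions: the function name split_last_run and the dict keys prefix, num and suffix are illustrative choices, not something defined in the question or answer.

```python
import re

TOKEN = r'[0-9]+|#+|@+'
# A run of digits, '#' or '@' that is not followed by another such run.
LAST_RUN = re.compile(r'({0})(?!.*(?:{0}))'.format(TOKEN))


def split_last_run(name):
    """Hypothetical helper: return the text before, at, and after the last run."""
    match = LAST_RUN.search(name)
    if match is None:
        # No digits, '#' or '@' anywhere: the whole name is the prefix.
        return {'prefix': name, 'num': None, 'suffix': ''}
    return {
        'prefix': name[:match.start()],
        'num': match.group(1),
        'suffix': name[match.end():],
    }


print(split_last_run('image.0001'))
# {'prefix': 'image.', 'num': '0001', 'suffix': ''}
print(split_last_run('0001.image'))
# {'prefix': '', 'num': '0001', 'suffix': '.image'}
```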

Is there a way to simply split the string by consecutive numbers? That may work better and avoid the stray empty parts of the list.
@JokerMartini Not sure what you mean. Removing empty items is easy: list(filter(None, result)).
I've updated the question above to show what I mean. Your solution is on the right track for what I'm doing, but now that I see it in action, I think the updated question describes a better solution.
How do I modify it to split not just numbers and words but also #+|@+, as in my code above?

Synthaze · Accepted Answer · 2021-01-07 21:12:51Z


import re

values = [
    'image.0001',
    'image###',
    '###image###',
    'image001',
    'image_001',
    '001',
    '0001.image',
    '001image',
    '001_image',
    'image',
    '01_image01',
    '03_image01',
]

for v in values:
    print(re.sub(r"[^A-Za-z]+", "", v), end=" ")
    print(re.sub(r"(.+[_.]){0,1}[^0-9]+", "", v))

Output:

image 0001
image 
image 
image 001
image 001
 001
image 
image 001
image 
image 
image 01
image 01


3 Comments

JokerMartini Over a year ago

last occurrence. So '0001.image', should still return a number

Synthaze Over a year ago

So you don't want "the last set of integers in a filename"?

JokerMartini Over a year ago

I want the last occurrence. I'll update question
