
I need to find the space after 3 or 4 digits in a bunch of filenames and replace the space with an underscore. But I can't seem to even find 4 digits together.

import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r'blue (?P<animal>dog|cat)')
print(p.sub(r'gray \g<animal>',s))

#Gives basically what I want.
the gray dog and gray cat wore blue hats


s = "7053 MyFile.pptx"
p = re.compile('[0-9][0-9][0-9][0-9](?P<dig> )')
print(p.sub(r'_\g<dig>', s))

#Takes out the numbers, which I need to keep
_ MyFile.pptx

Everything I seem to do has the expression taking out the digits, which I need to keep.

In the end, I want

7053 MyFile.pptx

to be

7053_MyFile.pptx

  • Why overcomplicate a simple task by using regex? You can just split on the whitespace and join the resulting list on "_". Commented Jan 14, 2019 at 21:28
  • There are other file names with white space in them, e.g. 1234 Some Other File.pptx. Plus, I need to get better with my regex :) Commented Jan 14, 2019 at 21:36
  • That's still no issue as long as the filenames start with the digits. But sure, you can use a regex for this task; I'm just recommending not to if this goes beyond personal projects/exploration. Commented Jan 14, 2019 at 21:38
  • Why not just use s.replace(' ', '_')? Commented Jan 14, 2019 at 21:44

1 Answer


If you want to replace 3 or 4 digits followed by a whitespace character with the same digits followed by an underscore, the correct regex syntax/substitution would be:

re.sub(r"([0-9]{3,4})\s", r"\1_", s)

You might have misread how groups and backreferences work. Whatever is supposed to be in the group needs to be inside the parentheses; in your pattern, only the space was inside the group, so only the space was put back. If you wanted to use a named group (which is a bit unnecessary):

re.sub(r"(?P<dig>[0-9]{3,4})\s", r"\g<dig>_", s)

Or with a pre-compiled regex akin to your example:

import re

s = "7053 MyFile.pptx"
p = re.compile(r"(?P<dig>[0-9]{3,4})\s")
print(p.sub(r'\g<dig>_', s))

{3,4} following [0-9] means three or four repetitions of a digit. \s stands for any whitespace character (not just a space).
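
To see why the original attempt dropped the digits, it can help to inspect what each pattern matches and what its group captures. The following is only a small sketch; the variable names (m_old, m_new, and so on) are made up for illustration:

```python
import re

s = "7053 MyFile.pptx"

# Original attempt: the digits are matched but sit outside the group,
# so the group captures only the space.
m_old = re.search(r"[0-9][0-9][0-9][0-9](?P<dig> )", s)
old_match = m_old.group(0)      # whole match: the digits plus the space
old_group = m_old.group("dig")  # captured text: just the space

# Corrected pattern: the digits are inside the group, the whitespace is not.
m_new = re.search(r"(?P<dig>[0-9]{3,4})\s", s)
new_group = m_new.group("dig")

# sub() replaces the whole match, so only captured text can be put back.
result = re.sub(r"(?P<dig>[0-9]{3,4})\s", r"\g<dig>_", s)

print(repr(old_match), repr(old_group), repr(new_group), result)
```

Since sub() replaces everything the pattern matched, anything that should survive has to be captured and referenced in the replacement.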

Actually, searching for just 3 digits would also match inside a 4-digit number (and the pattern as written also matches the tail of a longer run of digits), because nothing limits what comes before the match. Depending on what you are looking for, you may want to restrict the matches by prepending ^ (beginning of the string or line) or \b (the empty string at a word boundary) to the pattern.
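
As a rough illustration of the anchoring point, the sketch below compares the unanchored pattern with ^ and \b variants. The filenames in the list are hypothetical examples, not taken from the question:

```python
import re

# Hypothetical filenames, chosen to show where the variants differ.
filenames = [
    "7053 MyFile.pptx",
    "123 Some Other File.pptx",
    "12345 TooManyDigits.pptx",
    "Report 2019 Final.pptx",
]

unanchored = re.compile(r"([0-9]{3,4})\s")
anchored = re.compile(r"^([0-9]{3,4})\s")     # digits must start the string
word_bound = re.compile(r"\b([0-9]{3,4})\s")  # digits must start a word

loose = [unanchored.sub(r"\1_", name) for name in filenames]
strict = [anchored.sub(r"\1_", name) for name in filenames]
bounded = [word_bound.sub(r"\1_", name) for name in filenames]

for row in zip(filenames, loose, strict, bounded):
    print(row)
```

The unanchored pattern rewrites the tail of the five-digit name and the year in the middle of the last name; ^ leaves both alone, while \b leaves the five-digit name alone but still rewrites the year, because 2019 starts a new word there.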
