I would like to extract a filename from a path using a regular expression:

mysting = '/content/drive/My Drive/data/happy (463).jpg'

How do I extract 'happy.jpg'?

I have tried this: '[^/]*$' but the result still includes the number in parentheses, which I do not want: 'happy (463).jpg'

How could I improve it?

    '/content/drive/My Drive/data/happy (463).jpg'.split("/")[-1] if you don't want to use regex. Or better, '/content/drive/My Drive/data/happy (463).jpg'.split(os.sep)[-1] Commented Dec 15, 2019 at 17:03
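
As a minimal sketch of the starting point, the question's pattern and the non-regex suggestion from the comment can be compared side by side. The use of posixpath.basename is an addition here, not something from the thread; it always treats / as the separator, whereas os.sep is \ on Windows and would not split this path there.

```python
import posixpath
import re

mysting = '/content/drive/My Drive/data/happy (463).jpg'

# The pattern from the question: everything after the last slash.
by_regex = re.search(r'[^/]*$', mysting).group()

# The non-regex route from the comment: split on '/' and take the last part.
by_split = mysting.split('/')[-1]

# Assumed alternative: posixpath.basename does the same for '/'-separated paths.
by_basename = posixpath.basename(mysting)

print(by_regex)
print(by_split)
print(by_basename)
```

All three give the file name with the counter still attached, so a second step is needed to drop the " (463)" part.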

3 Answers

You could use 2 capturing groups. First match everything up to the last / and capture 1+ word chars in group 1.

Then match 1+ digits between parentheses, capture .jpg in group 2, and assert the end of the string.

^.*/(\w+)\s*\(\d+\)(\.jpg)$

In parts that will match

  • ^.*/ Match until last /
  • (\w+) Capture group 1, match 1+ word chars
  • \s* Match 0+ whitespace chars
  • \(\d+\) Match 1+ digits between parentheses
  • (\.jpg) Capture group 2, match .jpg
  • $ End of string
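
To see what each group holds, the pattern can be run with re.match and the groups inspected directly. This is only an illustrative sketch; the variable names match, stem, ext and name are chosen here and do not come from the answer.

```python
import re

regex = r"^.*/(\w+)\s*\(\d+\)(\.jpg)$"
test_str = "/content/drive/My Drive/data/happy (463).jpg"

match = re.match(regex, test_str)
if match:
    # Group 1 is the run of word characters after the last slash.
    print(match.group(1))
    # Group 2 is the literal extension at the end of the string.
    print(match.group(2))
    # Joining both groups gives the cleaned file name.
    stem, ext = match.groups()
    name = stem + ext
    print(name)
```

Because the pattern is anchored with $ and requires the parenthesised digits, a path without a counter does not match at all and match is None.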

Then use group 1 and group 2 in the replacement to get happy.jpg

import re

regex = r"^.*/(\w+)\s*\(\d+\)(\.jpg)$"
test_str = "/content/drive/My Drive/data/happy (463).jpg"
# Replace the whole match with group 1 (name) followed by group 2 (extension).
result = re.sub(regex, r"\1\2", test_str, count=1)

if result:
    print(result)

Output

happy.jpg
1 Comment

Thank you very much, it was very helpful.
Without regex, using the str methods str.partition and str.rpartition:

In [185]: filename = mysting.rpartition('/')[-1] 

In [186]: filename 
Out[186]: 'happy (463).jpg'

In [187]: f"{filename.partition(' ')[0]}.{filename.rpartition('.')[-1]}"
Out[187]: 'happy.jpg'

With regex, using re.sub:

re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
  • .*/ greedily matches up to the last /

  • The zero-width negative lookahead (?!.*/) ensures that no / appears anywhere further on

  • ([^\s]+) matches up to the next whitespace character and is stored as the first captured group

  • [^.]+ matches up to the next .

  • (\..*) matches a literal . followed by any number of characters and is stored as the second captured group; to match more conservatively, you could require exactly 3 characters or even the literal .jpg

  • in the replacement, only the captured groups are used

Example:

In [183]: mysting = '/content/drive/My Drive/data/happy (463).jpg'

In [184]: re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
Out[184]: 'happy.jpg'
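
The more conservative match mentioned in the explanation could look something like the sketch below. It is an assumed variant, not the answer's pattern: the extension is limited to 1 to 4 word characters, and the counter is made optional, since [^.]+ in the original requires at least one character between the name and the dot. The helper name clean_name is hypothetical.

```python
import re

# Assumed variant: name without whitespace or slashes, an optional
# " (digits)" counter, then a short extension at the end of the string.
pattern = re.compile(r'.*/([^/\s]+)(?:\s*\(\d+\))?(\.\w{1,4})$')


def clean_name(path):
    """Return the file name with any ' (digits)' counter removed."""
    # If the pattern does not match, sub() returns the path unchanged.
    return pattern.sub(r'\1\2', path)


print(clean_name('/content/drive/My Drive/data/happy (463).jpg'))
print(clean_name('/content/drive/My Drive/data/happy.jpg'))
```

Both calls print happy.jpg, so the same helper can be applied to names with and without a counter.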

I use JavaScript, so here is a JavaScript version:

const myString="happy (463).jpg";

const result=myString.replace(/\s\(\d*\)/,'');

Once you have split the path on the slash separator and taken the last part, you can apply this code to it.
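
Since the question is about Python, the same two-step idea might be sketched there as follows. This is a rough translation under the assumption that the path uses / as its separator; count=1 mirrors the fact that a JavaScript replace with a non-global regex only replaces the first match.

```python
import re

mysting = '/content/drive/My Drive/data/happy (463).jpg'

# Step 1: split on the slash separator and keep the last part.
name = mysting.split('/')[-1]

# Step 2: remove one whitespace character followed by "(digits)".
# As in the JavaScript pattern, \d* also accepts empty parentheses.
result = re.sub(r'\s\(\d*\)', '', name, count=1)

print(result)
```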

1 Comment

I'm glad to help. :)