I have a string Job_Cluster_AK_Alaska_Yakutat_CDP.png

From the string above, I want to extract only the part that comes after Job_Cluster_AK_Alaska_ and before .png.

In other words, I want everything after the fourth underscore-separated word, up to but not including .png.

I am new to regex.

The result should be just Yakutat_CDP.

  • You need to describe what exactly you want to do in the general case, not just on this particular string. Commented Jan 21, 2019 at 14:09
  • @interjay, yes, I have edited it now. Commented Jan 21, 2019 at 14:11
  • But you still didn't describe what you want to do in the general case. i.e. one which may contain a different string. Commented Jan 21, 2019 at 14:14
  • @interjay, I am trying to rename the files. Commented Jan 21, 2019 at 14:16
  • Still not what I asked.... You need to write a general description like "I want to extract the fifth and sixth words, which are separated by underscores." (this is just an example, I don't know if it's what you actually need because you won't say). Otherwise you'll get an answer like the one below which only works with a specific string. Commented Jan 21, 2019 at 14:19

2 Answers

I think what you are asking for is something like this:

import os

# I think you will have different jobs/pngs, so pass these variables from somewhere
jobPrefix = 'Job_Cluster_AK_Alaska_'
pngString = 'Job_Cluster_AK_Alaska_Yakutat_CDP.png'

# Split filename/extension
pngTitle = os.path.splitext(pngString)[0]

# Get the filename without the jobPrefix
finalTitle = pngTitle[len(jobPrefix):]
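
Since the goal mentioned in the comments is renaming files, the same slicing could be wrapped in a small helper. This is only a sketch: the function name strip_prefix_and_ext is hypothetical, and raising an error for a filename that lacks the prefix is an assumption about the desired behaviour.

```python
import os


def strip_prefix_and_ext(filename, prefix):
    """Return filename without its extension and without the leading prefix.

    Hypothetical helper; raising on a missing prefix is an assumption.
    """
    # Split 'name.png' into ('name', '.png') and keep only the name
    title, _ext = os.path.splitext(filename)
    if not title.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    # Drop the prefix by slicing past its length
    return title[len(prefix):]


print(strip_prefix_and_ext('Job_Cluster_AK_Alaska_Yakutat_CDP.png',
                           'Job_Cluster_AK_Alaska_'))  # prints Yakutat_CDP
```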

Edit

Try to avoid regular expressions for a task like this, as they are generally slower than plain string slicing.
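
If you want to check the speed difference on your own machine, a rough sketch with timeit could look like this. The function names by_slicing and by_regex are made up for the comparison, and the timings will vary between machines and Python versions, so none are shown.

```python
import re
import timeit

s = 'Job_Cluster_AK_Alaska_Yakutat_CDP.png'
prefix = 'Job_Cluster_AK_Alaska_'
# Precompiled so that only the matching itself is timed
pattern = re.compile(r'Job_Cluster_AK_Alaska_(.*)\.png')


def by_slicing():
    # Cut off the known prefix and the '.png' suffix
    return s[len(prefix):-len('.png')]


def by_regex():
    # Capture whatever sits between the prefix and '.png'
    return pattern.match(s).group(1)


# Both approaches should agree before their timings are compared
assert by_slicing() == by_regex() == 'Yakutat_CDP'

print('slicing:', timeit.timeit(by_slicing, number=100_000))
print('regex:  ', timeit.timeit(by_regex, number=100_000))
```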

You can do it even without regex like so:

s = 'Job_Cluster_AK_Alaska_Yakutat_CDP.png'
print(s[len('Job_Cluster_AK_Alaska_'):-len('.png')])

In essence here I take the substring starting immediately after Job_Cluster_AK_Alaska_ and ending before .png.

Still, a regex approach is probably more readable and maintainable:

import re
# Escape the dot so it matches a literal '.', and pass the string s to match against
m = re.match(r'Job_Cluster_AK_Alaska_(.*)\.png', s)
print(m[1])
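
If the prefix is not known in advance, a pattern can skip the first four underscore-separated words instead, which is how the question describes the general case. This is a sketch under that assumption; the name extract_suffix and the second filename are hypothetical.

```python
import re

# Four 'word_' groups, then capture everything up to a trailing '.png'
AFTER_FOURTH_WORD = re.compile(r'(?:[^_]+_){4}(.+)\.png$')


def extract_suffix(filename):
    """Return the part after the fourth underscore-separated word, or None."""
    m = AFTER_FOURTH_WORD.match(filename)
    return m.group(1) if m else None


print(extract_suffix('Job_Cluster_AK_Alaska_Yakutat_CDP.png'))  # prints Yakutat_CDP
print(extract_suffix('Job_Cluster_TX_Texas_Austin_CDP.png'))    # prints Austin_CDP
```

A filename with fewer than five words, or one that does not end in .png, returns None rather than raising.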

2 Comments

I don't want to hard-code the string inside re.match(). Could you help me so that, for any string (s), it returns the part after 'Job_Cluster_AK_Alaska_' and before '.png'?
I do not understand what "the string name" means in this context.
