I have the following strings in Python:

Vladimir_SW_crop_mask_ERA.hdr
Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr
Ingush_WW_crop_mask.dat

I want to parse each string to get:

  1. The crop type, which can be either SW or WW

  2. The region name, which is the text preceding _SW or _WW

I was using str.split('_')[0] to get the region name, but that fails for Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr, where the region name is Ust_Ordynskiy_Buryatskiy_AO.

  • regexes wouldn't make too much sense on constant strings :) Commented Nov 11, 2015 at 19:49
  • thanks @Jasper, it is a bad name for the question :) but 'regular expression in python' was already asked Commented Nov 11, 2015 at 19:50

2 Answers

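You can use str.partition and str.rpartition to do this. Take everything before '_crop', then split that at its last underscore; the [::2] slice drops the separator from the 3-tuple that rpartition returns: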

>>> s = 'Vladimir_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Vladimir', 'SW')
>>> s = 'Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
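If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name split_region_and_crop is made up for illustration, and it assumes every filename contains '_crop' right after the crop type, as in the three examples from the question.

```python
def split_region_and_crop(filename):
    """Return (region, crop_type) from a name like '<region>_<SW|WW>_crop...'.

    Hypothetical helper; assumes '_crop' directly follows the crop type.
    """
    # partition splits at the first '_crop': (head, separator, tail).
    head, sep, _tail = filename.partition('_crop')
    if not sep:
        # No '_crop' marker at all, so the naming assumption does not hold.
        raise ValueError("no '_crop' marker in %r" % filename)
    # rpartition splits at the LAST underscore, so underscores inside the
    # region name are left alone.
    region, _sep, crop = head.rpartition('_')
    return region, crop


for name in ['Vladimir_SW_crop_mask_ERA.hdr',
             'Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr',
             'Ingush_WW_crop_mask.dat']:
    print(split_region_and_crop(name))
```

Note that this does not check that the crop type is actually SW or WW; it just returns whatever sits between the last underscore and '_crop'.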
The following regexp should work:

(.*)_(SW|WW)

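This matches everything up to an underscore that is followed by either SW or WW. The text before the underscore goes into the first group, and the SW or WW goes into the second. Because .* is greedy, the split happens at the last _SW or _WW in the string: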

import re

strs = ["Vladimir_SW_crop_mask_ERA.hdr",
        "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
        "Ingush_WW_crop_mask.dat"]

for s in strs:
    print(re.match("(.*)_(SW|WW)", s).groups())

Result:

('Vladimir', 'SW')
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
('Ingush', 'WW')
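A slightly stricter variant, in case it is useful: since the greedy pattern splits at the last _SW or _WW anywhere in the name, you may want to anchor on the '_crop' that follows the crop type and handle names that do not match at all. This is a sketch under the assumption that '_crop' always follows the crop type, as in the question's examples; PATTERN and parse_mask_name are illustrative names, not anything from the original answer.

```python
import re

# Named groups make the result self-describing; '_crop' anchors the match
# so a stray '_SW' or '_WW' later in the name is not picked up.
PATTERN = re.compile(r"(?P<region>.+)_(?P<crop>SW|WW)_crop")


def parse_mask_name(filename):
    """Return (region, crop_type), or None if the name does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    return m.group("region"), m.group("crop")


strs = ["Vladimir_SW_crop_mask_ERA.hdr",
        "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
        "Ingush_WW_crop_mask.dat"]

for s in strs:
    print(parse_mask_name(s))
```

Returning None for non-matching names avoids the AttributeError you would get from calling .groups() on a failed re.match.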
