
I have a list of strings in Python 2.7 like this:

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif', 
    u'Name0_Cap44_o6_B_20150907_20170707.tif',
    u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

What I would like to do is extract only the part of each string up to and including the two dates, so that I get a list like this:

lst = [u'Name1_Cap23_o2_A_20160830_20170831', 
    u'Name0_Cap44_o6_B_20150907_20170707',
    u'Name99_Vlog_o88_A_20180101_20180305']

I already know how to extract the two dates with the re module, but how can I get the list above using the datetime and re modules? Does anyone have an idea how I could get the rest of the string?

from datetime import datetime
import re

pattern = re.compile(r'(\d{8})_(\d{8})')
dates = pattern.search(lst[0])
startdate = datetime.strptime(dates.group(1), '%Y%m%d')
enddate = datetime.strptime(dates.group(2), '%Y%m%d')
# rebuild the date part, e.g. '20160830_20170831'
datestring = format(startdate, '%Y%m%d') + "_" + format(enddate, '%Y%m%d')
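Since the pattern above already locates the dates, one way to get the rest of the string is to slice the original up to the end of the match: the match object's end() is the index just past the second date. A minimal sketch; prefix_through_dates is a hypothetical helper name, and the strptime calls are only there to reject 8-digit numbers that are not valid YYYYMMDD dates.

```python
import re
from datetime import datetime

# Same date pattern as in the question: two 8-digit groups joined by "_"
DATE_PAIR = re.compile(r'(\d{8})_(\d{8})')

def prefix_through_dates(name):
    """Hypothetical helper: return name up to and including the second date,
    or None if no date pair is found."""
    m = DATE_PAIR.search(name)
    if m is None:
        return None
    # Raises ValueError if either group is not a real YYYYMMDD date
    datetime.strptime(m.group(1), '%Y%m%d')
    datetime.strptime(m.group(2), '%Y%m%d')
    # m.end() is the index just past the second date
    return name[:m.end()]

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'Name0_Cap44_o6_B_20150907_20170707.tif',
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

result = [prefix_through_dates(s) for s in lst]
print(result)
```

Drop the two strptime lines if you only need the text and not the date check.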

2 Answers

If you only want to match the string from the start up to and including the two dates, you don't need a capturing group.

From the start of the string, match one or more word characters with \w+ (this also matches underscores), then match twice an underscore followed by 8 digits:

^\w+_\d{8}_\d{8}
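The pattern can be read part by part with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern; this sketch is just a restatement of ^\w+_\d{8}_\d{8}, not a different approach. Because \w+ is greedy, it first consumes the dates as well and then backtracks until _\d{8}_\d{8} can match.

```python
import re

# Same pattern as ^\w+_\d{8}_\d{8}, spelled out with re.VERBOSE
pattern = re.compile(r'''
    ^         # start of the string
    \w+       # 1+ word characters (letters, digits, underscore)
    _\d{8}    # underscore + first 8-digit date
    _\d{8}    # underscore + second 8-digit date
''', re.VERBOSE)

m = pattern.search(u'Name0_Cap44_o6_B_20150907_20170707.tif')
print(m.group())
```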


For example:

import re

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'Name0_Cap44_o6_B_20150907_20170707.tif',
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

pattern = re.compile(r'^\w+_\d{8}_\d{8}')
# .group() without an argument returns the whole match
pattern_list = list(map(lambda x: pattern.search(x).group(), lst))
print(pattern_list)

Result

[u'Name1_Cap23_o2_A_20160830_20170831', u'Name0_Cap44_o6_B_20150907_20170707', u'Name99_Vlog_o88_A_20180101_20180305']
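pattern.search(x).group() raises AttributeError when a string does not contain the date pair, because search() then returns None. A minimal sketch that skips such items instead; the entry 'README.txt' is made up for illustration.

```python
import re

pattern = re.compile(r'^\w+_\d{8}_\d{8}')

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'README.txt',  # hypothetical entry without dates
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

# Run the search once per item and keep only the strings that matched
matches = (pattern.search(x) for x in lst)
pattern_list = [m.group() for m in matches if m]
print(pattern_list)
```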

Your regular expression was almost correct. I've updated it from (\d{8})_(\d{8}) to (.+\d{8})_(\d{8}). The added .+ matches any character (except a newline) one or more times, so the match now starts at the beginning of the string. Note that group(1) only runs up to the first date; the whole match, group(0), contains everything up to and including the second date.

import re

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'Name0_Cap44_o6_B_20150907_20170707.tif',
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

# compile the pattern once, outside the loop
new_name_pattern = re.compile(r'(.+\d{8})_(\d{8})')

# modify list in place
for i in range(len(lst)):
    new_name = new_name_pattern.search(lst[i])
    # whole match: everything up to and including the second date
    lst[i] = new_name.group(0)

# print new list
for name in lst:
    print(name)

An example can be found here: https://repl.it/repls/InternalOrchidVisitors
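To see which part of the string each group holds for the pattern (.+\d{8})_(\d{8}), here is a small check on the first file name: group(1) stops after the first date, while group(0) is the whole match including both dates.

```python
import re

new_name_pattern = re.compile(r'(.+\d{8})_(\d{8})')
m = new_name_pattern.search(u'Name1_Cap23_o2_A_20160830_20170831_test.tif')

print(m.group(1))  # Name1_Cap23_o2_A_20160830
print(m.group(2))  # 20170831
print(m.group(0))  # Name1_Cap23_o2_A_20160830_20170831
```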
