Remove unpredictable date format at the end of string using regex

Question

I am getting strings in the following form, with a date in an unpredictable format at the end. The date part only ever contains underscores, slashes, numbers, or hyphens.

TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04

I just need to extract TRAVEL_DELAY from the above strings. I am using a regex for this, but it isn't working:

m = re.match("^(.*)[_0-9\/.]+", abovestring)

Padraic Cunningham · Accepted Answer · 2015-02-25 19:49:49Z

3

If you just want the dates, just split:

s="""TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    # maxsplit=2: everything after the second "_" is the date
    date = line.split("_",2)[-1]
    print(date)

01072015
01_07_2015
2015/01/04
2015-01-04
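One thing to be aware of: `split("_", 2)` only isolates the date when the text before it contains exactly one underscore. A quick check, using the `Travel_Delay_Data` form mentioned in the comments (the variable names here are just for illustration):

```python
good = "TRAVEL_DELAY_01_07_2015"
bad = "Travel_Delay_Data_2015/01/04"

# maxsplit=2: at most two splits, so at most three pieces
print(good.split("_", 2))  # ['TRAVEL', 'DELAY', '01_07_2015']
print(bad.split("_", 2))   # ['Travel', 'Delay', 'Data_2015/01/04']
```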

Or use str.replace; there is no need for a regex:

for line in s.splitlines():
    date = line.replace("TRAVEL_DELAY_","")
    print(date)

01072015
01_07_2015
2015/01/04
2015-01-04

If you were actually trying to parse the dates, you could use dateutil (the third-party python-dateutil package) and fix up the strings first:

from dateutil import parser

for line in s.splitlines():
    date = line.replace("TRAVEL_DELAY_","")
    if any(ch in date for ch in ("/","-","_")):
        # separated forms: normalise "_" to "-" before parsing
        print(parser.parse(date.replace("_","-")))
    else:
        # bare 8 digits: insert separators, e.g. "01072015" -> "01-07-2015"
        date = "{}-{}-{}".format(date[:2],date[2:4],date[4:])
        print(parser.parse(date))


2015-01-07 00:00:00
2015-01-07 00:00:00
2015-01-04 00:00:00
2015-01-04 00:00:00
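If you would rather not depend on dateutil, a stdlib-only sketch is to try a list of candidate formats with `datetime.strptime` until one parses. The `FORMATS` tuple and the `parse_date` helper are assumptions of this sketch: they cover only the four shapes in the question and read the 8-digit and underscore forms as month first, which is how dateutil read them above.

```python
from datetime import datetime

# Assumed formats, one per date shape seen in the question.
# The first two are read month first, matching dateutil's default.
FORMATS = ("%m%d%Y", "%m_%d_%Y", "%Y/%m/%d", "%Y-%m-%d")


def parse_date(text):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong shape, try the next format
    raise ValueError("unrecognised date: {!r}".format(text))


s = """TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    print(parse_date(line.replace("TRAVEL_DELAY_", "")))
```

This prints the same four lines as the dateutil version. Any new date shape has to be added to `FORMATS` by hand, which is the price of not using a general-purpose parser.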

If digits only appear in the date and you actually want the string, not the date:

s="""TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
Travel_Delay_Data_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    # index of the first digit, i.e. where the date starts
    ind = next(ind for ind, ele in enumerate(line) if ele.isdigit())
    # also drop the single separator just before the date
    prefix = line[:ind-1]
    print(prefix)

TRAVEL_DELAY
TRAVEL_DELAY
TRAVEL_DELAY
Travel_Delay_Data
TRAVEL_DELAY
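Since the question asked for a regex, here is a sketch of the same idea with `re`. It assumes, like the loop above, that the text part never ends in a digit, underscore, slash or hyphen, and strips the trailing run of those characters; a line with no date comes back unchanged instead of raising `StopIteration` from `next()`. It also shows why the original attempt failed: `^(.*)` is greedy, so it grabs almost the whole string, and the character class had no `-`. The name `DATE_SUFFIX` is just for illustration.

```python
import re

# Trailing run of the characters the date is made of:
# underscores, slashes, digits and hyphens ("-" last so it is literal).
DATE_SUFFIX = re.compile(r"[_/0-9-]+$")

s = """TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
Travel_Delay_Data_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    print(DATE_SUFFIX.sub("", line))

# The question's re.match approach works too once the group is
# non-greedy and the date run is anchored to the end of the string.
m = re.match(r"^(.*?)[_/0-9-]+$", "TRAVEL_DELAY_2015-01-04")
print(m.group(1))  # TRAVEL_DELAY
```

If the text part can itself end in a digit (say `DELAY2_2015`), neither this nor the first-digit loop can tell those digits apart from the date.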

edited Feb 25, 2015 at 19:49

answered Feb 25, 2015 at 19:05

Padraic Cunningham


HavelTheGreat Over a year ago

All of these are great as well. +1 for being like a coding Google.

Ishu Gupta Over a year ago

Thanks for the detailed answer. The string is not always 'TRAVEL_DELAY'; it can be any string with any date format. So I just need to extract the string part. Apologies for the confusion.

Padraic Cunningham Over a year ago

@IshuGupta, are there always two `_` in the starting string?

Ishu Gupta Over a year ago

@padraic No, the string can also look like this: Travel_Delay_Data

Padraic Cunningham Over a year ago

@IshuGupta, have a look at the last part of the answer and see if we are on the same page


HavelTheGreat · Accepted Answer · 2015-02-25 20:06:01Z

3

If that's all you have to do, why not just remove TRAVEL_DELAY instead of matching the rest? You could implement something like this:

m = re.sub('TRAVEL_DELAY', '', abovestring)

If your problem is more complex than this, please do let me know.
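Note that `re.sub` removes every occurrence of the pattern, wherever it is, and removing just the name leaves the separating `_` behind. A small sketch of two anchored alternatives that only drop a leading prefix (the `line` variable is illustrative; `str.removeprefix` needs Python 3.9+):

```python
import re

line = "TRAVEL_DELAY_01072015"

# Removing just the name leaves the separator behind.
print(re.sub("TRAVEL_DELAY", "", line))     # _01072015

# ^ anchors the match to the start; including "_" drops the separator too.
print(re.sub(r"^TRAVEL_DELAY_", "", line))  # 01072015

# Python 3.9+: a plain string method, no regex needed.
print(line.removeprefix("TRAVEL_DELAY_"))   # 01072015
```

Both anchored forms return the string unchanged when the prefix is absent, unlike slicing by `len(...)`, which always cuts.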

EDIT: Based on your comments, you want to remove all alphabetic characters (and underscores), so you're looking for this regex:

m = re.sub('[_A-Za-z]', '', abovestring)
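Applied to the question's strings, this class (equivalent to `[_A-Za-z]`) deletes every ASCII letter and every underscore wherever they appear, so the underscores inside `01_07_2015` disappear as well. A quick check:

```python
import re

s = """TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    # Deletes every ASCII letter and every underscore.
    print(re.sub("[_A-Za-z]", "", line))
```

This prints `01072015`, `01072015`, `2015/01/04` and `2015-01-04`, i.e. what is left is the date part, not the name.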

edited Feb 25, 2015 at 20:06

answered Feb 25, 2015 at 18:55

HavelTheGreat


syntonym Over a year ago

A deleted answer used m = line[len('TRAVEL_DELAY'):] to achieve that. Can you explain if/why your approach is better?

HavelTheGreat Over a year ago

No real difference, unless of course TRAVEL_DELAY wasn't at the start of the string for some reason; then my method would still remove it properly instead of cutting off the start of the string.

Ishu Gupta Over a year ago

The string is not always 'TRAVEL_DELAY'; it can be any string with any date format. So I just need to extract the string part.

