The strings I receive look like the following, with a date in a varying format at the end. The date part contains only digits, underscores, slashes, or hyphens.

TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04

I need to extract just TRAVEL_DELAY from the strings above. I am using a regex for this, but it isn't working:

m = re.match("^(.*)[_0-9\/.]+", abovestring)

2 Answers


If you just want the dates, just split:

s="""TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    # Split on the first two underscores only; the remainder is the date.
    date = line.split("_", 2)[-1]
    print(date)

01072015
01_07_2015
2015/01/04
2015-01-04

Or use str.replace; there is no need for a regex:

for line in s.splitlines():
    date = line.replace("TRAVEL_DELAY_","")
    print(date)

01072015
01_07_2015
2015/01/04
2015-01-04

If you were actually trying to parse the dates, you could use dateutil after normalizing the strings:

from dateutil import parser
for line in s.splitlines():
    date = line.replace("TRAVEL_DELAY_","")
    if any(ch in date for ch in ("/","-","_")):
        print(parser.parse(date.replace("_","-")))
    else:
        date = "{}-{}-{}".format(date[:2],date[2:4],date[4:])
        print(parser.parse(date))


2015-01-07 00:00:00
2015-01-07 00:00:00
2015-01-04 00:00:00
2015-01-04 00:00:00
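
If dateutil is not available, a rough standard-library sketch is to try a fixed list of formats with datetime.strptime. The FORMATS tuple and the parse_suffix helper are hypothetical names, and reading 01072015 as month-first is an assumption chosen to match the dateutil output above:

```python
from datetime import datetime

# Assumed formats, tried in order. Month-first for the first two
# is an assumption; swap %m and %d if the data is day-first.
FORMATS = ("%m%d%Y", "%m_%d_%Y", "%Y/%m/%d", "%Y-%m-%d")

def parse_suffix(date_text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(date_text, fmt)
        except ValueError:
            continue
    return None

for text in ("01072015", "01_07_2015", "2015/01/04", "2015-01-04"):
    print(parse_suffix(text))
```

Listing the formats explicitly makes the day/month ambiguity visible instead of leaving it to a parser's defaults.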

If digits appear only in the date and you actually want the string rather than the date:

s="""TRAVEL_DELAY_01072015
TRAVEL_DELAY_01_07_2015
TRAVEL_DELAY_2015/01/04
Travel_Delay_Data_2015/01/04
TRAVEL_DELAY_2015-01-04"""

for line in s.splitlines():
    ind = next(ind for ind, ele in enumerate(line) if ele.isdigit())
    # ind is the first digit; ind-1 also drops the underscore before it.
    name = line[:ind-1]
    print(name)

TRAVEL_DELAY
TRAVEL_DELAY
TRAVEL_DELAY
Travel_Delay_Data
TRAVEL_DELAY
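
For completeness, a regex version along the lines the question attempted. The original pattern fails because the greedy (.*) consumes almost the whole string and leaves only the last character for the character class, so group(1) comes back as TRAVEL_DELAY_0107201. A minimal sketch, assuming the date is always a trailing run of digits, underscores, slashes, or hyphens (strip_trailing_date and TRAILING_DATE are hypothetical names):

```python
import re

# One or more date characters anchored to the end of the string.
# The hyphen is last in the class so it is taken literally.
TRAILING_DATE = re.compile(r"[_\d/-]+$")

def strip_trailing_date(line):
    """Remove the trailing date (and the underscore before it)."""
    return TRAILING_DATE.sub("", line)

for line in ("TRAVEL_DELAY_01072015",
             "TRAVEL_DELAY_01_07_2015",
             "Travel_Delay_Data_2015/01/04",
             "TRAVEL_DELAY_2015-01-04"):
    print(strip_trailing_date(line))
```

Note that a name which itself ends in digits would lose them too, since the pattern cannot tell them apart from the date.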

10 Comments

All of these are great as well. +1 for being like a coding Google.
Thanks for the detailed answer. The string is not always 'TRAVEL_DELAY'; it can be any string followed by a date in any format, so I just need to extract the string part. Apologies for the confusion.
@IshuGupta, are there always two `_` in the starting string?
@padraic No, the string can be as follows: Travel_Delay_Data
@IshuGupta, have a look at the last part of the answer and see if we are on the same page.

If that's all you have to do, why not just remove TRAVEL_DELAY instead of matching the rest? You could implement something like this:

m = re.sub('TRAVEL_DELAY', '', m)

If your problem is more complex than this, please do let me know.

EDIT: Based on your comments, you want to remove all letters and underscores, so you're looking for this regex.

m = re.sub('[A-Za-z_]', '', m)
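
Note that the substitution above leaves the date behind, not the name. If the name is what's wanted, one regex-free sketch is str.rstrip with the set of date characters, assuming the date is always at the end and the name does not end in any of those characters (DATE_CHARS and name_part are hypothetical names):

```python
# Characters that may appear in the trailing date, per the question.
DATE_CHARS = "0123456789_/-"

def name_part(line):
    """Strip date characters from the right end of the string."""
    return line.rstrip(DATE_CHARS)

print(name_part("TRAVEL_DELAY_01_07_2015"))
print(name_part("Travel_Delay_Data_2015/01/04"))
```

rstrip treats its argument as a set of characters, so it stops at the first character from the right that is not in the set.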

4 Comments

A deleted answer used m = line[len('TRAVEL_DELAY'):] to achieve that. Can you explain if/why your approach is better?
No real difference, unless of course TRAVEL_DELAY wasn't at the start of the string for some reason, then my method would remove it properly instead of cutting off the start of the string.
The string is not always 'TRAVEL_DELAY'; it can be any string followed by a date in any format, so I just need to extract the string part.
