Extract part of string according to pattern using regular expression Python

Question

I have files that follow a specific format, which looks something like this:

test_0800_20180102_filepath.csv
anotherone_0800_20180101_hello.csv

The numbers in the middle represent timestamps, so I would like to extract that information. I know that there is a specific pattern which will always be _time_date_, so essentially I want the part of the string that lies between the first and third underscores. I found some examples and somewhat similar problems, but I am new to Python and I am having trouble adapting them.

This is what I have implemented thus far:

datetime = re.search(r"\d+_(\d+)_", "test_0800_20180102_filepath.csv")

But the result I get is only the date part:

20180102

But what I actually need is:

0800_20180102

I have tried various things but nothing has really worked so far. I did not add a minimal example because I know this must be something extremely simple for someone with some experience!
– Nisfa, Commented Jan 10, 2018 at 9:53

meow · Accepted Answer · 2018-01-10 10:31:58Z

5

That's quite simple:

import re

your_string = "test_0800_20180102_filepath.csv"  # example filename from the question
match = re.search(r"_((\d+)_(\d+))_", your_string)

print(match.group(1))  # time_date: 0800_20180102
print(match.group(2))  # time: 0800
print(match.group(3))  # date: 20180102

Note that for such tasks the group operator () inside the regexp is really helpful: it lets you access substrings of a larger pattern without having to match each one individually (which can sometimes be much more ambiguous than matching the larger pattern).

Groups are numbered from 1 up to the number of groups in the pattern, and group 0 is the whole match. Numbers are assigned from left to right, by the position of each opening parenthesis in your pattern.
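To make the numbering concrete, here is a small sketch using one of the filenames from the question. The named-group variant at the end is an addition of mine, not part of the original answer; it uses the standard `(?P<name>...)` syntax of the `re` module.

```python
import re

# Example filename taken from the question.
filename = "test_0800_20180102_filepath.csv"

match = re.search(r"_((\d+)_(\d+))_", filename)

print(match.group(0))  # whole match, including the surrounding underscores
print(match.group(1))  # outer group: time_date
print(match.groups())  # all numbered groups as a tuple, in left-to-right order

# Named groups avoid having to count parentheses.
named = re.search(r"_(?P<time>\d+)_(?P<date>\d+)_", filename)
print(named.group("time"), named.group("date"))
```

The outer group is number 1 because its opening parenthesis comes first, even though it closes last.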

On a side note, if you have control over it, use unix timestamps so you only have one number defining both date and time universally.
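As a rough sketch of what that could look like, the extracted string can be parsed into a `datetime` and then converted to a Unix timestamp. This assumes the time part is HHMM and the date part is YYYYMMDD, and that the times are in UTC; none of that is stated in the question, so adjust it to your data.

```python
from datetime import datetime, timezone

# Extracted time_date part; assumed layout is HHMM_YYYYMMDD.
stamp = "0800_20180102"

# Assumption: the filenames record UTC times.
parsed = datetime.strptime(stamp, "%H%M_%Y%m%d").replace(tzinfo=timezone.utc)
unix_seconds = int(parsed.timestamp())

print(parsed.isoformat())
print(unix_seconds)
```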

edited Jan 10, 2018 at 10:31

answered Jan 10, 2018 at 9:51


1 Comment

Nisfa Over a year ago

This is exactly where I have got up to :) but this actually extracts only the date part, not the time part! I need both of them.

Imran · Accepted Answer · 2018-01-10 09:56:52Z

1

The key here is that you want everything between the first and the third underscores on each line, so there is no need to worry about designing a regex to match your time and date pattern.

with open('myfile.txt', 'r') as f:
    for line in f:
        x = '_'.join(line.split('_')[1:3])
        print(x)

The problem with your implementation is that you are only capturing the date part of your pattern. If you want to stick with a regex solution then simply move your parentheses to capture the entire pattern you want:

re.search(r"(\d+_\d+)_", "test_0800_20180102_filepath.csv").group(1)

gives:

'0800_20180102'
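One caveat: `re.search` returns `None` when nothing matches, and `\d+_\d+` will also match digits that appear earlier in the name. A slightly stricter sketch is below. The helper name `extract_stamp` is hypothetical, and the pattern assumes a 4-digit time and an 8-digit date, as in the question's examples.

```python
import re
from typing import Optional

# Assumes HHMM (4 digits) and YYYYMMDD (8 digits), delimited by underscores.
STAMP = re.compile(r"_(\d{4}_\d{8})_")


def extract_stamp(filename: str) -> Optional[str]:
    """Return the time_date part of filename, or None if it is absent."""
    match = STAMP.search(filename)
    return match.group(1) if match else None


print(extract_stamp("test_0800_20180102_filepath.csv"))
print(extract_stamp("notes.csv"))  # no timestamp, so this prints None
```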

edited Jan 10, 2018 at 9:56

answered Jan 10, 2018 at 9:51


user4020527 · Accepted Answer · 2018-01-10 09:52:11Z

-1

This is very easy to do with .split():

time = filename.split("_")[1]
date = filename.split("_")[2]

answered Jan 10, 2018 at 9:52
