Python - adding datetime column to dataframe

Question

I have a large number of Excel files I need to extract data from, ideally into a pandas DataFrame. Each file contains several columns, one of which holds the time as a string such as "16:30".

The file names look like "Monday 21st September 2020.xlsx".

I'm trying to loop through the files and add a datetime column that combines the date from the file name with the time from the column in the Excel file. I've tried the following, with the loop limited to one file:

import pandas as pd
import datetime
import dateutil.parser
import glob
import pathlib

folder = r"C:\temp\Friday 1st April 2022 (SB).xlsx"

for file in glob.glob(folder, recursive=False):
    #read in the excel file
    df = pd.read_excel(file, sheet_name="SB", usecols="B,I,J")
    
    #work out the date from the file name
    filedate = dateutil.parser.parse(pathlib.Path(file).stem.replace(" (SB)",""))
    
    #print the file name and filedate because the date doesn't end up in the df correctly!
    print(file)
    print(type(filedate))
    print(filedate)
    df.insert(0,'Date', filedate.strftime('%d-$m-%Y'))
  
    print(df)

That gives the output below, so adding the date to the DataFrame is going wrong somewhere:

C:\temp\Friday 1st April 2022 (SB).xlsx
<class 'datetime.datetime'>
2022-04-01 00:00:00
           Date   Time  R1  R2
0    01-$m-2022  16:30   9   5
1    01-$m-2022  16:30   5   5
2    01-$m-2022  16:30   6   5
3    01-$m-2022  16:30   3   6
4    01-$m-2022  16:30   3   3
..          ...    ...  ..  ..
446  01-$m-2022  16:15   3   4
447  01-$m-2022  16:15   3   3
448  01-$m-2022  16:15   3   3
449  01-$m-2022  16:15   5   3
450  01-$m-2022  16:15   5   4

[451 rows x 4 columns]

Also, once I get this sorted, I want to merge the Date and Time columns into a single datetime column.

It looks like you've accidentally put $m in your format string instead of %m, hence the $m in the output. Correcting that should give you the desired output for the Date column.
– markd227, Commented Jan 23, 2023 at 21:20
I just realised that too. I've changed it to %m and it's made no difference, so I have edited the original question.
– Chris Lewis, Commented Jan 23, 2023 at 21:23

Tasos · Accepted Answer · 2023-01-24 20:30:54Z

1

You can extract the date from the folder variable like this.

from dateutil import parser
import re 

folder = r"C:\temp\Friday 1st April 2022 (SB).xlsx"

date_string = re.findall(r'\d{1,2}[a-z]{2} \w+ \d{4}', folder)[0]
date_obj = parser.parse(date_string)
print(date_obj.strftime("%d-%m-%Y"))

This will print 01-04-2022

The regex extracts date_string as "1st April 2022", which the dateutil library can then parse easily.

Regex explanation:

\d{1,2}: one or two digits to match the day of month number

[a-z]{2}: two lowercase letters to match the ordinal suffix ("st", "nd", "rd" or "th")

\w+: one or more word characters to match the month name

\d{4}: four digits to match the year number

With the above, you can get the 1st April 2022 from the whole path.
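To see which part of a file name each piece of the pattern picks up, here is a minimal sketch that adds capture groups to the same regex and tries it on the two file names from the question. The helper name `extract_date` is made up for illustration, and it swaps in the standard library's `datetime.strptime` for dateutil so it runs without extra packages; `%B` assumes English month names.

```python
import re
from datetime import datetime

# The same four pieces as in the explanation, with capture groups added
# so each piece can be inspected separately.
DATE_PATTERN = re.compile(r"(\d{1,2})([a-z]{2}) (\w+) (\d{4})")


def extract_date(filename):
    """Hypothetical helper: pull the date out of a file name."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    day, _suffix, month, year = match.groups()
    # %B expects a full month name in the current locale (assumed English).
    return datetime.strptime(f"{day} {month} {year}", "%d %B %Y")


for name in ["Friday 1st April 2022 (SB).xlsx", "Monday 21st September 2020.xlsx"]:
    print(DATE_PATTERN.search(name).groups(), extract_date(name).date())
```

The ordinal suffix is captured but ignored, which is why "1st" and "21st" both parse cleanly.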

edited Jan 24, 2023 at 20:30

answered Jan 24, 2023 at 10:15


5 Comments

Chris Lewis Over a year ago

Thanks. The problem is that, in reality, folder will end in *.xlsx rather than a single file name. The loop will then go through the files, adding the correct date column for each one.

Tasos Over a year ago

Then use the file variable inside the loop instead of folder.

Chris Lewis Over a year ago

Of course! Thanks, I'll give this a try, not that I understand the regex lol

Tasos Over a year ago

@ChrisLewis I added a regex explanation. Hope it is more clear now.

Chris Lewis Over a year ago

That's very clear. Thanks!
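For the second part of the question (merging the date and the time into one datetime column), one possible approach is sketched below. The helper name `add_datetime_column` and the sample data are assumptions for illustration; the small DataFrame stands in for what `pd.read_excel` would return, and the sketch assumes the Time column really holds strings like "16:30" rather than Excel time cells.

```python
import pandas as pd


def add_datetime_column(df, file_date):
    """Hypothetical helper: combine a file's date with its 'Time' column."""
    out = df.copy()
    # Build "YYYY-MM-DD HH:MM" strings, then parse them in a single call.
    stamps = file_date.strftime("%Y-%m-%d") + " " + out["Time"].astype(str)
    out.insert(0, "DateTime", pd.to_datetime(stamps, format="%Y-%m-%d %H:%M"))
    return out


# Stand-in for pd.read_excel(file, sheet_name="SB", usecols="B,I,J").
sample = pd.DataFrame({"Time": ["16:30", "16:15"], "R1": [9, 5], "R2": [5, 4]})
result = add_datetime_column(sample, pd.Timestamp(2022, 4, 1))
print(result)
```

Inside the loop, `file_date` would be the date parsed from the current `file`, and the per-file frames could be collected in a list and joined with `pd.concat`.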
