I have a huge number of Excel files I need to extract data from, ideally into a pandas DataFrame. Each file contains several columns, one of which is a time stored as a string like "16:30".
The filenames look like "Monday 21st September 2020.xlsx", for example.
I'm trying to loop through the files and add a datetime column that combines the date from the filename with the time from the column in the Excel file. I've tried the following, with the loop limited to one file:
import pandas as pd
import datetime
import dateutil
import glob
import pathlib
folder = r"C:\temp\Friday 1st April 2022 (SB).xlsx"
for file in glob.glob(folder, recursive=False):
    # read in the Excel file
    df = pd.read_excel(file, sheet_name="SB", usecols="B,I,J")
    # work out the date from the file name
    filedate = dateutil.parser.parse(pathlib.Path(file).stem.replace(" (SB)", ""))
    # print filedate because it doesn't end up in the df correctly!
    print(type(filedate))
    print(filedate)
    df.insert(0, 'Date', filedate.strftime('%d-$m-%Y'))
    print(df)
That gives the output below, so adding the date to the DataFrame is going wrong somewhere:
C:\temp\Friday 1st April 2022 (SB).xlsx
<class 'datetime.datetime'>
2022-04-01 00:00:00
           Date   Time  R1  R2
0    01-$m-2022  16:30   9   5
1    01-$m-2022  16:30   5   5
2    01-$m-2022  16:30   6   5
3    01-$m-2022  16:30   3   6
4    01-$m-2022  16:30   3   3
..          ...    ...  ..  ..
446  01-$m-2022  16:15   3   4
447  01-$m-2022  16:15   3   3
448  01-$m-2022  16:15   3   3
449  01-$m-2022  16:15   5   3
450  01-$m-2022  16:15   5   4

[451 rows x 4 columns]
Also, once I get this sorted, I want to merge the Date and Time columns into a single datetime column.
You have $m in your format string instead of %m, hence the literal $m in the output. I think correcting that should give you the desired output for the Date column.
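For what it's worth, here is a minimal sketch of the corrected format string, plus one possible way to merge Date and Time afterwards. It assumes the Time column really does hold strings like "16:30" (as described in the question) and uses a small made-up DataFrame in place of pd.read_excel, so the values and the DateTime column name are placeholders rather than anything from the real workbook. PureWindowsPath is used only so the Windows-style path splits the same way on any OS.

```python
import pathlib

import pandas as pd
from dateutil import parser

# Path taken from the question; no workbook is actually opened here.
file = r"C:\temp\Friday 1st April 2022 (SB).xlsx"

# Hypothetical stand-in for the result of pd.read_excel(...).
df = pd.DataFrame({"Time": ["16:30", "16:15"], "R1": [9, 5], "R2": [5, 4]})

# Work out the date from the file name, as in the question.
stem = pathlib.PureWindowsPath(file).stem.replace(" (SB)", "")
filedate = parser.parse(stem)

# %m is the month directive; $m is not a directive, so it was copied literally.
df.insert(0, "Date", filedate.strftime("%d-%m-%Y"))

# Join the two string columns and parse them into one datetime64 column.
df["DateTime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%d-%m-%Y %H:%M"
)

print(df)
```

If the string Date column isn't needed for anything else, it can be dropped once DateTime has been built. If pandas reads Time in as datetime.time objects rather than strings, the concatenation step would need adjusting.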