I'm encountering an issue while reading date columns from an Excel file into a Pandas DataFrame. The date values in my Excel sheet are formatted as DD-MM-YYYY (e.g., 05-03-2024), but when I use pd.read_excel, Pandas interprets these values as datetime objects and appends 00:00:00, resulting in output like this:
      Actual Delivery Date
    0  2024-03-05 00:00:00
    1  2024-03-05 00:00:00
    2  2024-03-05 00:00:00
    3  2024-03-05 00:00:00
I've tried the following approaches without success:
- Using dtype=str when reading the Excel file.
- Explicitly converting the date column to strings after loading.
    import pandas as pd

    def load_excel_sheet(file_path, sheet_name):
        excel_file = pd.ExcelFile(file_path, engine='openpyxl')
        df_pandas = pd.read_excel(excel_file, sheet_name=sheet_name, dtype=str)
        # Explicitly convert specific date columns to strings
        for col in df_pandas.columns:
            if df_pandas[col].dtype == 'datetime64[ns]':
                df_pandas[col] = df_pandas[col].astype(str)
        return df_pandas

    def process_data_quality_checks(file_path, sheet_name):
        df = load_excel_sheet(file_path, sheet_name)
        for col in df.columns:
            if not all(isinstance(x, str) for x in df[col]):
                print(f"Column {col} has non-string data")
            else:
                print(f"Column {col} is all strings")
        return df

    file_path = r"path_to_your_excel_file.xlsx"
    sheet_name = 'Sheet1'
    df = process_data_quality_checks(file_path, sheet_name)
    print(df.head())
Despite these efforts, my date columns still appear in the DataFrame with 00:00:00 appended. How can I ensure that Pandas reads these date values strictly as strings without any additional time information?
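For reference, here is a minimal sketch of the result I'm after. It assumes the sheet is read without dtype=str, so the date cells arrive as datetime64 columns, and then formats them with Series.dt.strftime. The helper name datetimes_to_ddmmyyyy and the in-memory sample frame are made up for illustration and stand in for the real Excel data. I'm not sure this is the right way to do it at read time:

```python
import pandas as pd

def datetimes_to_ddmmyyyy(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every datetime column rendered as DD-MM-YYYY strings."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns that pandas parsed as datetimes
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.strftime("%d-%m-%Y")
    return out

# Hypothetical stand-in for what pd.read_excel returns when dtype=str is NOT passed
sample = pd.DataFrame(
    {
        "Actual Delivery Date": pd.to_datetime(["2024-03-05", "2024-03-05"]),
        "Qty": [1, 2],
    }
)

result = datetimes_to_ddmmyyyy(sample)
print(result)
```

This only changes how the values are rendered after loading. Excel stores dates as date values rather than text, so the DD-MM-YYYY display format in the sheet is not what Pandas receives.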