
I'm encountering an issue while reading date columns from an Excel file into a Pandas DataFrame. The date values in my Excel sheet are formatted as DD-MM-YYYY (e.g., 05-03-2024), but when I use pd.read_excel, Pandas interprets these values as datetime objects and appends 00:00:00, resulting in output like this:

   Actual Delivery Date
0  2024-03-05 00:00:00
1  2024-03-05 00:00:00
2  2024-03-05 00:00:00
3  2024-03-05 00:00:00

I've tried the following approaches without success:

  • Using dtype=str when reading the Excel file.
  • Explicitly converting the date columns to strings after loading.

import pandas as pd

def load_excel_sheet(file_path, sheet_name):
    excel_file = pd.ExcelFile(file_path, engine='openpyxl')
    df_pandas = pd.read_excel(excel_file, sheet_name=sheet_name, dtype=str)
    
    # Explicitly convert specific date columns to strings
    for col in df_pandas.columns:
        if df_pandas[col].dtype == 'datetime64[ns]':
            df_pandas[col] = df_pandas[col].astype(str)
    
    return df_pandas

def process_data_quality_checks(file_path, sheet_name):
    df = load_excel_sheet(file_path, sheet_name)
    
    for col in df.columns:
        if not all(isinstance(x, str) for x in df[col]):
            print(f"Column {col} has non-string data")
        else:
            print(f"Column {col} is all strings")
    
    return df

file_path = r"path_to_your_excel_file.xlsx"
sheet_name = 'Sheet1'

df = process_data_quality_checks(file_path, sheet_name)
print(df.head())

Despite these efforts, my date columns still appear in the DataFrame with 00:00:00 appended. How can I ensure that Pandas reads these date values strictly as strings without any additional time information?

  • The issue most likely comes from Excel; make sure the data isn't stored with a datetime type there. Commented Jun 28, 2024 at 13:04
  • I tried creating a new Excel file and entering the data as 05-03-2024. It still appends 00:00:00. Commented Jun 28, 2024 at 13:11
  • You can format the datetime without the time, then convert it to a string. Commented Jun 28, 2024 at 13:11
  • Check Excel's format options; typing "05-03-2024" should transform it to a date automatically. Commented Jun 28, 2024 at 13:12
  • Yes, I understand, but what if I don't have control over the source data? I tried a few other tools and they were able to dump it as a string automatically. But when I load the same data with pandas, 00:00:00 gets appended. Commented Jun 28, 2024 at 13:17

1 Answer

You can format datetime objects without the time by using the .dt accessor:

import pandas as pd

df = pd.DataFrame({'A': pd.date_range("2018-01-01", periods=3, freq="h")})
print(df)

                    A
0 2018-01-01 00:00:00
1 2018-01-01 01:00:00
2 2018-01-01 02:00:00

df['A'] = df['A'].dt.date
print(df)

            A
0  2018-01-01
1  2018-01-01
2  2018-01-01

For your case, you could do:

# Convert datetime columns to plain dates (drops the time component).
# Note: this check only matches if the sheet was read without dtype=str.
for col in df_pandas.columns:
    if df_pandas[col].dtype == 'datetime64[ns]':
        df_pandas[col] = df_pandas[col].dt.date
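
Note that .dt.date produces Python date objects rather than strings, so a check like isinstance(x, str) would still fail on those columns. If the goal is text in the original DD-MM-YYYY form, one possible variant is to use .dt.strftime instead. The following is only a sketch: the helper name datetimes_to_strings and its fmt parameter are made up for illustration, and it assumes the sheet was loaded without dtype=str so that pandas infers datetime columns.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def datetimes_to_strings(df, fmt="%d-%m-%Y"):
    """Return a copy of df with every datetime column rendered as text.

    Hypothetical helper for illustration; fmt is a strftime pattern.
    """
    out = df.copy()
    for col in out.columns:
        # Matches datetime64 columns regardless of resolution or timezone
        if is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.strftime(fmt)
    return out


# Stand-in for a sheet loaded with pd.read_excel(...) without dtype=str
df = pd.DataFrame(
    {"Actual Delivery Date": pd.to_datetime(["2024-03-05", "2024-03-06"])}
)
result = datetimes_to_strings(df)
print(result)
```

Because the format string is applied to every datetime column, this variant also discards any real time-of-day values, just like .dt.date does.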

4 Comments

This way I would lose the time values if other columns actually contain times.
Do you know which columns are dates and which are timestamps?
I wanted something generic that applies to both, because I have many different sheets with these conditions. For instance, I had used a copy tool to dump the data as strings into tables, and it worked. I wanted something similar.
I would need more information to be able to give you a solution. If you know which columns are dates vs. timestamps, or there is some kind of indicator to distinguish between the two, then you can just add more logic.
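
One possible indicator, offered here only as a heuristic sketch, is whether every value in a column sits exactly at midnight: such a column is treated as date-only and formatted without a time, while any other datetime column keeps its time. The function name stringify_datetimes, its parameters, and the sample column names are all hypothetical, and the sheet is again assumed to be loaded without dtype=str.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def stringify_datetimes(df, date_fmt="%d-%m-%Y", ts_fmt="%d-%m-%Y %H:%M:%S"):
    """Render datetime columns as strings, keeping the time only where it is used.

    Hypothetical helper; the midnight test below is a heuristic, not a guarantee.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if not is_datetime64_any_dtype(s):
            continue
        present = s.dropna()
        # A column whose values all equal their midnight-normalized form
        # is treated as date-only
        date_only = (present == present.dt.normalize()).all()
        out[col] = s.dt.strftime(date_fmt if date_only else ts_fmt)
    return out


# Sample data standing in for a sheet with one date column and one timestamp column
df = pd.DataFrame(
    {
        "delivered": pd.to_datetime(["2024-03-05", "2024-03-06"]),
        "logged_at": pd.to_datetime(["2024-03-05 14:30:00", "2024-03-06 00:00:00"]),
    }
)
result = stringify_datetimes(df)
print(result)
```

The obvious limitation is that a genuine timestamp column whose values all happen to fall at midnight would be formatted as date-only, since the cell's display format in Excel is not consulted here.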
