
I'm using python and pandas to query a table in SQL, store it in a DataFrame, then write it to an excel file (.xlsx).

I'm then using a couple of VBA macros to loop through the columns and do some conditional formatting to highlight outliers.

Everything works fine except the date column which excel gets stuck on and presents an error:

"Method 'Average' of object 'WorksheetFunction' failed"

The date is being stored as a string in the format '20-01-2022' which is presumably causing the error so I need to convert it to an actual datetime format that excel will recognise upon opening the file.

Example:

import pandas as pd

df = pd.DataFrame([[1, '21-06-2022'], [2, '19-08-2022'], [3, '06-04-2022']], columns=['id', 'date'])

df.to_excel("output.xlsx")

If you then open "output.xlsx" and try to use conditional formatting on the 'date' column, or try to =AVERAGE(C2:C4) either nothing happens or you get an error. If you double click into the cell, something happens and excel will suddenly recognise it, but this solution isn't suitable with thousands of cells.

How can I convert dates to a format that excel will recognise immediately upon opening the file?

  • Please check "How to make good reproducible pandas examples", then post a minimal reproducible example. The SQL part is irrelevant as long as you create a sample DataFrame with the proper column types. Most likely the date column in your DataFrame is a string, not a datetime object. Commented Jan 25, 2023 at 16:58
  • Also make sure that the problem really is the date in Excel and not something else in your VBA code. Commented Jan 25, 2023 at 16:59

1 Answer

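Before saving your DataFrame to Excel, you need to parse those date strings into actual dates. Note that '20-01-2022' is day-month-year, not ISO 8601 (which would be '2022-01-20'), so the format matters.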

There are several ways to do that.

You can use the parse_dates keyword argument of pandas.read_sql to parse specific columns as datetimes while loading, optionally giving the format of each column.

import pandas as pd

df = pd.read_sql(
    sql,
    con,
    parse_dates={
        "<col1>": {"format": "%d-%m-%Y"},  # e.g. '20-01-2022', as in the question
        "<col2>": {"format": "%d/%m/%y"}
    },
)

Same as above, but without a format: the columns are parsed as datetimes, and the dates can then be extracted from each column.

import pandas as pd

df = pd.read_sql(sql, con, parse_dates=["<col1>", "<col2>"])
# .dt is a Series accessor, so extract the dates column by column
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda s: s.dt.date)

You can load first, then parse manually with pd.to_datetime, and again extract only the dates.

import pandas as pd

df = pd.read_sql(sql, con)
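# pd.to_datetime on a whole DataFrame expects year/month/day columns,
# so convert one column at a time instead
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda s: pd.to_datetime(s).dt.date)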

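Or, if the strings really are ISO 8601 (for example '2022-01-20'), you could just parse them with datetime.date.fromisoformat. This will not work for day-month-year strings such as '20-01-2022'.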

import pandas as pd
from datetime import date

df = pd.read_sql(sql, con)
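# Series.map is used per column because DataFrame.applymap is deprecated in recent pandas
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda s: s.map(date.fromisoformat))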

NB. The methods are listed in no particular order, but the first seems slightly faster than the others, while also being the most elegant (in my opinion).
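
Applied to the example DataFrame from the question (no SQL involved), the conversion might look like the sketch below. It assumes every value is a day-month-year string like '21-06-2022'; the explicit format is there so that a value such as '06-04-2022' is not misread as June 4th.

```python
import pandas as pd

# Sample data copied from the question: dates stored as day-month-year strings
df = pd.DataFrame(
    [[1, "21-06-2022"], [2, "19-08-2022"], [3, "06-04-2022"]],
    columns=["id", "date"],
)

# Parse with an explicit format so day and month cannot be swapped
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

print(df.dtypes)

# Writing the file needs an Excel engine such as openpyxl to be installed:
# df.to_excel("output.xlsx", index=False)
```

Once the column is a real datetime column, to_excel writes it as an Excel date rather than text.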


2 Comments

Thank you! Adding parse_dates=["date"] to the pd.read_sql() call worked for me. Do you know if it's possible to format a datetime to display as "day-month-year" WITHOUT converting it back to a string? I imagine that's something I'll have to do in Excel, as it's Excel that decides how to display it in its own GUI?
Exactly, that will be an Excel formatting issue. You can probably do it in Python via openpyxl, but I'm not familiar with a way to do it directly with df.to_excel.
