
I have a data upload in MS Excel format.

This file has a column with dates in "dd.mm.yyyy 00:00:00" format. I read the file with:

import pandas as pd
df = pd.read_excel('data_from_db.xlsx')

I receive a frame where the date column has "object" dtype. I then convert this column to a date type with:

df['Date_Column'] = pd.to_datetime(df['Date_Column'])

That gives me "datetime64[ns]" dtype.

But this command does not always work correctly. I get rows with muddled dates:

  1. some rows have format "yyyy.mm.dd",
  2. others have "yyyy.dd.mm".

How should I correctly convert an Excel column with "dd.mm.yyyy 00:00:00" format to a column in a pandas DataFrame with date type and "dd.mm.yyyy" format?

P.S. I also noticed this oddity: some values in the raw date column have str type, others float. I can't wrap my head around it, because the raw table is an upload from a database.

  • Hey there, welcome to StackOverflow! Please provide some more information, e.g. a sample of your data_from_db.xlsx. Have you checked the date format inside the spreadsheet, are they all 'dd.mm.yyyy'? Commented Nov 20, 2018 at 9:57
  • @Finwood thank you for your attention - I updated the question with a table image link. Commented Nov 21, 2018 at 13:22
  • @OleV.V. thank you for your advice - I've corrected tag Commented Nov 21, 2018 at 13:22

1 Answer

Without specifying a format, pd.to_datetime has to guess from the data how a date string is to be interpreted. With default parameters this fails for the second and third row of your data:

In [5]: date_of_hire = pd.Series(['18.01.2018 0:00:00',
                                  '01.02.2018 0:00:00',
                                  '06.11.2018 0:00:00'])                    

In [6]: pd.to_datetime(date_of_hire)
Out[6]: 
0   2018-01-18
1   2018-01-02
2   2018-06-11
dtype: datetime64[ns]

The quickest solution would be to pass dayfirst=True:

In [7]: pd.to_datetime(date_of_hire, dayfirst=True)
Out[7]: 
0   2018-01-18
1   2018-02-01
2   2018-11-06
dtype: datetime64[ns]

If you know the complete format of your data, you can specify it directly. This only works if every value matches the format exactly; if a row lacks the time, for example, the conversion will fail.

In [8]: pd.to_datetime(date_of_hire, format='%d.%m.%Y %H:%M:%S')
Out[8]: 
0   2018-01-18
1   2018-02-01
2   2018-11-06
dtype: datetime64[ns]
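
Because a strict format fails on the first value that does not match, it can help to find the offending rows first. The following is a minimal sketch, not part of the original answer: the sample values (including the row without a time and the empty cell) are made up for illustration, and it assumes that turning unparseable values into NaT is acceptable while you inspect them.

```python
import pandas as pd

# Hypothetical sample: two well-formed rows, one row without a time part,
# and one empty cell, roughly as they might come out of an Excel export.
raw = pd.Series(["18.01.2018 0:00:00",
                 "01.02.2018 0:00:00",
                 "06.11.2018",
                 None])

# errors="coerce" turns values that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format="%d.%m.%Y %H:%M:%S", errors="coerce")

# Rows that had a value but failed to parse (empty cells are excluded).
malformed = raw[parsed.isna() & raw.notna()]
print(malformed)
```

Once the malformed rows are known, they can be fixed at the source or parsed separately with a second format.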

If you know little about the date format except that it is consistent, pandas can infer the format from the data beforehand (from pandas 2.0 on, the infer_datetime_format argument is deprecated, because a strict version of this inference is the default):

In [9]: pd.to_datetime(date_of_hire, infer_datetime_format=True)
Out[9]: 
0   2018-01-18
1   2018-02-01
2   2018-11-06
dtype: datetime64[ns]
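
The question also asks for a "dd.mm.yyyy" result. A datetime64[ns] column stores timestamps and has no display format of its own, so that layout can only be produced as text. A minimal sketch, reusing the sample series from above and assuming a string column is acceptable for display or export:

```python
import pandas as pd

date_of_hire = pd.Series(["18.01.2018 0:00:00",
                          "01.02.2018 0:00:00",
                          "06.11.2018 0:00:00"])

# Parse with the explicit format so day and month cannot be swapped.
parsed = pd.to_datetime(date_of_hire, format="%d.%m.%Y %H:%M:%S")

# Render as "dd.mm.yyyy" text; the result holds strings, not datetimes.
as_text = parsed.dt.strftime("%d.%m.%Y")
print(as_text.tolist())
```

Keeping the parsed datetime column for sorting and arithmetic, and formatting only at output time, avoids losing the date semantics.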

2 Comments

Thank you @Finwood for your comprehensive information. Your answer is really helpful! Thank you and StackOverflow :)
In this case, please mark the answer as accepted: stackoverflow.com/help/someone-answers
