I'm giving pandas an integer like 01142021223007, with the format '%m%d%Y%H%M%S'. This worked perfectly in 2020. For example:

12192020032906 -> 2020-12-19 03:29:06

Since 2021 it is giving the wrong date:

01142021223007 -> 2021-11-04 22:30:07

Should be 2021-01-14 22:30:07

Code:

self.df['time'] = pd.to_datetime(self.df['time'], format='%m%d%Y%H%M%S', errors = 'coerce')

I assume it just skips the 0 at the beginning of 01142021 and therefore reads it as 11 4 2021. Is there a way to explicitly say MMDDYYYY? format='%mm%dd%YYYY%HH%MM%SS' does not work.

The CSV file I am reading from:

hum,moist,temp,time
81.1,40,26.30,12192020032906
83.1,38,25.80,12192020033006
85.6,39,25.30,12192020033106
87.3,38,24.90,12192020033206
89.4,38,24.50,12192020033306
90.2,38,24.20,12192020033407
90.9,39,23.90,12192020033506
91.5,38,23.70,12192020033607
92.2,38,23.40,12192020033706
...
57.0,15,25.60,01142021095906
53.6,47,24.30,01142021222407
53.7,44,24.30,01142021222419
54.1,45,24.30,01142021222540
54.9,43,24.30,01142021222706
55.2,43,24.20,01142021222806
55.5,44,24.20,01142021222906
55.7,43,24.20,01142021223007

The resulting pandas df:

          hum  moist  temp                time
0      44.605     40  25.3 2020-12-19 03:29:06
1      45.705     38  24.8 2020-12-19 03:30:06
2      47.080     39  24.3 2020-12-19 03:31:06
3      48.015     38  23.9 2020-12-19 03:32:06
4      49.170     38  23.5 2020-12-19 03:33:06
...       ...    ...   ...                 ...
22387  29.755     45  23.3 2021-11-04 22:25:40
22388  30.195     43  23.3 2021-11-04 22:27:06
22389  30.360     43  23.2 2021-11-04 22:28:06
22390  30.525     44  23.2 2021-11-04 22:29:06
22391  30.635     43  23.2 2021-11-04 22:30:07
  • "Im giving pandas a int" - where from - reading it from file or what? Show minimal reproducible example. Commented Jan 14, 2021 at 21:51
  • Added the CSV and pandas df in edit Commented Jan 14, 2021 at 21:54
  • @ALollz that makes sense, thank you. would I convert the column after importing from the csv or would I have to edit the csv? Commented Jan 14, 2021 at 21:55
  • I would read in the csv with the argument dtype={'time': 'str'} and that should solve your problem. Without that pandas tries to be smart and will cast that column to int because they are all numeric-like values. Commented Jan 14, 2021 at 21:57
  • Thank you very much, answered! Commented Jan 14, 2021 at 21:59

2 Answers

The problem is the leading 0s. When reading the CSV, pandas sees that every value in the column is numeric-like and infers int64 as the most suitable dtype. You can prevent this by using the dtype argument to specify that the column should remain a string, which preserves the leading 0s and gives you the proper format.

#`data.csv`
hum,moist,temp,time
89.4,38,24.50,12192020033306
90.2,38,24.20,12192020033407
90.9,39,23.90,12192020033506
91.5,38,23.70,12192020033607
92.2,38,23.40,12192020033706
57.0,15,25.60,01142021095906
53.6,47,24.30,01142021222407
53.7,44,24.30,01142021222419
54.1,45,24.30,01142021222540

import pandas as pd

# Keep 'time' as a string so the leading 0 of the month survives.
df = pd.read_csv('data.csv', dtype={'time': 'str'})
df['time_new'] = pd.to_datetime(df['time'], format='%m%d%Y%H%M%S', errors='coerce')

    hum  moist  temp            time            time_new
0  89.4     38  24.5  12192020033306 2020-12-19 03:33:06
1  90.2     38  24.2  12192020033407 2020-12-19 03:34:07
2  90.9     39  23.9  12192020033506 2020-12-19 03:35:06
3  91.5     38  23.7  12192020033607 2020-12-19 03:36:07
4  92.2     38  23.4  12192020033706 2020-12-19 03:37:06
5  57.0     15  25.6  01142021095906 2021-01-14 09:59:06
6  53.6     47  24.3  01142021222407 2021-01-14 22:24:07
7  53.7     44  24.3  01142021222419 2021-01-14 22:24:19
8  54.1     45  24.3  01142021222540 2021-01-14 22:25:40

Without the dtype option, the leading 0s are removed, forcing pandas to figure out how 114 represents both month and day; it settles on month 11 and day 4:

df = pd.read_csv('data.csv')  # time is now int64
df['time_new'] = pd.to_datetime(df['time'], format='%m%d%Y%H%M%S', errors='coerce')

    hum  moist  temp            time            time_new
0  89.4     38  24.5  12192020033306 2020-12-19 03:33:06
1  90.2     38  24.2  12192020033407 2020-12-19 03:34:07
2  90.9     39  23.9  12192020033506 2020-12-19 03:35:06
3  91.5     38  23.7  12192020033607 2020-12-19 03:36:07
4  92.2     38  23.4  12192020033706 2020-12-19 03:37:06
5  57.0     15  25.6   1142021095906 2021-11-04 09:59:06
6  53.6     47  24.3   1142021222407 2021-11-04 22:24:07
7  53.7     44  24.3   1142021222419 2021-11-04 22:24:19
8  54.1     45  24.3   1142021222540 2021-11-04 22:25:40
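
If the column has already been read as int64 (for example, when the read_csv call cannot easily be changed), one possible workaround is to zero-pad the values back to their full width before parsing. This is a minimal sketch and not part of the original answer; it assumes every timestamp is meant to be exactly 14 digits, and the inline CSV text and the `padded` name are only illustrative.

```python
import io

import pandas as pd

# Two rows from the question's CSV, inlined so the example is self-contained.
csv_text = """hum,moist,temp,time
92.2,38,23.40,12192020033706
57.0,15,25.60,01142021095906
"""

# Default read: 'time' is inferred as int64, so the leading 0 is lost.
df = pd.read_csv(io.StringIO(csv_text))

# Convert back to strings and left-pad with zeros to the assumed 14 characters.
padded = df['time'].astype(str).str.zfill(14)

df['time_new'] = pd.to_datetime(padded, format='%m%d%Y%H%M%S', errors='coerce')
print(df)
```

Reading the column as a string in the first place, as shown above, is still the simpler option when you control the read_csv call.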

You can pass a date_parser when reading the data from the file. (Note that date_parser was deprecated in pandas 2.0 in favor of the date_format argument.)

import pandas as pd
from datetime import datetime
   
df = pd.read_csv('your_file.csv', parse_dates=['time'],
                 date_parser=lambda x: datetime.strptime(x, '%m%d%Y%H%M%S'))
print(df)

Output:

     hum  moist  temp                time
0   81.1     40  26.3 2020-12-19 03:29:06
1   83.1     38  25.8 2020-12-19 03:30:06
2   85.6     39  25.3 2020-12-19 03:31:06
3   87.3     38  24.9 2020-12-19 03:32:06
4   89.4     38  24.5 2020-12-19 03:33:06
5   90.2     38  24.2 2020-12-19 03:34:07
6   90.9     39  23.9 2020-12-19 03:35:06
7   91.5     38  23.7 2020-12-19 03:36:07
8   92.2     38  23.4 2020-12-19 03:37:06
9   57.0     15  25.6 2021-01-14 09:59:06
10  53.6     47  24.3 2021-01-14 22:24:07
11  53.7     44  24.3 2021-01-14 22:24:19
12  54.1     45  24.3 2021-01-14 22:25:40
13  54.9     43  24.3 2021-01-14 22:27:06
14  55.2     43  24.2 2021-01-14 22:28:06
15  55.5     44  24.2 2021-01-14 22:29:06
16  55.7     43  24.2 2021-01-14 22:30:07
