Amending datetime format while parsing from csv read - pandas

Question

I am reading a csv file (SimResults_Daily.csv) into pandas, that is structured as follows:

#, Job_ID, Date/Time, value1, value2,
0, ID1,  05/01  24:00:00, 5, 6 
1, ID2,  05/02  24:00:00, 6, 15 
2, ID3,  05/03  24:00:00, 20, 21

etc. Because pandas' parse_dates cannot read this datetime format (hour 24 is not a valid %H value), I figured out that I can use str.replace('24:','00:').
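A quick standard-library check shows the problem and the fix. strptime rejects hour 24, while the replaced string parses. When the format has no year, the year defaults to 1900. Recent Python versions (3.13+) may also emit a DeprecationWarning for a format that has a day but no year; that is a warning, not an error.

```python
from datetime import datetime

raw = "05/01  24:00:00"
fmt = "%m/%d  %H:%M:%S"

# %H only accepts 00-23, so hour 24 makes strptime raise ValueError.
try:
    datetime.strptime(raw, fmt)
    parsed_raw = True
except ValueError:
    parsed_raw = False

# After the replace the string parses; with no %Y the year defaults to 1900.
fixed = datetime.strptime(raw.replace("24:", "00:"), fmt)
print(parsed_raw, fixed)  # False 1900-05-01 00:00:00
```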

My code currently is:

from datetime import datetime

import pandas as pd

dateparse = lambda x: datetime.strptime(x, '%m/%d  %H:%M:%S')

df = pd.read_csv('SimResults_Daily.csv',
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

Where in the code should I implement the str.replace command?

jezrael · Accepted Answer · 2016-10-10 14:02:21Z


You can call replace inside the custom date parser:

import io
from datetime import datetime

import pandas as pd

temp=u"""#,Job_ID,Date/Time,value1,value2,
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

# pd.datetime was removed in pandas 2.0, so use datetime.datetime directly
dateparse = lambda x: datetime.strptime(x.replace('24:','00:'), '%m/%d  %H:%M:%S')

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21
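On newer pandas versions, date_parser is deprecated (since pandas 2.0) and pd.datetime no longer exists. A sketch of an alternative under those versions: read Date/Time as text, rewrite only the hour field with a regex, and parse with pd.to_datetime and an explicit format. The regex `\s24:` assumes the hour is always preceded by whitespace, as in this file. Like the answer, it maps 24:00 to 00:00 of the same day and leaves the year at 1900.

```python
import io

import pandas as pd

temp = """#,Job_ID,Date/Time,value1,value2,
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'])

# Replace only an hour field of 24 (digits right after whitespace),
# so a minute value such as '13:24:00' is left alone.
fixed = df['Date/Time'].str.replace(r'\s24:', ' 00:', regex=True)

# Explicit format, no year: pandas fills in 1900.
df['Date/Time'] = pd.to_datetime(fixed, format='%m/%d %H:%M:%S')
df = df.set_index('Date/Time')
print(df)
```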

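Another solution uses a double replace, which can also add a year. Here dateparse returns a string such as 05/01/1900 00:00:00 rather than a datetime, and pandas then converts that string to a datetime: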

dateparse = lambda x: x.replace('24:','00:').replace(' ','/1900 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21
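Because this parser returns text, it helps to check exactly what it produces. .replace(' ', '/1900 ') rewrites every space, so it assumes exactly one space between date and time, as in the test data above; the question's file shows two spaces, which would give a malformed string. '24:' would also match minutes such as 13:24:00. A minimal sketch of a more tolerant variant, assuming the field is always MM/DD followed by HH:MM:SS (dateparse_robust is a hypothetical name):

```python
import pandas as pd

# The answer's string parser, unchanged.
dateparse = lambda x: x.replace('24:', '00:').replace(' ', '/1900 ')

# With two spaces, both are replaced and the result is malformed.
print(dateparse('05/01  24:00:00'))  # 05/01/1900 /1900 00:00:00


def dateparse_robust(x):
    # Hypothetical variant: str.split() with no argument splits on any run
    # of whitespace, so one or two spaces behave the same.
    date, time = x.split()
    # Rewrite only the hour field, so minutes like '13:24:00' are untouched.
    if time.startswith('24:'):
        time = '00:' + time[3:]
    return f'{date}/1900 {time}'


print(dateparse_robust('05/01  24:00:00'))                 # 05/01/1900 00:00:00
print(pd.to_datetime(dateparse_robust('05/01 13:24:00')))  # 1900-05-01 13:24:00
```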

To use 2016 as the year instead:

dateparse = lambda x: x.replace('24:','00:').replace(' ','/2016 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
2016-05-01    ID1       5       6
2016-05-02    ID2       6      15
2016-05-03    ID3      20      21

edited Oct 10, 2016 at 14:02

answered Oct 10, 2016 at 13:55

jezrael


8 Comments

Andreuccio Over a year ago

you are always spot-on!

Andreuccio Over a year ago

I am confronted with the task of importing similar datasets with hourly values rather than daily ones. Hence, rather than replacing 24: with 00:, I would need to move all hours back by one, that is: 24: -> 23:, ..., 01: -> 00:. How would the code change to do that?

jezrael Over a year ago

I think it works the same way; just subtract one hour, e.g. df.index = df.index - pd.Timedelta(1, unit='h')

jezrael Over a year ago

I think not, unfortunately.

Andreuccio Over a year ago

The problem is that pandas does not recognise hour 24:00, and replacing it with 00:00 would send the data point back to the start of the same day.
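One way around this for the hourly case is to treat the time as an offset from midnight, so 24:00:00 on 05/01 becomes 00:00 on 05/02, and only then subtract one hour, giving 24:00 -> 23:00 and 01:00 -> 00:00 on the same day. A sketch with hypothetical sample values, assuming the same MM/DD HH:MM:SS layout (the year defaults to 1900):

```python
import pandas as pd

# Hypothetical hourly values in the question's layout.
raw = pd.Series(['05/01 01:00:00', '05/01 23:00:00', '05/01 24:00:00'])

# Column 0 holds 'MM/DD', column 1 holds 'HH:MM:SS'.
parts = raw.str.split(expand=True)

# Midnight at the start of each day (year 1900, since the format has no %Y).
day = pd.to_datetime(parts[0], format='%m/%d')

# Build the time of day as a duration, so an hour of 24 is just 24 hours.
hms = parts[1].str.split(':', expand=True).astype(int)
clock = (pd.to_timedelta(hms[0], unit='h')
         + pd.to_timedelta(hms[1], unit='m')
         + pd.to_timedelta(hms[2], unit='s'))

stamps = day + clock                           # 05/01 24:00 -> 1900-05-02 00:00
shifted = stamps - pd.Timedelta(1, unit='h')   # move every hour back by one
print(shifted.tolist())
```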
