1

I am reading a csv file (SimResults_Daily.csv) into pandas, that is structured as follows:

#, Job_ID, Date/Time, value1, value2,
0, ID1,  05/01  24:00:00, 5, 6 
1, ID2,  05/02  24:00:00, 6, 15 
2, ID3,  05/03  24:00:00, 20, 21 

etc. As the datetime format cannot be read by pandas parse_dates, I have figured out I can use the command: str.replace('24:','00:').

My code currently is:

dateparse = lambda x: pd.datetime.strptime(x, '%m-%d  %H:%M:%S')

df = pd.read_csv('SimResults_Daily.csv',
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID',
    'Date/Time',
    'value1',
    'value2',
    header=0)

Where in the code should I implement the str.replace command?

1 Answer 1

2

You can use:

import pandas as pd
import io

temp=u"""#,Job_ID,Date/Time,value1,value2,
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

dateparse = lambda x: pd.datetime.strptime(x.replace('24:','00:'), '%m/%d  %H:%M:%S')

#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21

Another solution with double replace - year can be added also:

dateparse = lambda x: x.replace('24:','00:').replace(' ','/1900 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21

dateparse = lambda x: x.replace('24:','00:').replace(' ','/2016 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
2016-05-01    ID1       5       6
2016-05-02    ID2       6      15
2016-05-03    ID3      20      21
Sign up to request clarification or add additional context in comments.

8 Comments

you are always spot-on!
I am confronted with the task of importing similar datasets, with hourly values rather than daily. Hence, rather than replacing 24: with 00: I would need to move all hours back 1 unit, that is: 24: -> 23:,...,01: -> 00: . How would the code change to do that?
I think same way, only substract one hour like df.index = df.index - pd.Timedelta(1, unit='h')
I think not, unfortunately.
the problem is that hour 24:00 is not recognised by pandas and replacing that with 00:00 would send the data point back to the start of the day..
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.