1

I read a csv with separate date and time columns. I generated an index from them in the following way:

data = p.read_csv(fileName,usecols=["date","time","price"])
data.set_index(["date","time"],inplace=True)

This however isn't very useful when I want to get the difference in days or hours between rows. How do I generate a single datetime index from the separate date and time columns?

1
  • You should join them together and call pd.to_datetime. Commented Oct 24, 2017 at 6:26

1 Answer 1

1

I think you need parameter parse_dates with nested list with both columns and parameter index_col with new column created by concanecated columns names separated by _:

data = p.read_csv(fileName,
                  usecols=["date","time","price"], 
                  parse_dates=[["date","time"]], 
                  index_col=['date_time'])

Sample:

from pandas.compat import StringIO

temp=u"""date,time,price
2015-01-01,14:00:10,7
2014-01-01,10:20:10,1"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), 
                 usecols=["date","time","price"], 
                 parse_dates=[["date","time"]],
                 index_col=['date_time'])

print (df)
                     price
date_time                 
2015-01-01 14:00:10      7
2014-01-01 10:20:10      1

print (df.index)
DatetimeIndex(['2015-01-01 14:00:10', '2014-01-01 10:20:10'], 
               dtype='datetime64[ns]', 
               name='date_time', freq=None)
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, that works for me. Using parse_dates that way seem to significantly increase the processing time though. I have about 75,000 rows so I guess it should be expected.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.