
I am reading some data and creating a dataframe with from_records in which the data contains a text timestamp HH:MM:SS:000000. I can convert to timeseries with pd.to_datetime(data.timestamp, format='%H:%M:%S:%f'). I know the date of the file from the filename. What is a pythonic and performant way to insert the date (and eventually set it as the index)?

Data looks like:

12:00:00:000000 100
12:00:01:123456 200
12:00:02:000000 300

Without the date inserted I get a dataframe that looks like:

1900-01-01 12:00:00.000000 100
1900-01-01 12:00:01.123456 200
1900-01-01 12:00:02.000000 300

And what I'd want is (given date = datetime.date(2017, 6, 28)):

2017-06-28 12:00:00.000000 100
2017-06-28 12:00:01.123456 200
2017-06-28 12:00:02.000000 300

pd.to_datetime origin arg sounded like what I want, but it requires the input as a numeric timestamp rather than a string.

  • You should be able to just do df.index += date. That worked for me anyway. Commented Jul 26, 2017 at 16:31

2 Answers


You can format the date as a string with strftime, prepend it to the time column, and parse the combined string:

df['datetime'] = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
                                format='%Y-%m-%d %H:%M:%S:%f')

print (df)
              time    A                   datetime
0  12:00:00:000000  100 2017-06-28 12:00:00.000000
1  12:00:01:123456  200 2017-06-28 12:00:01.123456
2  12:00:02:000000  300 2017-06-28 12:00:02.000000

And for index:

df.index = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
                          format='%Y-%m-%d %H:%M:%S:%f')

print (df)
                                       time    A
time                                            
2017-06-28 12:00:00.000000  12:00:00:000000  100
2017-06-28 12:00:01.123456  12:00:01:123456  200
2017-06-28 12:00:02.000000  12:00:02:000000  300

Another solution:

date = datetime.date(2017, 6, 28)
days = date - datetime.date(1900, 1, 1)

# Parentheses allow the expression to continue onto the next line
df['datetime'] = (pd.to_datetime(df['time'], format='%H:%M:%S:%f') +
                  pd.to_timedelta(days, unit='d'))

print (df)
              time    A                   datetime
0  12:00:00:000000  100 2017-06-28 12:00:00.000000
1  12:00:01:123456  200 2017-06-28 12:00:01.123456
2  12:00:02:000000  300 2017-06-28 12:00:02.000000
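
As a quick sanity check, here is a minimal self-contained sketch that runs both approaches on the sample rows from the question and confirms they agree. The column names `time` and `A` follow the output above. Computing the offset as a difference of two `pd.Timestamp` values is my own variation on the `days` calculation, not something from the original code.

```python
import datetime

import pandas as pd

# Sample rows from the question; column names follow the printed output above.
df = pd.DataFrame.from_records(
    [("12:00:00:000000", 100), ("12:00:01:123456", 200), ("12:00:02:000000", 300)],
    columns=["time", "A"],
)
date = datetime.date(2017, 6, 28)

# Approach 1: prepend the date as text, then parse the combined string.
via_string = pd.to_datetime(
    date.strftime("%Y-%m-%d ") + df["time"], format="%Y-%m-%d %H:%M:%S:%f"
)

# Approach 2: parse the time alone (the date defaults to 1900-01-01),
# then shift every row by the fixed distance to the target date.
offset = pd.Timestamp(date) - pd.Timestamp(1900, 1, 1)  # variation, see note above
via_offset = pd.to_datetime(df["time"], format="%H:%M:%S:%f") + offset

# Element-wise comparison of the two results.
same = bool((via_string == via_offset).all())
print(same)
```

The second approach parses only the short time string and then does a single vectorized addition, which is why it looks attractive for large inputs. It is worth timing both on your own data before committing to one.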

3 Comments

Functionally that works but is there a faster way that doesn't require the date to be parsed to and from a string? (I have tens of millions of rows).
What about another solution?
@jezrael Awesome! It would, however, be great if you could specify the date offset when using read_csv() and similar functions.
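
Short of a built-in option like that, a small wrapper around read_csv() can get a similar effect. The sketch below is only an illustration: the helper name `read_times_with_date` and the column names `timestamp` and `value` are hypothetical, and it assumes whitespace-separated rows shaped like the sample data in the question.

```python
import datetime
import io

import pandas as pd


def read_times_with_date(source, date):
    """Hypothetical helper: read 'HH:MM:SS:ffffff value' rows, index by full datetime."""
    df = pd.read_csv(source, sep=r"\s+", header=None, names=["timestamp", "value"])
    # Parsing the time alone yields datetimes on the default date 1900-01-01.
    parsed = pd.to_datetime(df["timestamp"], format="%H:%M:%S:%f")
    # Shift every row by the distance from that default date to the file's date.
    offset = pd.Timestamp(date) - pd.Timestamp(1900, 1, 1)
    df.index = pd.DatetimeIndex(parsed + offset, name="datetime")
    return df.drop(columns="timestamp")


# Stand-in for a real file; in practice `source` would be a path or file handle.
raw = io.StringIO("12:00:00:000000 100\n12:00:01:123456 200\n12:00:02:000000 300\n")
df = read_times_with_date(raw, datetime.date(2017, 6, 28))
print(df)
```

Because the date is passed in as a value, it can come from wherever it was extracted, such as the filename.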

Here is what I ended up with, based on @jezrael's 'Another' answer:

df.index = pd.to_datetime(df.timestamp, format='%H:%M:%S:%f')
days = date - df.index[0].date()
df.index += pd.to_timedelta(days, unit='d')
