Adding a fixed date to pandas dataframe

Question

I am reading some data and creating a dataframe with from_records in which the data contains a text timestamp HH:MM:SS:000000. I can convert to timeseries with pd.to_datetime(data.timestamp, format='%H:%M:%S:%f'). I know the date of the file from the filename. What is a pythonic and performant way to insert the date (and eventually set it as the index)?

Data looks like:

12:00:00:000000 100
12:00:01:123456 200
12:00:02:000000 300

Without the date inserted I get a dataframe that looks like:

1900-01-01 12:00:00.000000 100
1900-01-01 12:00:01.123456 200
1900-01-01 12:00:02.000000 300

And what I'd want is (given date = datetime.date(2017, 6, 28)):

2017-06-28 12:00:00.000000 100
2017-06-28 12:00:01.123456 200
2017-06-28 12:00:02.000000 300

pd.to_datetime origin arg sounded like what I want, but it requires the input as a numeric timestamp rather than a string.

You should be able to just do df.index += date. That worked for me anyway.
– tommy.carstensen, Commented Jul 26, 2017 at 16:31

jezrael · Accepted Answer · 2017-06-28 14:17:02Z

2

You can format the date as a string with strftime and prepend it to the time column before parsing:

df['datetime'] = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
                                format='%Y-%m-%d %H:%M:%S:%f')

print (df)
              time    A                   datetime
0  12:00:00:000000  100 2017-06-28 12:00:00.000000
1  12:00:01:123456  200 2017-06-28 12:00:01.123456
2  12:00:02:000000  300 2017-06-28 12:00:02.000000

And for index:

df.index = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
                          format='%Y-%m-%d %H:%M:%S:%f')

print (df)
                                       time    A
time                                            
2017-06-28 12:00:00.000000  12:00:00:000000  100
2017-06-28 12:00:01.123456  12:00:01:123456  200
2017-06-28 12:00:02.000000  12:00:02:000000  300
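For anyone who wants to run the string-prefix approach end to end, here is a minimal self-contained sketch. The sample rows come from the question, the column names 'time' and 'A' follow the printed output above, and the index name 'datetime' is my own choice rather than part of the original answer.

```python
import datetime

import pandas as pd

# Sample rows from the question; column names follow the printed output above.
df = pd.DataFrame({
    'time': ['12:00:00:000000', '12:00:01:123456', '12:00:02:000000'],
    'A': [100, 200, 300],
})
date = datetime.date(2017, 6, 28)

# Prepend the known date to every time string, then parse once with a full format.
prefix = date.strftime('%Y-%m-%d ')
stamps = pd.to_datetime(prefix + df['time'], format='%Y-%m-%d %H:%M:%S:%f')

# Use the parsed timestamps as the index ('datetime' is an illustrative name).
indexed = df.set_index(stamps.rename('datetime'))
```

Note that this builds a new string for every row before parsing, so it does extra work per row compared with adding an offset after parsing.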

Another solution:

date = datetime.date(2017, 6, 28)
days = date - datetime.date(1900, 1, 1)

df['datetime'] = (pd.to_datetime(df['time'], format='%H:%M:%S:%f') +
                  pd.to_timedelta(days, unit='d'))

print (df)
              time    A                   datetime
0  12:00:00:000000  100 2017-06-28 12:00:00.000000
1  12:00:01:123456  200 2017-06-28 12:00:01.123456
2  12:00:02:000000  300 2017-06-28 12:00:02.000000
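The offset idea can also be sketched without going through datetime.date arithmetic, by subtracting two pd.Timestamp values. This is a variation on the code above, not the answer's exact method; it assumes the same 'time' and 'A' columns and relies on pandas filling in 1900-01-01 when the format has no date part, as shown in the question.

```python
import datetime

import pandas as pd

# Sample rows from the question; column names follow the printed output above.
df = pd.DataFrame({
    'time': ['12:00:00:000000', '12:00:01:123456', '12:00:02:000000'],
    'A': [100, 200, 300],
})
date = datetime.date(2017, 6, 28)

# Parsing a time-only format yields timestamps on the default date 1900-01-01.
parsed = pd.to_datetime(df['time'], format='%H:%M:%S:%f')

# A single scalar Timedelta moves the whole column to the target date.
offset = pd.Timestamp(date) - pd.Timestamp('1900-01-01')
df['datetime'] = parsed + offset
```

The date is converted only once here, and the addition is a vectorized operation on the whole column, so no per-row string building is involved.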

edited Jun 28, 2017 at 14:17

answered Jun 28, 2017 at 14:09


3 Comments

Kyle Over a year ago

Functionally that works but is there a faster way that doesn't require the date to be parsed to and from a string? (I have tens of millions of rows).

jezrael Over a year ago

What about another solution?

tommy.carstensen Over a year ago

@jezrael Awesome! It would, however, be great if you could specify the date offset when using read_csv() and similar functions.

Kyle · Accepted Answer · 2017-06-28 14:34:08Z

0

Here is what I ended up with, based on @jezrael's 'Another' answer:

df.index = pd.to_datetime(df.timestamp, format='%H:%M:%S:%f')
days = date - df.index[0].date()
df.index += pd.to_timedelta(days, unit='d')
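The three lines above could be wrapped in a small helper along these lines. This is only a sketch: the function name with_dated_index and its parameters are hypothetical, the 'timestamp' column name is taken from the question, and it assumes a non-empty frame because it reads the first parsed value to find the placeholder date.

```python
import datetime

import pandas as pd


def with_dated_index(df, date, column='timestamp', fmt='%H:%M:%S:%f'):
    """Return a copy of df indexed by the time column moved onto `date`.

    Hypothetical helper; assumes df has at least one row.
    """
    times = pd.DatetimeIndex(pd.to_datetime(df[column], format=fmt))
    # Take the placeholder date from the parsed data instead of hard-coding it.
    offset = pd.Timestamp(date) - times[0].normalize()
    out = df.copy()
    out.index = times + offset
    return out


# Usage with the sample rows from the question ('value' is an illustrative name).
raw = pd.DataFrame({
    'timestamp': ['12:00:00:000000', '12:00:01:123456', '12:00:02:000000'],
    'value': [100, 200, 300],
})
result = with_dated_index(raw, datetime.date(2017, 6, 28))
```

Reading the placeholder date from the first row mirrors df.index[0].date() in the snippet above, so the helper does not depend on knowing that the default is 1900-01-01.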

answered Jun 28, 2017 at 14:34
