5

I have the following data frame my_df:

Person       event         time
---------------------------------
John          A        2017-10-11
John          B        2017-10-12
John          C        2017-10-14
John          D        2017-10-15
Ann           X        2017-09-01
Ann           Y        2017-09-02
Dave          M        2017-10-05
Dave          N        2017-10-07
Dave          Q        2017-10-20

I want to create a new column, which is the (event, time) pair. It should look like:

Person       event         time        event_time
------------------------------------------------------
John          A        2017-10-11     (A, 2017-10-11)
John          B        2017-10-12     (B, 2017-10-12)
John          C        2017-10-14     (C, 2017-10-14)
John          D        2017-10-15     (D, 2017-10-15)
Ann           X        2017-09-01     (X, 2017-09-01)
Ann           Y        2017-09-02     (Y, 2017-09-02)
Dave          M        2017-10-05     (M, 2017-10-05)
Dave          N        2017-10-07     (N, 2017-10-07)
Dave          Q        2017-10-20     (Q, 2017-10-20)

Here is my code:

my_df['event_time'] = my_df.apply(lambda row: (row['event'] , row['time']), axis=1)

But I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (128, 2), indices imply (128, 3)

Any idea what I did wrong in my code? Thanks!

0

3 Answers 3

7

You can use:

my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)

Or:

my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))

Or:

my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]

All return:

print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
Sign up to request clarification or add additional context in comments.

3 Comments

I tried the first approach, but got error: ValueError: Wrong number of items passed 2, placement implies 1 What did I miss here?
Is possible some NaNs? I am going test it.
Yes, I have some event as 'None', but still with timestamp. I was hoping the corresponding tuple could be (None, timestamp)
2

Without apply

df.assign(event_time=list(zip(df.event,df.time)))
Out[1011]: 
  Person event        time        event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)

Comments

0
my_df['event_time'] = my_df.apply(lambda x: tuple(x[['event','time']]),axis = 1)

This will be my approach, if you you want to use lambda for running efficiency

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.