Merging multiple rows into one based on row name (Python)

Question

I have a set of CSV files in the following general format:

Post_Type      Time1      Time2      ...      TimeN
Type1          1:12
Type1                     2:34
Type1                                         0:35
Type2          1:11
Type3          5:34
Type3                                         2:45

I would like to reshape the data frame into this format:

Post_Type      Time1      Time2      ...      TimeN
Type1          1:12       2:34                0:35                                      
Type2          1:11
Type3          5:34                           2:45

I'm moving to Python from R, so I have a very limited understanding of how to manipulate these dataframes in Python, and I can't seem to find any examples of others attempting anything like this. Another way of phrasing it: I want to overlay all rows of the same type into one row that contains all of the times, each staying in its original column. All columns are predefined in the original CSV, so I neither need nor want to create any more columns.

Possible duplicate of grouping rows in list in pandas groupby
– pyeR_biz, Commented Jun 21, 2018 at 1:27
It's not a duplicate, because I have more than one column and I don't want any column to hold more than its one data point.
– CoffeePoweredComputers, Commented Jun 21, 2018 at 1:32
I edited the post to hopefully clarify. Please comment if you require further clarification.
– CoffeePoweredComputers, Commented Jun 21, 2018 at 1:37

sacuL · Accepted Answer · 2018-06-21 01:53:02Z

You could try this: first replace your blank cells with NaN, then use groupby to group on Post_Type and call .first(), which returns the first non-null value of each column within each group, and finally replace the remaining NaN with blank cells again:

import numpy as np  # needed for np.nan
df.replace('', np.nan).groupby('Post_Type').first().replace(np.nan, '')

Example:

# Original Dataframe
>>> df
  Post_Type Time1 Time2 TimeN
0     Type1  1:12            
1     Type1        2:34      
2     Type1              0:35
3     Type2  1:11            
4     Type3  5:34            
5     Type3              2:45

# Processed:
>>> df.replace('', np.nan).groupby('Post_Type').first().replace(np.nan, '')
          Time1 Time2 TimeN
Post_Type                  
Type1      1:12  2:34  0:35
Type2      1:11            
Type3      5:34        2:45
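
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same three steps. The DataFrame is built by hand to mirror the example above, the intermediate variable names are only illustrative, and fillna('') stands in for the final replace(np.nan, '') (an assumption that the two are interchangeable for this data):

```python
import numpy as np
import pandas as pd

# Sample data mirroring the example above; blank cells are empty strings.
df = pd.DataFrame({
    "Post_Type": ["Type1", "Type1", "Type1", "Type2", "Type3", "Type3"],
    "Time1": ["1:12", "", "", "1:11", "5:34", ""],
    "Time2": ["", "2:34", "", "", "", ""],
    "TimeN": ["", "", "0:35", "", "", "2:45"],
})

# Step 1: mark blank cells as missing so they can be skipped.
with_nan = df.replace("", np.nan)

# Step 2: one row per Post_Type, taking the first non-null value per column.
merged = with_nan.groupby("Post_Type").first()

# Step 3 (optional): turn the remaining NaN back into blanks for display.
# fillna("") is used here in place of replace(np.nan, "").
result = merged.fillna("")
print(result)
```

Keep in mind that .first() keeps only the first non-null value per column in each group, so if a group ever had two times in the same column, the later one would be dropped silently.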

Note: Personally, I would keep the NaNs rather than replace them with blank cells, since pandas treats NaN as missing data while an empty string is just an ordinary value.
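
If the data comes straight from a CSV file, keeping the NaNs can also shorten the pipeline, because pd.read_csv parses empty fields as NaN by default. The sketch below assumes a comma-separated layout and uses an in-memory string in place of a real file; the contents are made up to match the question:

```python
import io
import pandas as pd

# Stand-in for one of the CSV files (assumed comma-separated).
csv_text = """Post_Type,Time1,Time2,TimeN
Type1,1:12,,
Type1,,2:34,
Type1,,,0:35
Type2,1:11,,
Type3,5:34,,
Type3,,,2:45
"""

# Empty fields are read as NaN by default, so no replace('') step is needed.
df = pd.read_csv(io.StringIO(csv_text))

# as_index=False keeps Post_Type as a regular column instead of the index.
merged = df.groupby("Post_Type", as_index=False).first()
print(merged)
```

From here, merged.isna() or merged.notna() shows which time slots are empty for each post type, which is harder to do reliably once the cells hold empty strings.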

edited Jun 21, 2018 at 1:53

answered Jun 21, 2018 at 1:46

2 Comments

CoffeePoweredComputers Over a year ago

I can't upvote due to my reputation, but it worked perfectly! I was trying something similar, but I was missing the .first(), and that seems to have done the trick. Thanks so much!

sacuL Over a year ago

No problem! Glad it helped. Please consider accepting my answer if it helped (it will give you a couple reputation points too!)
