I have a set of CSV files in the following general format:

Post_Type      Time1      Time2      ...      TimeN
Type1          1:12
Type1                     2:34
Type1                                         0:35
Type2          1:11
Type3          5:34
Type3                                         2:45

I would like to reshape the data frame into this format:

Post_Type      Time1      Time2      ...      TimeN
Type1          1:12       2:34                0:35                                      
Type2          1:11
Type3          5:34                           2:45

I'm moving to Python from R, so I have a very limited understanding of how to manipulate these data frames in Python, and I can't seem to find any examples of others attempting anything like this. Another way of phrasing it: I want to overlay all rows of the same type into one row that contains all of the times, each in its original column. All columns are predefined in the original CSV, so I neither need nor want to create any more columns.

  • Possible duplicate of grouping rows in list in pandas groupby Commented Jun 21, 2018 at 1:27
  • It's not a duplicate, because I have more than one column and I don't want any column to hold more than its one data point. Commented Jun 21, 2018 at 1:32
  • I edited the post to hopefully clarify. Please comment if you require further clarification. Commented Jun 21, 2018 at 1:37

1 Answer

You could try this: first replace your blank cells with NaN, then use groupby to group on Post_Type and call .first(), which returns the first non-null value in each column for every group, then replace the remaining NaNs with blank cells again:

df.replace('', np.nan).groupby('Post_Type').first().replace(np.nan, '')

Example:

# Original Dataframe
>>> df
  Post_Type Time1 Time2 TimeN
0     Type1  1:12            
1     Type1        2:34      
2     Type1              0:35
3     Type2  1:11            
4     Type3  5:34            
5     Type3              2:45

# Processed:
>>> df.replace('', np.nan).groupby('Post_Type').first().replace(np.nan, '')
          Time1 Time2 TimeN
Post_Type                  
Type1      1:12  2:34  0:35
Type2      1:11            
Type3      5:34        2:45

Note: Personally, I would keep NaNs rather than replace with blank cells, as they can be useful.
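If you are reading straight from a CSV file, here is a minimal sketch of the NaN-keeping variant. It assumes the file is comma-separated; the inline `csv_text` is made-up sample data standing in for one of your files, and `merged` is just an illustrative name. By default `pd.read_csv` already parses empty fields as NaN, so the initial `replace` step should not be needed in that case:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the CSV files
# (assumes a comma delimiter; adjust `sep` if yours differs).
csv_text = """Post_Type,Time1,Time2,TimeN
Type1,1:12,,
Type1,,2:34,
Type1,,,0:35
Type2,1:11,,
Type3,5:34,,
Type3,,,2:45
"""

# Empty fields are read as NaN by default.
df = pd.read_csv(io.StringIO(csv_text))

# first() picks the first non-null value per column within each group;
# reset_index() turns Post_Type back into a regular column.
merged = df.groupby("Post_Type").first().reset_index()

print(merged)
```

Cells that had no value in any row of a group stay NaN, which you can still blank out later with `.fillna('')` if you need that for display or export.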

2 Comments

I can't upvote due to my reputation, but it worked perfectly! I was trying something similar, but I was missing the .first(), and that seems to have done the trick. Thanks so much!
No problem! Glad it helped. Please consider accepting my answer if it helped (it will give you a couple reputation points too!)
