Pandas groupby and replace duplicates with empty string

Question

I have a dataframe like the following:

import pandas as pd

d = {'one':[1,1,1,1,2, 2, 2, 2],
     'two':['a','a','a','b', 'a','a','b','b'],
     'letter':['a','b','c','a', 'a', 'b', 'a', 'b']}

df = pd.DataFrame(d)
>>> df
   one two letter
0    1   a      a
1    1   a      b
2    1   a      c
3    1   b      a
4    2   a      a
5    2   a      b
6    2   b      a
7    2   b      b

And I am trying to convert it to a dataframe like the following, where empty cells are filled with empty string '':

one  two  letter
1    a    a        
          b        
          c         
     b    a         
2    a    a         
          b         
     b    a         
          b

When I group by all columns, I get a Series that displays almost exactly what I am looking for, but it is not a DataFrame:

df.groupby(df.columns.tolist()).size()   
1    a    a         1
          b         1
          c         1
     b    a         1
2    a    a         1
          b         1
     b    a         1
          b         1

How can I get the desired dataframe?

Could you explain the purpose of this exercise? So far it sounds like an XY problem.
– DYZ, Commented Aug 2, 2018 at 5:29
For my situation it is a convenient format to have near the end of an application, which involves converting DataFrames into Python "Table" objects that I'm using to print tables in a terminal environment.
– bwrabbit, Commented Aug 2, 2018 at 6:05

sacuL · Accepted Answer · 2018-08-02 05:29:24Z

2

You can build a mask that is True wherever a value differs from the value in the row above, then use where to replace every other cell with a blank string:

df[['one','two']] = df[['one','two']].where(df[['one', 'two']].apply(lambda x: x != x.shift()), '')

>>> df
  one two letter
0   1   a      a
1              b
2              c
3       b      a
4   2   a      a
5              b
6       b      a
7              b

Some explanation:

Your mask looks like this:

>>> df[['one', 'two']].apply(lambda x: x != x.shift())
     one    two
0   True   True
1  False  False
2  False  False
3  False   True
4   True   True
5  False  False
6  False   True
7  False  False

All that where does is keep the values at positions where the mask is True and replace the rest with ''.
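As a self-contained sketch of the same masking idea: the helper name blank_repeats below is hypothetical (it is not part of pandas), and it makes one assumption beyond the one-liner above. The mask is accumulated from left to right, so a value in 'two' is shown again whenever 'one' changes, even if 'two' itself happens to repeat across the group boundary. On the sample data both versions should give the same result.

```python
import pandas as pd


def blank_repeats(df, cols):
    """Return a copy of df with consecutive repeats in cols replaced by ''.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    # Tracks whether this column, or any column to its left, changed
    # compared with the row above.
    changed_so_far = pd.Series(False, index=out.index)
    for col in cols:
        # shift() moves values down one row, so this compares each row
        # with the row above it; the first row always counts as changed.
        changed_so_far |= out[col].ne(out[col].shift())
        # Cast to object so a numeric column can also hold ''.
        out[col] = out[col].astype(object).where(changed_so_far, '')
    return out


df = pd.DataFrame({
    'one': [1, 1, 1, 1, 2, 2, 2, 2],
    'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
    'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b'],
})
result = blank_repeats(df, ['one', 'two'])
print(result.to_string(index=False))
```

Because the comparison is with the row above, this assumes the rows are already ordered so that equal keys sit next to each other.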


DYZ · Accepted Answer · 2018-08-02 05:34:27Z

1

The solution to the original problem is to find the duplicated cells in each of the first two columns and set them to an empty string:

df.loc[df.duplicated(subset=['one', 'two']), 'two'] = ''
df.loc[df.duplicated(subset=['one']),        'one'] = ''

However, the purpose of this transformation is unclear. Perhaps you are trying to solve the wrong problem.
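A runnable sketch of the duplicated approach, under two assumptions that are not in the answer itself: the masks are computed before anything is blanked, so the order of the two assignments no longer matters, and both columns are cast to object first, since writing '' into an integer column may warn or fail depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'one': [1, 1, 1, 1, 2, 2, 2, 2],
    'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
    'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b'],
})

# Compute both masks on the untouched data.
dup_one = df.duplicated(subset=['one'])
dup_pair = df.duplicated(subset=['one', 'two'])

# 'one' holds integers; cast to object so it can also hold ''.
df['one'] = df['one'].astype(object)
df['two'] = df['two'].astype(object)

# Blank every cell that repeats an earlier key.
df.loc[dup_pair, 'two'] = ''
df.loc[dup_one, 'one'] = ''
print(df.to_string(index=False))
```

Note that duplicated flags repeats anywhere earlier in the frame, not only in the row directly above, so this also relies on the rows already being grouped by 'one' and 'two'.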

