
I have a dataframe like the following:

import pandas as pd

d = {'one':[1,1,1,1,2, 2, 2, 2],
     'two':['a','a','a','b', 'a','a','b','b'],
     'letter':['a','b','c','a', 'a', 'b', 'a', 'b']}

df = pd.DataFrame(d)
>>> df
   one two letter
0    1   a      a
1    1   a      b
2    1   a      c
3    1   b      a
4    2   a      a
5    2   a      b
6    2   b      a
7    2   b      b

And I am trying to convert it to a dataframe like the following, where values in 'one' and 'two' that repeat the row above are replaced with an empty string '':

one  two  letter
1    a    a        
          b        
          c         
     b    a         
2    a    a         
          b         
     b    a         
          b          

When I group by all columns and call size(), I get a Series that displays almost exactly what I am looking for, but it is not a DataFrame:

df.groupby(df.columns.tolist()).size()   
1    a    a         1
          b         1
          c         1
     b    a         1
2    a    a         1
          b         1
     b    a         1
          b         1

How can I get the desired dataframe?
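The blank-looking cells in the groupby output are only a display feature: when printing a MultiIndex, pandas "sparsifies" it by not repeating outer labels within a group (controlled by the display.multi_sparse option). Every label is still stored in the index, which is why turning that Series back into a DataFrame gives repeated values rather than blanks. A short sketch to check this, using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 1, 1, 1, 2, 2, 2, 2],
                   'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
                   'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b']})

counts = df.groupby(df.columns.tolist()).size()

# The MultiIndex stores every label; only the printed form leaves repeats blank
print(counts.index.get_level_values('one').tolist())  # [1, 1, 1, 1, 2, 2, 2, 2]

# Moving the index back into columns gives repeated values, not blanks
flat = counts.reset_index(name='size')

# Printing with sparsification switched off shows every label on every row
with pd.option_context('display.multi_sparse', False):
    print(counts)
```

So the blanks have to be produced explicitly in the data, as the answers below do.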

  • Could you explain the purpose of this exercise? So far it sounds like an XY problem. Commented Aug 2, 2018 at 5:29
  • For my situation it is a convenient format to have near the end of an application that converts DataFrames into Python "Table" objects, which I use to print tables in a terminal environment. Commented Aug 2, 2018 at 6:05

2 Answers


You can build a mask that is True wherever a value differs from the value in the row above it, then use DataFrame.where to keep those values and replace the rest with a blank string:

mask = df[['one', 'two']].apply(lambda x: x != x.shift())
df[['one', 'two']] = df[['one', 'two']].where(mask, '')

>>> df
  one two letter
0   1   a      a
1              b
2              c
3       b      a
4   2   a      a
5              b
6       b      a
7              b

Some explanation:

Your mask looks like this:

>>> df[['one', 'two']].apply(lambda x: x != x.shift())
     one    two
0   True   True
1  False  False
2  False  False
3  False   True
4   True   True
5  False  False
6  False   True
7  False  False
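The mask works because Series.shift() moves every value down one row (the first row has no predecessor and becomes NaN), so comparing a column with its shifted self is True exactly where a new run of values begins. A small sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2, 1])

prev = s.shift()        # values moved down one row; the first row becomes NaN
is_new = s != prev      # True where a value differs from the row above (1 != NaN is True)
shown = s.where(is_new, '')  # keep run starts, blank everything else

print(prev.tolist())    # [nan, 1.0, 1.0, 2.0, 2.0]
print(is_new.tolist())  # [True, False, True, False, True]
print(shown.tolist())   # [1, '', 2, '', 1]
```

Note that the final 1 is shown again: the mask only looks at the row directly above, not at whether the value appeared earlier.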

DataFrame.where keeps each value where the mask is True and replaces the rest with ''.
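One caveat: this mask compares each column with its own previous value independently. If 'two' happens to repeat across a change in 'one' (for example, the last row of one group and the first row of the next both have two == 'a'), the 'two' value that starts the new group is blanked too. That does not happen with the question's data, but a minimal group-aware sketch follows. blank_repeats is a hypothetical helper name, and it assumes the rows are already sorted so that each group is contiguous.

```python
import pandas as pd


def blank_repeats(df, cols):
    """Hypothetical helper: return a copy of df in which each column in cols
    is set to '' when it and every column to its left in cols match the
    previous row. Assumes rows of the same group are adjacent."""
    out = df.copy()
    # True where any key column seen so far differs from the row above
    new_group = pd.Series(False, index=df.index)
    for col in cols:
        new_group = new_group | (df[col] != df[col].shift())
        out[col] = df[col].where(new_group, '')
    return out


# Made-up data: 'two' is 'a' on both rows, but 'one' changes, so row 1 starts a new group
df = pd.DataFrame({'one': [1, 2], 'two': ['a', 'a'], 'letter': ['x', 'y']})

per_column = df[['one', 'two']].apply(lambda x: x != x.shift())
print(per_column['two'].tolist())  # [True, False], so 'two' would be blanked on row 1

result = blank_repeats(df, ['one', 'two'])
print(result['two'].tolist())      # ['a', 'a']
```

On the question's own data, blank_repeats(df, ['one', 'two']) gives the same result as the one-liner above.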


The solution to the original problem is to find the duplicated cells in each of the first two columns and set them to an empty string:

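df = df.astype({'one': object})  # let 'one' hold '' alongside ints (recent pandas warns on incompatible dtypes)
# blank 'two' first, while the (one, two) pairs are still intact
df.loc[df.duplicated(subset=['one', 'two']), 'two'] = ''
df.loc[df.duplicated(subset=['one']),        'one'] = ''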
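Note that DataFrame.duplicated flags a row if the same key appeared in any earlier row, not only the one directly above, so this approach assumes the rows are already sorted by the key columns (as they are in the question). A minimal sketch with made-up, unsorted data; without the sort, the last row's 'one' would be blanked and would appear to belong to the group of 1 printed above it:

```python
import pandas as pd

# Made-up example data, deliberately not sorted by the key columns
df = pd.DataFrame({'one': [2, 1, 2], 'two': ['a', 'a', 'b'], 'letter': ['x', 'y', 'z']})

# duplicated() compares against all earlier rows, so put each group's rows together first
df = df.sort_values(['one', 'two'], kind='stable').reset_index(drop=True)

# Let 'one' hold '' alongside integers
df = df.astype({'one': object})

# Blank 'two' before 'one', while the (one, two) pairs are still intact
df.loc[df.duplicated(subset=['one', 'two']), 'two'] = ''
df.loc[df.duplicated(subset=['one']), 'one'] = ''
print(df)
```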

However, the purpose of this transformation is unclear. Perhaps you are trying to solve the wrong problem.
