Let's say I have the dataframe below.

       a        b        c
0    one      two    three
1  three      one      two

I want rows 0 and 1 to be treated as the same (as the same list, or something similar), since both rows contain 'one', 'two', and 'three', even though the order is different.

Should I make a new column that stores all the strings from columns a, b, and c, such as

       a        b        c                d
0    one      two    three    one two three
1  three      one      two    three one two

and then compare row 0 and 1 of column d?

After this, I want to call .groupby('d'), and in the result 'one two three' and 'three one two' must not be separated.

I can't think of a way to solve this and need help.

  • Can you provide an example of a row that should not be treated the same? Commented Jun 26, 2018 at 16:50
  • A row like one two four should not be treated the same, because rows 0 and 1 don't contain the string 'four'. Commented Jun 26, 2018 at 17:01

2 Answers

The new column you create should hold tuples, since lists aren't hashable (groupby will fail). So we create the column with tolist() first, then sort each list and convert it to a tuple.

Setup

import pandas as pd

data = {'a': ['one', 'three'], 'b': ['two', 'one'], 'c': ['three', 'two']}
df = pd.DataFrame(data)

Sorting and transforming...

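# Collect each row's values into a list, one list per row
df['d'] = df.values.tolist()
# Sort each list so the order no longer matters, then make it a hashable tuple
df['d'] = (
    df['d'].transform(sorted)
           .transform(tuple)
)
print(df.groupby('d').sum())  # sum() is called only to show the groupby working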

# prints only one group:
#                           a       b         c
# d
# (one, three, two)  onethree  twoone  threetwo
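
As a rough check of the sorted-tuple idea, here is a minimal sketch that adds the 'one two four' row mentioned in the question's comments and confirms it lands in its own group. The third row and the variable names (key_cols, group_ids) are illustrative assumptions, not part of the original data, and the key is built with a plain list comprehension rather than transform.

```python
import pandas as pd

# Rows 0 and 1 are from the question; row 2 ('one', 'two', 'four') is an
# assumed extra row, taken from the example given in the comments.
df = pd.DataFrame({
    'a': ['one', 'three', 'one'],
    'b': ['two', 'one', 'two'],
    'c': ['three', 'two', 'four'],
})

key_cols = ['a', 'b', 'c']

# Build an order-insensitive, hashable key: sort each row, then make a tuple.
df['d'] = [tuple(sorted(row)) for row in df[key_cols].to_numpy()]

# ngroup() labels each row with the integer id of the group it belongs to.
group_ids = df.groupby('d').ngroup().tolist()
n_groups = df.groupby('d').ngroups

# Rows 0 and 1 should share a group id; row 2 should get a different one.
print(group_ids, n_groups)
```

Because the key is a sorted tuple rather than a set, repeated values still count: a row like 'one one two' would not be merged with 'one two two'.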

2 Comments

Sort and transform: I learned some new skills, thank you!
Glad I could help. The family of methods to split, combine and apply functions to data provided by pandas is really rich. I always keep the docs at hand.

Sort the cells in each row before joining them to create the grouping string.

Use apply with axis=1 to apply this function row-wise.

df['d'] = df.apply(lambda x: ' '.join(x.sort_values()), axis=1)

# outputs:

       a    b      c              d
0    one  two  three  one three two
1  three  one    two  one three two

Grouping by d will place both rows in the same group. Example:

df.groupby('d').agg('count')

               a  b  c
d
one three two  2  2  2
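
If adding a helper column is not wanted, the same key can be passed to groupby directly as a Series. The following is only a sketch under that assumption: it swaps the row-wise apply for np.sort on the underlying array, and the third row plus the names sorted_values, key, and group_sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 are from the question; row 2 is an assumed extra row that
# should end up in a separate group.
df = pd.DataFrame({
    'a': ['one', 'three', 'one'],
    'b': ['two', 'one', 'two'],
    'c': ['three', 'two', 'four'],
})

# Sort the values within each row (axis=1) on the underlying object array.
sorted_values = np.sort(df[['a', 'b', 'c']].to_numpy(dtype=object), axis=1)

# Join each sorted row into one string key, aligned with df's index.
key = pd.Series([' '.join(row) for row in sorted_values],
                index=df.index, name='d')

# Group by the key Series directly; df itself gets no new column.
group_sizes = df.groupby(key).size()
print(group_sizes)
```

Joining with a space assumes no cell value itself contains a space; if that can happen, the tuple key from the other answer avoids the ambiguity.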
