Python - check several columns and compare string

Question

Let's say I have a dataframe below.

       a        b        c
0    one      two    three
1  three      one      two

I want rows 0 and 1 to be treated as the same list (or something similar), since both rows contain 'one', 'two' and 'three', even though the order is different.

Should I make a new column that stores all the strings from columns a, b and c, such as

       a        b        c                d
0    one      two    three    one two three
1  three      one      two    three one two

and then compare rows 0 and 1 of column d?

After this, I want to do .groupby('d'), and as a result 'one two three' and 'three one two' must not be separated.
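This is a minimal sketch using only the question's two rows. It shows why building d by joining the cells in their original order is not enough on its own: the two rows produce different strings, so groupby puts them in separate groups.

```python
import pandas as pd

# The question's example data
df = pd.DataFrame({'a': ['one', 'three'], 'b': ['two', 'one'], 'c': ['three', 'two']})

# Build column d by joining the cells in their original column order
df['d'] = df['a'] + ' ' + df['b'] + ' ' + df['c']
print(df['d'].tolist())  # ['one two three', 'three one two']

# The two strings differ, so groupby treats them as two separate groups
print(df.groupby('d').ngroups)  # 2
```

The order of the cells has to be normalized (for example, by sorting) before the key is built.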

I can't think of a way to solve this and need help.

Can you provide an example of a row that should not be treated the same? – R Balasubramanian, Commented Jun 26, 2018 at 16:50
A row like one two four should not be treated the same, because rows 0 and 1 don't have the string 'four'. – sblck, Commented Jun 26, 2018 at 17:01

Tomas Farias · Accepted Answer · 2018-06-26 16:59:14Z

The new column should hold tuples rather than lists, since lists aren't hashable and groupby would fail on them. So we first build the column with tolist(), then sort each row's values and convert the result to a tuple.
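The hashability point can be checked directly in plain Python, without pandas. In this minimal sketch, hashing a list raises TypeError. The sorted tuple of the same cells hashes fine and is identical for both rows, which is what lets them share a group key.

```python
# Both rows from the question, as plain Python lists
row0 = ['one', 'two', 'three']
row1 = ['three', 'one', 'two']

# Lists are mutable, so Python refuses to hash them
try:
    hash(row0)
    list_hashable = True
except TypeError:
    list_hashable = False
print(list_hashable)  # False

# Sorting removes the order difference; tuple() makes the result hashable
key0 = tuple(sorted(row0))
key1 = tuple(sorted(row1))
print(key0)                      # ('one', 'three', 'two')
print(key0 == key1)              # True
print(hash(key0) == hash(key1))  # True
```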

Setup

import pandas as pd

data = {'a': ['one', 'three'], 'b': ['two', 'one'], 'c': ['three', 'two']}
df = pd.DataFrame(data)

Sorting and transforming...

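# Put each row's values from a, b and c into a list
df['d'] = df.values.tolist()
# Sort each list so the column order no longer matters,
# then convert it to a tuple so it is hashable
df['d'] = (
    df['d'].transform(sorted)
           .transform(tuple)
)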
print(df.groupby('d').sum()) # I'm calling sum() just to show groupby working 

# prints only one group:
#                           a       b         c
# d
# (one, three, two)  onethree  twoone  threetwo
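The comments add a requirement: a row such as one two four must not join the group. This minimal sketch adds that row as a hypothetical third row. It builds the same sorted-tuple key in a single row-wise apply call, which is a variant of the answer's tolist()/transform steps rather than its exact code.

```python
import pandas as pd

# The question's two rows plus a hypothetical third row ("one two four")
# that, per the comments, must land in its own group
df = pd.DataFrame({
    'a': ['one', 'three', 'one'],
    'b': ['two', 'one', 'two'],
    'c': ['three', 'two', 'four'],
})

# Sorted tuple of each row's cells: order-insensitive and hashable
df['d'] = df[['a', 'b', 'c']].apply(lambda row: tuple(sorted(row)), axis=1)

print(df['d'].tolist())
# [('one', 'three', 'two'), ('one', 'three', 'two'), ('four', 'one', 'two')]
print(df.groupby('d').ngroups)  # 2
```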

answered Jun 26, 2018 at 16:59

Tomas Farias

2 Comments

sblck Over a year ago

Sort and transform: I learned some new skills, thank you!

Tomas Farias Over a year ago

Glad I could help. The family of methods to split, combine and apply functions to data provided by pandas is really rich. I always keep the docs at hand.

Haleemur Ali · Accepted Answer · 2018-06-26 17:15:00Z

Sort the cells in each row before joining them to create the grouping string.

Use apply with axis=1 to apply this function row-wise.

df['d'] = df.apply(lambda x: ' '.join(x.sort_values()), axis=1)

# outputs:

       a    b      c              d
0    one  two  three  one three two
1  three  one    two  one three two

Grouping by d will place both rows in the same group. For example:

df.groupby('d').agg('count')

               a  b  c
d
one three two  2  2  2
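The two answers differ in one design detail that may matter for real data. Joining with ' ' can make two different rows produce the same string when a cell itself contains a space. A tuple key keeps the cell boundaries. This minimal sketch uses hypothetical data that is not from the question:

```python
import pandas as pd

# Hypothetical rows whose cells contain spaces
df = pd.DataFrame({'a': ['x y', 'x'], 'b': ['z', 'y z']})

# String key, as in the join-based answer: both rows collapse to 'x y z'
str_key = df.apply(lambda row: ' '.join(row.sort_values()), axis=1)

# Tuple key: cell boundaries are preserved, so the rows stay distinct
tuple_key = df.apply(lambda row: tuple(sorted(row)), axis=1)

print(str_key.tolist())    # ['x y z', 'x y z']
print(tuple_key.tolist())  # [('x y', 'z'), ('x', 'y z')]
```

For single-word cells like the ones in the question, both keys give the same grouping.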

edited Jun 26, 2018 at 17:15

answered Jun 26, 2018 at 17:08

Haleemur Ali
