Pattern Clustering using Python

Question

I want to recognise all the patterns that can be present in the "Subset" column of the following data frame.

DataFrame:

Name    Subset    
A-1001  0
A-1001  1
A-1001  2
B-1005  3
B-1005  4
D-1015  0
D-1015  1
D-1015  2
L-650   0 
L-650   5
L-650   6
V-895   3
V-895   4

In this data frame, the Subset patterns of A-1001 and D-1015 match, and B-1005 and V-895 share the same pattern. The pattern for L-650 is different.

i.e., example output:

Pattern#    Name
1           A-1001, D-1015
2           B-1005, V-895
3           L-650

How can I recognize these patterns using Python?

P.S. There may be many unknown patterns.

What is a "pattern"? The number in Subset?

– mozway, Commented Jan 27, 2022 at 18:58

Yes. The sets of Subset values, such as (0, 1, 2) and (3, 4), are the patterns.

– spd, Commented Jan 28, 2022 at 6:45

mozway · Accepted Answer · 2022-01-27 19:06:30Z

IIUC, you want to perform a double groupby, once to aggregate the Subsets per name and form "patterns", and a second time to aggregate the patterns to group the Names:

import numpy as np

(df.groupby('Name', as_index=False)
   ['Subset'].agg(frozenset)      # use a tuple instead of frozenset if order matters
   .groupby('Subset', as_index=False)
   .agg(list)                     # use ', '.join instead of list to have a string
   .assign(Pattern=lambda d: np.arange(len(d))+1)
)

output:

      Subset              Name  Pattern
0  (0, 1, 2)  [A-1001, D-1015]        1
1     (3, 4)   [B-1005, V-895]        2
2  (0, 5, 6)           [L-650]        3
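
For readers who want to see the double grouping spelled out, here is a minimal sketch of the same idea in plain Python, without pandas. It is not the code above, only an illustration of the logic: the `rows` list is copied from the question's data, and `cluster_by_pattern` is a hypothetical helper name introduced here.

```python
from collections import defaultdict

# Rows copied from the question's DataFrame as (Name, Subset) pairs.
rows = [
    ("A-1001", 0), ("A-1001", 1), ("A-1001", 2),
    ("B-1005", 3), ("B-1005", 4),
    ("D-1015", 0), ("D-1015", 1), ("D-1015", 2),
    ("L-650", 0), ("L-650", 5), ("L-650", 6),
    ("V-895", 3), ("V-895", 4),
]


def cluster_by_pattern(pairs):
    """Hypothetical helper: map each distinct Subset pattern to the Names sharing it."""
    # First grouping: collect the Subset values seen for each Name.
    subsets_by_name = defaultdict(set)
    for name, subset in pairs:
        subsets_by_name[name].add(subset)

    # Second grouping: a frozenset is hashable, so it can be used as a dict key.
    names_by_pattern = defaultdict(list)
    for name, subsets in subsets_by_name.items():
        names_by_pattern[frozenset(subsets)].append(name)
    return names_by_pattern


clusters = cluster_by_pattern(rows)

# Number the patterns in order of first appearance, starting at 1.
for number, (pattern, names) in enumerate(clusters.items(), start=1):
    print(number, ", ".join(names), sorted(pattern))
```

With a DataFrame, `zip(df["Name"], df["Subset"])` yields the same pairs. As in the pandas version, a frozenset ignores order and duplicates; if order matters, collect the values in a list and use `tuple(...)` as the key instead.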

1 Comment

spd Over a year ago

Please solve them Q1- stackoverflow.com/q/70953148/17778275 Q2- stackoverflow.com/q/70966309/17778275
