I want to identify all the patterns that occur in the "Subset" column of the following data frame.

DataFrame:

Name    Subset    
A-1001  0
A-1001  1
A-1001  2
B-1005  3
B-1005  4
D-1015  0
D-1015  1
D-1015  2
L-650   0 
L-650   5
L-650   6
V-895   3
V-895   4

In this data frame, the Subset patterns for A-1001 and D-1015 match each other, and B-1005 and V-895 share a second pattern. The pattern for L-650 is different from both.

Expected output:

Pattern#    Name
1           A-1001, D-1015
2           B-1005, V-895
3           L-650

How can I recognize these patterns using Python?

P.S. There may be many patterns that are not known in advance.

  • What is a "pattern"? The number in Subset? Commented Jan 27, 2022 at 18:58
  • Yes. Subset (0,1,2); (3,4), etc are patterns Commented Jan 28, 2022 at 6:45

1 Answer

IIUC, you want to perform a double groupby: once to aggregate the Subsets per Name into "patterns", and a second time to group the Names that share the same pattern:

import numpy as np

(df.groupby('Name', as_index=False)
   ['Subset'].agg(frozenset)      # use a tuple instead of frozenset if order matters
   .groupby('Subset', as_index=False)
   .agg(list)                     # use ', '.join instead of list to have a string
   .assign(Pattern=lambda d: np.arange(len(d))+1)
)

Output:

      Subset              Name  Pattern
0  (0, 1, 2)  [A-1001, D-1015]        1
1     (3, 4)   [B-1005, V-895]        2
2  (0, 5, 6)           [L-650]        3
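
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same two-step idea, rebuilt on the sample data from the question. It is a variant, not the snippet above: it assumes order and duplicates within a Name do not matter (each pattern becomes a sorted tuple of unique values), and it does the second grouping with a plain dict rather than a second groupby. The variable names (`pattern_by_name`, `names_by_pattern`, `result`) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A-1001"] * 3 + ["B-1005"] * 2 + ["D-1015"] * 3
            + ["L-650"] * 3 + ["V-895"] * 2,
    "Subset": [0, 1, 2, 3, 4, 0, 1, 2, 0, 5, 6, 3, 4],
})

# Step 1: one pattern per Name, stored as a sorted tuple of unique values
# (assumption: row order and repeats inside a Name are irrelevant).
pattern_by_name = {
    name: tuple(sorted(set(subsets)))
    for name, subsets in df.groupby("Name", sort=False)["Subset"]
}

# Step 2: collect the Names that share each pattern.
# Dicts keep insertion order, so patterns are numbered by first appearance.
names_by_pattern = {}
for name, pattern in pattern_by_name.items():
    names_by_pattern.setdefault(pattern, []).append(name)

result = pd.DataFrame({
    "Pattern#": range(1, len(names_by_pattern) + 1),
    "Name": [", ".join(names) for names in names_by_pattern.values()],
})
print(result.to_string(index=False))
```

On the sample data this gives three rows, with A-1001 and D-1015 under pattern 1, B-1005 and V-895 under pattern 2, and L-650 alone under pattern 3. If order or repeats do matter, replace `tuple(sorted(set(subsets)))` with `tuple(subsets)`.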
