I have a dataset:
Name System
A AZ
A NaN
B AZ
B NaN
B NaN
C AY
C AY
D AZ
E AY
E AY
E NaN
F AZ
F AZ
F NaN
Using this dataset, I need to cluster the names based on how many times each "System" value is repeated for a particular "Name".
In the above example, names A, B and D each have one "AZ" entry, C and E each have two "AY" entries, and F has two "AZ" entries, so F forms a separate cluster. NaN values can be ignored.
Output Example:
Cluster Names
AZ A,B,D
AY,AY C,E
AZ,AZ F
How can I do it using Python?
PS. The actual dataset may vary in the number of rows and columns.
Also, how can I do it using ML-based classification algorithms like KNN, Naive Bayes, etc.?
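One possible approach is sketched below. It assumes the data is loaded into a pandas DataFrame with columns named `Name` and `System`, as in the sample. The variable names `signature` and `clusters` are only illustrative. The idea is to drop the NaN rows, build one comma-joined "signature" of System values per name, and then group the names that share the same signature.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (NaN marks a missing System)
df = pd.DataFrame({
    "Name": list("AABBBCCDEEEFFF"),
    "System": ["AZ", np.nan, "AZ", np.nan, np.nan, "AY", "AY",
               "AZ", "AY", "AY", np.nan, "AZ", "AZ", np.nan],
})

# Step 1: ignore NaN, then build one signature string per name,
# e.g. "AY,AY" for a name with two AY rows
signature = (
    df.dropna(subset=["System"])
      .groupby("Name")["System"]
      .agg(lambda s: ",".join(sorted(s)))
)

# Step 2: group the names that share the same signature
clusters = (
    signature.reset_index()
             .groupby("System")["Name"]
             .agg(lambda s: ",".join(s))
             .rename_axis("Cluster")
             .reset_index(name="Names")
)

print(clusters.to_string(index=False))
```

On the sample data this puts A, B and D under `AZ`, C and E under `AY,AY`, and F under `AZ,AZ`. A name whose rows are all NaN would be dropped in step 1 and would not appear in any cluster. Since the grouping rule here is exact, no model is needed for it. KNN and Naive Bayes are supervised classifiers, so they would need names that already carry cluster labels to train on.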