I have a DataFrame (df) that resembles the following:
A B
1 2
1 3
1 4
2 5
3 1
3 2
4 6
4 7
8 9
9 8
I would like to add a column C that assigns each row to a cluster of connected values, based on columns A and B:
A B C
1 2 a
1 3 a
1 4 a
2 5 a
3 1 a
3 2 a
4 6 a
4 7 a
8 9 b
9 8 b
Note that since 1 (in A) is related to 2 (in B), and 2 (in A) is related to 5 (in B), all of these rows are placed in the same cluster. 8 and 9 are related only to each other, so those rows are placed in a separate cluster.
To sum up, how do I define clusters based on pairwise connections, where the pairs are given by two columns of a DataFrame?
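To make the intended grouping concrete: as far as I can tell, this amounts to finding the connected components of an undirected graph whose edges are the (A, B) pairs. Below is a minimal union-find sketch of that idea in plain Python, run on the rows of the expected output above. The function name cluster_labels and the letter-naming scheme are my own placeholders, and it assumes a value means the same thing whether it appears in A or in B.

```python
from string import ascii_lowercase


def cluster_labels(pairs):
    """Return one label per (a, b) pair; pairs linked through shared values share a label."""
    pairs = list(pairs)
    parent = {}  # maps each value to its parent in the union-find forest

    def find(x):
        # Register unseen values as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: a value is the same node whether it appears in A or in B.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Name clusters in order of first appearance (placeholder naming scheme):
    # letters for the first 26 clusters, then "c26", "c27", ...
    names = {}
    labels = []
    for a, _ in pairs:
        root = find(a)
        if root not in names:
            n = len(names)
            names[root] = ascii_lowercase[n] if n < 26 else f"c{n}"
        labels.append(names[root])
    return labels


# Rows of the expected output, as (A, B) pairs.
pairs = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 1), (3, 2),
         (4, 6), (4, 7), (8, 9), (9, 8)]
labels = cluster_labels(pairs)
print(labels)
```

With pandas, the same function could presumably be applied as df["C"] = cluster_labels(zip(df["A"], df["B"])), since it returns one label per row in the original order.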