I have a pd.Series whose values are lists. I define two locations (rows) to be duplicates if their lists have one or more items in common. This definition should be transitive, meaning that if locations A and B are duplicates, and locations B and C are duplicates, then locations A and C are duplicates.
Examples:
In [117]: df
Out[117]:
        A  dupe_group_ix
0  [A, B]              0
1  [D, X]              0
2     [B]              0
3  [D, A]              0
4     [A]              0
All rows are duplicates. Note that rows 0 and 1 have no item in common, but they are still duplicates because each is a duplicate of row 3: row 0 shares A with it, and row 1 shares D.
In [125]: df
Out[125]:
        A  dupe_group_ix
0  [A, B]              0
1  [D, X]              1
2     [B]              0
3  [K, D]              1
4     [A]              0
In this example, there are two separate groups of duplicates.
A is in index 0, 4; B in 0, 2; D in 1, 3?
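One possible way to get the dupe_group_ix column is to treat this as a connected-components problem: rows are nodes, and two rows are linked whenever they share an item. Below is a minimal sketch using a union-find (disjoint-set) structure in plain Python. The function name dupe_groups is made up for illustration, and it assumes the list items are hashable (strings here). It is not the only approach, just one that handles the transitivity requirement.

```python
def dupe_groups(rows):
    """Return one group index per row.

    Rows that share at least one item, directly or through a chain of
    other rows, receive the same index. `dupe_groups` is a hypothetical
    helper name; items are assumed to be hashable.
    """
    rows = list(rows)
    parent = list(range(len(rows)))  # each row starts as its own group

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_row_with = {}  # item -> index of the first row containing it
    for i, items in enumerate(rows):
        for item in items:
            j = first_row_with.setdefault(item, i)
            ri, rj = find(i), find(j)
            if ri != rj:
                # Merge the two groups, keeping the smaller row index as root.
                parent[max(ri, rj)] = min(ri, rj)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(rows))]


example_1 = [["A", "B"], ["D", "X"], ["B"], ["D", "A"], ["A"]]
example_2 = [["A", "B"], ["D", "X"], ["B"], ["K", "D"], ["A"]]
print(dupe_groups(example_1))  # [0, 0, 0, 0, 0]
print(dupe_groups(example_2))  # [0, 1, 0, 1, 0]
```

Since the function only iterates over its input, passing the Series directly should work, e.g. df['dupe_group_ix'] = dupe_groups(df['A']), assuming the column is named A as in the examples.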